Local linearity of neural networks and the double descent problem
Gradient-descent-based optimization relies heavily on the optimized model being locally linear in its parameters. However, deep neural network and transformer architectures often represent complex, non-linear functions whose parameters and gradients may be temporarily correlated. Empirical research into the exact behavior of such models is therefore an interesting topic. A related phenomenon in deep learning is the double descent problem, where certain models generalize considerably better when they are vastly overparameterized relative to the amount of available data. Leading theories suggest a close connection with the conditioning of the problem.
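One way to make the local linearity question concrete is to compare the actual change in the loss after a gradient step with the change predicted by the first-order Taylor expansion. The following is a minimal sketch of such a measurement, assuming a PyTorch model, a loss function, and a data batch are available; the names (`model`, `loss_fn`, `x`, `y`, `step_size`) are illustrative and not taken from the announcement.

```python
# Minimal sketch: first-order (locally linear) prediction vs. the true loss
# change after one gradient step. Assumes a PyTorch model, a loss function,
# and a batch (x, y); all names here are illustrative.
import torch
from torch.nn.utils import parameters_to_vector, vector_to_parameters

def linearity_gap(model, loss_fn, x, y, step_size):
    params = [p for p in model.parameters() if p.requires_grad]
    loss0 = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss0, params)
    grad_vec = parameters_to_vector(grads)

    # A locally linear model predicts a loss change of -step_size * ||grad||^2
    # for a gradient-descent step of this size.
    predicted_change = -step_size * grad_vec.pow(2).sum()

    # Take the step, measure the true loss, then restore the original weights.
    theta0 = parameters_to_vector(params).detach().clone()
    with torch.no_grad():
        vector_to_parameters(theta0 - step_size * grad_vec, params)
        loss1 = loss_fn(model(x), y)
        vector_to_parameters(theta0, params)

    true_change = loss1 - loss0.detach()
    return (true_change - predicted_change).item()  # ~0 where the model is locally linear
```

Tracking this gap at different stages of training and at different step sizes is one possible starting point for the experiments listed below.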
Possible tasks:
- Experiments on the local linearity of different models at different stages of training, especially at perturbation scales close to the step size used by the optimizer.
- Examining the existence and the effect of correlations between model parameters and gradients.
- Construction of experiments in which double descent occurs, and examination of how different configurations (models, datasets, training parameters) affect generalization (see the sketch below).
A combination of these tasks is also possible, and the student's own ideas on the topic are welcome.
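As an illustration of the double descent task above, the sketch below sets up one configuration in which the phenomenon is commonly observed: random-feature regression with a minimum-norm least-squares readout, where the test error typically peaks as the number of features crosses the number of training samples and then decreases again. All sizes, the noise level, and the target function are illustrative assumptions, not part of the announcement.

```python
# Minimal sketch of a setting where double descent typically appears:
# random-feature regression with a minimum-norm (pseudoinverse) readout.
# All sizes, the noise level, and the target function are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d, n_train, n_test, noise = 20, 100, 2000, 0.1
w_true = rng.normal(size=d) / np.sqrt(d)

def make_data(n):
    x = rng.normal(size=(n, d))
    y = x @ w_true + noise * rng.normal(size=n)
    return x, y

x_tr, y_tr = make_data(n_train)
x_te, y_te = make_data(n_test)

for width in [10, 50, 90, 100, 110, 150, 300, 1000, 3000]:
    W = rng.normal(size=(d, width)) / np.sqrt(d)   # fixed random first layer
    phi_tr = np.maximum(x_tr @ W, 0)               # ReLU random features
    phi_te = np.maximum(x_te @ W, 0)
    beta = np.linalg.pinv(phi_tr) @ y_tr           # minimum-norm least-squares fit
    test_mse = np.mean((phi_te @ beta - y_te) ** 2)
    # The test error typically peaks near width == n_train (the interpolation
    # threshold) and decreases again in the overparameterized regime.
    print(f"width={width:5d}  test MSE={test_mse:.4f}")
```

Varying the model family, dataset, noise level, and training procedure in such a setup corresponds directly to the configuration study mentioned in the task list.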
Necessary competencies:
Motivation to gain a deep understanding of how neural networks learn. Interest in the mathematical background of machine learning and deep learning (linear algebra, probability theory, analysis). Readiness to handle the unexpected challenges that arise during research.
Benefits for the student:
By working on this topic, the student can gain a deep theoretical understanding of the inner workings of neural networks, with particular emphasis on their optimization, local linearity, and generalization. With sufficient results, the research may lead to a TDK paper and a presentation at the students' scientific conference. The topic is suitable for an individual laboratory project or a BSc/MSc thesis.
Tumay Ádám
PhD student
tumay
BME-MIT