Deep learning is the predominant machine learning paradigm in natural language processing (NLP). This approach has yielded large performance improvements across a wide variety of NLP tasks. In this version of the seminar we want to focus on a deeper understanding of transformer LMs and on recent developments in the area of language modelling.
Lecturer: Dietrich Klakow
Location: tbd
Time: block course in the fall break 2024; however, preparations start earlier. Here is the specific timeline:
Closing topic doodle: May 12th 23:59
Kick-Off: around May 21-24. Exact time to be determined by a doodle
One page outline: June 16th 23:59
Draft presentation: July 15th 23:59
Practice talks and final talks will be in September. Time/date will be decided during the kick-off.
Application for participation: registration system. (This is for everybody: CoLi, CS, DSAI, ...) Before applying, please check the list of topics and the planned schedule. Apply by Sunday, April 21st at the latest. The number of participants is limited to 12. Notification of acceptance on April 22nd.
HISPOS registration deadline: tbd
Grading (tentative):
- 5% one page talk outline
- 10% draft presentation
- 10% practice talk
- 35% final talk
- 5% contributions to discussion during final talk of fellow participants
- 35% report
List of Topics (tentative):
- Survey of Large Language Models (2 talks)
- Challenges and Applications of Large Language Models
- What learning algorithm is in-context learning? Investigations with linear models
- One Step of Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention
- Transformers Learn Shortcuts to Automata
- Transformers learn through gradual rank increase
- Auto-Regressive Next-Token Predictors are Universal Learners
- When can transformers reason with abstract symbols?
- Geometric Dynamics of Signal Propagation Predict Trainability of Transformers
- Hungry Hungry Hippos: Towards Language Modeling with State Space Models
- A Survey on Generative Diffusion Models
- Diffusion generative models
- Stable Diffusion application: Illuminating protein space with a programmable generative model (2 talks possible)