Affinity Posters
Blog Track Session 8
David Dobre · Leo Schwinn · Claire Vernade · Charlie Gauthier · Fabian Pedregosa · Gauthier Gidel
Halle B
Schedule
Fri 7:30 a.m. - 9:30 a.m. | A New Alchemy: Language Model Development as a Subfield? (Poster #3)

Poster Location: Halle B #3. This blog post makes the case that the body of research on language models has become sufficiently large and mature that we can start thinking about "language model development" as a new subfield. To support this claim, we sketch out the focuses and methodologies of this new subfield. In addition, we provide some personal reflections on what to do when your field of study gives birth to a new one.

Colin Raffel
Fri 7:30 a.m. - 9:30 a.m. | Double Descent Demystified: Identifying, Interpreting & Ablating the Sources of a Deep Learning Puzzle (Poster #2)

Poster Location: Halle B #2. Double descent is a surprising phenomenon in machine learning in which, as the number of model parameters grows relative to the number of data points, test error drops again as models grow ever larger into the highly overparameterized (data-undersampled) regime. This drop in test error flies in the face of classical learning theory on overfitting and has arguably underpinned the success of many large models in machine learning. In this work, we analytically dissect the simple setting of ordinary linear regression, and show intuitively and rigorously when and why double descent occurs, without complex tools (e.g., statistical mechanics, random matrix theory). We identify three interpretable factors that, when all simultaneously present, cause double descent: (1) how much the training features vary in each direction; (2) how much, and in which directions, the test features vary relative to the training features; and (3) how well the best possible model in the model class can correlate the variance in the training features with the training targets. We demonstrate on real data that ordinary linear regression exhibits double descent, and that double descent disappears when we ablate any one of the three identified factors. We conclude by using our fresh perspective to shed light on recent observations in nonlinear models concerning superposition and double descent.

Rylan Schaeffer · Zachary Robertson · Akhilan Boopathy · Mikail Khona · Kateryna Pistunova · Jason Rocks · Ila Fiete · Andrey Gromov · Sanmi Koyejo
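The claim that ordinary linear regression exhibits double descent can be reproduced in a few lines. The following is a minimal sketch, not the authors' code: it assumes a synthetic teacher-student setup (a fixed ground-truth linear map, Gaussian features, additive label noise) and uses the minimum-norm least-squares fit via the pseudoinverse, sweeping the number of features used past the interpolation threshold.

```python
# Minimal double-descent sketch for ordinary linear regression
# (synthetic data; minimum-norm least squares via the pseudoinverse).
import numpy as np

rng = np.random.default_rng(0)

n_train, n_test, d_max, noise = 20, 200, 60, 0.1
w_true = rng.normal(size=d_max) / np.sqrt(d_max)   # ground-truth weights
X_train = rng.normal(size=(n_train, d_max))
X_test = rng.normal(size=(n_test, d_max))
y_train = X_train @ w_true + noise * rng.normal(size=n_train)
y_test = X_test @ w_true + noise * rng.normal(size=n_test)

test_errors = {}
for d in range(1, d_max + 1):
    # Fit using only the first d features; np.linalg.pinv returns the
    # minimum-norm solution once d > n_train (the overparameterized regime).
    w_hat = np.linalg.pinv(X_train[:, :d]) @ y_train
    test_errors[d] = np.mean((X_test[:, :d] @ w_hat - y_test) ** 2)

peak = max(test_errors, key=test_errors.get)
print(f"test error peaks near d = {peak}; interpolation threshold is n_train = {n_train}")
```

Test error spikes around d = n_train (the interpolation threshold, where the training system is barely determined and the fit is most sensitive to noise) and then descends again as d grows, which is the "second descent" the abstract refers to.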
Fri 7:30 a.m. - 9:30 a.m. | Understanding in-context learning in transformers (Poster #1)

Poster Location: Halle B #1. We propose a critical review of the phenomenon of In-Context Learning (ICL) in transformer architectures. Focusing on the article Transformers Learn In-Context by Gradient Descent by J. von Oswald et al., published at ICML 2023, we provide detailed explanations and illustrations of the mechanisms involved. We also contribute novel analyses of ICL, discuss recent developments, and point to open questions in this area of research.

Simone Rossi · Rui Yuan · Thomas Hannagan
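The core observation reviewed here can be illustrated numerically. The sketch below, a simplification with our own choice of scaling, shows that one gradient-descent step (from zero weights) on an in-context least-squares loss produces the same query prediction as an unnormalized linear self-attention readout whose keys are the context inputs and whose values are the context targets; the step size eta and problem sizes are arbitrary assumptions.

```python
# One GD step on in-context linear regression vs. a linear-attention readout.
import numpy as np

rng = np.random.default_rng(1)
N, d = 8, 4                              # context examples, input dimension
w_task = rng.normal(size=d)              # task-specific linear map
X = rng.normal(size=(N, d))              # in-context inputs
y = X @ w_task                           # in-context targets
x_q = rng.normal(size=d)                 # query input
eta = 0.5                                # GD step size

# One gradient step on L(w) = 1/(2N) * sum_i (w @ x_i - y_i)^2,
# starting from w0 = 0, gives w1 = (eta / N) * X^T y.
w1 = (eta / N) * X.T @ y
pred_gd = w1 @ x_q

# Linear attention: score each context token by <x_q, x_i>, then take the
# score-weighted sum of the values y_i, with the same eta / N scaling.
scores = X @ x_q
pred_attn = (eta / N) * scores @ y

print(np.isclose(pred_gd, pred_attn))    # prints True
```

Both predictions reduce to (eta / N) * sum_i y_i <x_i, x_q>, which is the algebraic identity underlying the "transformers implement gradient descent" construction the blog post examines; the full paper extends this to multiple layers and learned projections.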