Poster in Workshop: Mathematical and Empirical Understanding of Foundation Models (ME-FoMo)

The Effects of Pretraining Task Diversity on In-Context Learning of Ridge Regression

Allan Raventos · Mansheej Paul · Feng Chen · Surya Ganguli

Keywords: [ transformer ] [ in-context learning ] [ Bayesian inference ]


Abstract:

Pretrained transformers can do in-context learning (ICL), i.e., learn new tasks in the forward pass from a few examples provided in context. But can the model do ICL for completely new tasks, or is this ability restricted to tasks similar to those seen during pretraining? How does the diversity of tasks seen during pretraining affect the model's ability to do ICL? In the setting of ICL for ridge regression, we show that, if pretrained on only a few tasks sampled from a latent distribution, the model behaves like the Bayesian estimator with a prior equal to the discrete distribution over the sampled tasks. But if pretrained on a sufficiently large number of tasks, the model behaves like the Bayesian estimator with a prior equal to the underlying latent distribution over tasks. Our results suggest that, as the diversity of the pretraining dataset increases, the model transitions from doing ICL on tasks similar to those seen during pretraining to learning the underlying task structure and doing ICL on new tasks.
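To make the two Bayesian estimators in the abstract concrete, here is a minimal sketch of their standard posterior-mean forms. The notation is ours, not taken from the paper: we assume a latent task distribution w ~ N(0, I_d), k in-context examples y_i = w^T x_i + eps_i with noise eps_i ~ N(0, sigma^2), and M pretraining tasks {w_1, ..., w_M} drawn from the latent distribution.

% Posterior mean under the discrete (uniform) prior over the M sampled pretraining tasks:
\[
\hat{w}_{\mathrm{discrete}}
  = \sum_{m=1}^{M} w_m \,
    \frac{\exp\!\left(-\tfrac{1}{2\sigma^2}\sum_{i=1}^{k}\left(y_i - w_m^\top x_i\right)^2\right)}
         {\sum_{m'=1}^{M}\exp\!\left(-\tfrac{1}{2\sigma^2}\sum_{i=1}^{k}\left(y_i - w_{m'}^\top x_i\right)^2\right)}
\]
% Posterior mean under the latent Gaussian prior w ~ N(0, I_d), which is ridge regression:
\[
\hat{w}_{\mathrm{ridge}}
  = \left(X^\top X + \sigma^2 I_d\right)^{-1} X^\top y,
\qquad
X = [x_1, \ldots, x_k]^\top,\quad y = (y_1, \ldots, y_k)^\top
\]

Both estimators predict \(\hat{y}_{k+1} = \hat{w}^\top x_{k+1}\) for a new query \(x_{k+1}\). The abstract's claim is that a transformer pretrained on few tasks matches the first estimator, while one pretrained on sufficiently many tasks matches the second.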
