Oral · Workshop on Spurious Correlation and Shortcut Learning: Foundations and Solutions
Distinct Computations Emerge From Compositional Curricula in In-Context Learning
Jin Hwa Lee · Andrew Lampinen · Aaditya Singh · Andrew Saxe
Keywords: [ curriculum learning ] [ compositional generalization ] [ in-context learning ]
In-context learning (ICL) typically presents a function through a uniform sample of input-output pairs. Here, we investigate how presenting a compositional subtask curriculum in context can alter the computations that a model learns. We design a compositional algorithmic task based on the modular exponential (a double exponential task composed of two single exponential subtasks) and train transformer models to learn the task in context. We compare (a) a model trained with an in-context curriculum consisting of single exponential subtask examples and (b) a model trained directly on the double exponential task without such a curriculum. We show that the model trained with the subtask curriculum can perform zero-shot inference on unseen compositional tasks and is more robust at a matched context length. We study how the task is represented across the two training regimes, in particular whether the models represent subtask information. We find that the models employ different mechanisms, which may change over the course of training, modulated by the data properties of the in-context curriculum.
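To make the task structure concrete, below is a minimal sketch of how such in-context sequences might be generated. The abstract does not specify the exact parameterization, so the modulus `p`, the particular composition `h(x) = a^(b^x mod p) mod p`, the curriculum ordering (inner subtask, then outer subtask, then composed examples), and all helper names are illustrative assumptions, not the authors' implementation.

```python
import random

def single_exp(a, p):
    """Single exponential subtask: x -> a^x mod p (assumed form)."""
    return lambda x: pow(a, x, p)

def double_exp(a, b, p):
    """Double exponential task as the composition of two single
    exponentials: x -> a^(b^x mod p) mod p (assumed form)."""
    f, g = single_exp(a, p), single_exp(b, p)
    return lambda x: f(g(x))

def make_context(task_fn, n_pairs, p, rng):
    """Uniform in-context presentation: n random input-output pairs."""
    xs = [rng.randrange(p) for _ in range(n_pairs)]
    return [(x, task_fn(x)) for x in xs]

def make_curriculum_context(a, b, p, n_sub, n_comp, rng):
    """Compositional subtask curriculum: examples of each single
    exponential subtask first, then examples of the composed task.
    Whether composed examples also appear in context is an assumption."""
    f, g = single_exp(a, p), single_exp(b, p)
    h = double_exp(a, b, p)
    ctx = make_context(g, n_sub, p, rng)    # inner subtask examples
    ctx += make_context(f, n_sub, p, rng)   # outer subtask examples
    ctx += make_context(h, n_comp, p, rng)  # composed-task examples
    return ctx

rng = random.Random(0)
p = 11                                  # small prime modulus (assumed)
a, b = rng.randrange(2, p), rng.randrange(2, p)
print(make_curriculum_context(a, b, p, n_sub=3, n_comp=3, rng=rng))
print(make_context(double_exp(a, b, p), 9, p, rng))
```

Under this reading, the two training regimes compared in the abstract correspond to contexts built with `make_curriculum_context` versus contexts built with `make_context` on the composed task alone, at a matched total context length.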