Influence Dynamics and Stagewise Data Attribution
Abstract
Current training data attribution (TDA) methods treat influence as static, ignoring that neural networks learn in distinct stages. This stagewise development, driven by phase transitions on a degenerate loss landscape, means that a sample's importance is not fixed but changes over the course of training. In this work, we introduce a developmental framework for data attribution grounded in singular learning theory. We predict that influence can change non-monotonically during training, including sign flips and sharp peaks at developmental transitions. We first confirm these predictions analytically and empirically in a toy model, showing that dynamic shifts in influence map directly onto the model's progressive learning of a semantic hierarchy. We then demonstrate these phenomena at scale in language models, where token-level influence changes align with known developmental stages.