Oral

Combatting Dimensional Collapse in LLM Pre-Training Data via Submodular File Selection

Ziqing Fan ⋅ Siyuan Du ⋅ Shengchao Hu ⋅ Pingjie Wang ⋅ Li Shen ⋅ Ya Zhang ⋅ Dacheng Tao ⋅ Yanfeng Wang
2025 Oral

