(6 events)
Oral
Sat Apr 26 12:30 AM -- 12:42 AM (PDT) @ Garnet 216-218
Accelerated training through iterative gradient propagation along the residual path
Erwan Fagnou · Paul Caillon · Blaise Delattre · Alexandre Allauzen
[Slides] [OpenReview]
Oral
Sat Apr 26 12:42 AM -- 12:54 AM (PDT) @ Garnet 216-218
Learning Randomized Algorithms with Transformers
Johannes von Oswald · Seijin Kobayashi · Yassir Akram · Angelika Steger
[OpenReview]
Oral
Sat Apr 26 12:54 AM -- 01:06 AM (PDT) @ Garnet 216-218
Attention as a Hypernetwork
Simon Schug · Seijin Kobayashi · Yassir Akram · Joao Sacramento · Razvan Pascanu
[Slides] [OpenReview]
Oral
Sat Apr 26 01:06 AM -- 01:18 AM (PDT) @ Garnet 216-218
Transformers Provably Solve Parity Efficiently with Chain of Thought
Juno Kim · Taiji Suzuki
[OpenReview]
Oral
Sat Apr 26 01:18 AM -- 01:30 AM (PDT) @ Garnet 216-218
When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers
Hongkang Li · Yihua Zhang · Shuai Zhang · Pin-Yu Chen · Sijia Liu · Meng Wang
[OpenReview]
Oral
Sat Apr 26 01:30 AM -- 01:42 AM (PDT) @ Garnet 216-218
Progressive distillation induces an implicit curriculum
Abhishek Panigrahi · Bingbin Liu · Sadhika Malladi · Andrej Risteski · Surbhi Goel
[OpenReview]