Oral
Sat Apr 26 12:30 AM -- 12:42 AM (PDT) @ Garnet 216-218
Accelerated training through iterative gradient propagation along the residual path
[Slides] [OpenReview]
Oral
Sat Apr 26 12:42 AM -- 12:54 AM (PDT) @ Garnet 216-218
Learning Randomized Algorithms with Transformers
[OpenReview]
Oral
Sat Apr 26 12:54 AM -- 01:06 AM (PDT) @ Garnet 216-218
Attention as a Hypernetwork
[Slides] [OpenReview]
Oral
Sat Apr 26 01:06 AM -- 01:18 AM (PDT) @ Garnet 216-218
Transformers Provably Solve Parity Efficiently with Chain of Thought
[OpenReview]
Oral
Sat Apr 26 01:18 AM -- 01:30 AM (PDT) @ Garnet 216-218
When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers
[OpenReview]
Oral
Sat Apr 26 01:30 AM -- 01:42 AM (PDT) @ Garnet 216-218
Progressive distillation induces an implicit curriculum
[OpenReview]