

Poster

Dynamic Layer Tying for Parameter-Efficient Transformers

Tamir David-Hay · Lior Wolf

Halle B #104
Thu 9 May 1:45 a.m. PDT — 3:45 a.m. PDT

Abstract: In the pursuit of reducing the number of trainable parameters in deep transformer networks, we employ Reinforcement Learning to dynamically select layers during training and tie them together. Every few iterations, the RL agent is asked whether to train each layer $i$ independently or to copy the weights of a previous layer $j < i$.
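The tying mechanism described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration only: the dict stand-ins for layer weights, the function names, and the uniform-random placeholder policy are assumptions, not the paper's implementation (the actual method trains an RL agent over the training state).

```python
import random

def apply_tying(weights, actions):
    """Tie layers according to an action vector: actions[i] == i keeps layer i
    trained independently; actions[i] == j < i makes layer i share layer j's
    weights. Sharing is by object identity, so tied layers contribute a single
    set of trainable parameters. (Illustrative sketch, not the paper's code.)"""
    tied = list(weights)
    for i, j in enumerate(actions):
        assert 0 <= j <= i, "a layer may only copy an earlier layer"
        if j < i:
            tied[i] = tied[j]  # alias the earlier layer's parameters
    return tied

def random_policy(num_layers, rng):
    """Placeholder for the RL agent: every few iterations the real agent would
    choose these actions; here we just sample j uniformly from {0, ..., i}."""
    return [rng.randint(0, i) for i in range(num_layers)]

num_layers = 6
weights = [{"layer": i} for i in range(num_layers)]  # stand-ins for layer params
actions = random_policy(num_layers, random.Random(0))
tied = apply_tying(weights, actions)
unique = len({id(w) for w in tied})
print(f"actions={actions}, trainable layer copies: {unique}/{num_layers}")
```

Because actions are processed in order of increasing $i$ and each earlier entry is final by the time layer $i$ is tied, chains of ties (layer 3 copying layer 1, which copies layer 0) resolve automatically to the root layer's parameters.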
