

Poster

Exploring Target Representations for Masked Autoencoders

Xingbin Liu · Jinghao Zhou · Tao Kong · Xianming Lin · Rongrong Ji

Halle B #131
Fri 10 May 1:45 a.m. PDT — 3:45 a.m. PDT

Abstract:

Masked autoencoders have become a popular training paradigm for self-supervised visual representation learning. These models randomly mask a portion of the input and reconstruct the masked portion according to assigned target representations. In this paper, we show that a careful choice of target representation is unnecessary for learning good visual representations, since different targets tend to yield similarly behaved models. Driven by this observation, we propose a multi-stage masked distillation pipeline that uses a randomly initialized model as the teacher, enabling us to effectively train high-capacity models without any effort to carefully design the target representation. On various downstream tasks, the proposed method, masked knowledge distillation with bootstrapped teachers (dBOT), outperforms previous self-supervised methods by nontrivial margins. We hope our findings, as well as the proposed method, motivate people to rethink the role of target representations in pre-training masked autoencoders.
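To make the pipeline concrete, here is a minimal PyTorch sketch of multi-stage masked distillation with a bootstrapped teacher. The `Encoder`, `distillation_stage`, and `dbot_pipeline` names, the toy transformer sizes, the 0.75 mask ratio, and the zero-out masking are illustrative assumptions rather than the paper's actual implementation; the point is only the structure described in the abstract: the stage-0 teacher is randomly initialized, the student regresses the teacher's features on masked patches, and after each stage the trained student becomes the next teacher.

```python
import copy
import torch
import torch.nn as nn

# Hypothetical ViT-style encoder; sizes and hyperparameters are
# illustrative assumptions, not the paper's configuration.
class Encoder(nn.Module):
    def __init__(self, patch_dim=16 * 16 * 3, dim=192, depth=4, heads=3):
        super().__init__()
        self.embed = nn.Linear(patch_dim, dim)  # patchify happens upstream
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, patches):                   # patches: (B, N, patch_dim)
        return self.blocks(self.embed(patches))   # features: (B, N, dim)


def distillation_stage(student, teacher, loader, steps, mask_ratio=0.75):
    """One stage of masked distillation: the student sees a masked view and
    regresses the frozen teacher's features at the masked positions."""
    teacher.eval()
    opt = torch.optim.AdamW(student.parameters(), lr=1.5e-4)
    for _, patches in zip(range(steps), loader):
        mask = torch.rand(patches.shape[:2]) < mask_ratio  # True = masked
        corrupted = patches.clone()
        corrupted[mask] = 0.0                     # crude masking stand-in
        with torch.no_grad():
            target = teacher(patches)             # teacher encodes the full view
        pred = student(corrupted)
        loss = ((pred - target) ** 2)[mask].mean()  # loss on masked tokens only
        opt.zero_grad()
        loss.backward()
        opt.step()


def dbot_pipeline(loader, num_stages=3, steps_per_stage=1000):
    teacher = Encoder()                   # stage-0 teacher: random init,
    student = Encoder()                   # no hand-designed target needed
    for _ in range(num_stages):
        distillation_stage(student, teacher, loader, steps_per_stage)
        teacher = copy.deepcopy(student)  # bootstrap: student becomes teacher
        student = Encoder()               # re-initialize the student
    return teacher                        # final encoder for downstream tasks
```

The key design choice this sketch tries to capture is that no pre-trained or hand-crafted target network is ever required: the target representation is bootstrapped from random initialization through the repeated distill-then-swap stages.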
