

In-Person Poster presentation / poster accept

Exploring perceptual straightness in learned visual representations

Anne Harrington · Vasha DuTell · Ayush Tewari · Mark Hamilton · Simon Stent · Ruth Rosenholtz · William Freeman

MH1-2-3-4 #109

Keywords: [ Neuroscience and Cognitive Science ] [ computer vision ] [ deep learning ] [ representation learning ] [ human vision ] [ neuroscience ] [ adversarial robustness ]


Abstract:

Humans have been shown to use a "straightened" encoding to represent the natural visual world as it evolves in time (Henaff et al. 2019). In the context of discrete video sequences, "straightened" means that changes between frames follow a more linear path in representation space at progressively deeper levels of processing. While deep convolutional networks are often proposed as models of human visual processing, many do not straighten natural videos. In this paper, we explore the relationship between network architecture, differing types of robustness, biologically-inspired filtering mechanisms, and representational straightness in response to time-varying input, and we identify strengths and limitations of straightness as a useful way of evaluating neural network representations. We find that (1) adversarial training leads to straighter representations in both CNN and transformer-based architectures, but (2) this effect is task-dependent: it does not generalize to tasks such as segmentation and frame prediction, where straight representations are not favorable for predictions, nor does it extend to other types of robustness. In addition, (3) straighter representations impart temporal stability to class predictions, even for out-of-distribution data. Finally, (4) biologically-inspired elements increase straightness in the early stages of a network but do not guarantee increased straightness in downstream layers of CNNs. We show that straightness is an easily computed measure of representational robustness and stability, as well as a hallmark of human representations with benefits for computer vision models.
