In-Person Poster presentation / poster accept

Exploring perceptual straightness in learned visual representations

Anne Harrington ⋅ Vasha DuTell ⋅ Ayush Tewari ⋅ Mark Hamilton ⋅ Simon Stent ⋅ Ruth Rosenholtz ⋅ William Freeman

Keywords: adversarial robustness, neuroscience, human vision, representation learning, deep learning, computer vision, Neuroscience and Cognitive Science

2023 In-Person Poster presentation / poster accept


Abstract

Humans have been shown to use a "straightened" encoding to represent the natural visual world as it evolves in time (Henaff et al. 2019). In the context of discrete video sequences, "straightened" means that changes between frames follow a more linear path in representation space at progressively deeper levels of processing. While deep convolutional networks are often proposed as models of human visual processing, many do not straighten natural videos. In this paper, we explore the relationship between network architecture, differing types of robustness, biologically-inspired filtering mechanisms, and representational straightness in response to time-varying input; we identify strengths and limitations of straightness as a useful way of evaluating neural network representations. We find that (1) adversarial training leads to straighter representations in both CNN and transformer-based architectures, but (2) this effect is task-dependent: it does not generalize to tasks such as segmentation and frame prediction, where straight representations are not favorable for predictions, nor to other types of robustness. In addition, (3) straighter representations impart temporal stability to class predictions, even for out-of-distribution data. Finally, (4) biologically-inspired elements increase straightness in the early stages of a network, but do not guarantee increased straightness in downstream layers of CNNs. We show that straightness is an easily computed measure of representational robustness and stability, as well as a hallmark of human representations with benefits for computer vision models.
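
To make "a more linear path in representation space" concrete, the following is a minimal sketch of one way straightness could be quantified. It assumes curvature is measured as the mean angle between successive frame-to-frame difference vectors, in the spirit of Henaff et al. 2019. The function name `mean_curvature` and the input layout (one representation per frame) are illustrative assumptions, not taken from the paper. Under this measure, lower values mean a straighter trajectory.

```python
import numpy as np


def mean_curvature(frames):
    """Mean angle, in degrees, between successive difference vectors.

    `frames` holds one representation per video frame (hypothetical layout:
    first axis is time, remaining axes are flattened into a feature vector).
    A perfectly straight trajectory gives 0 degrees.
    """
    x = np.asarray(frames, dtype=float)
    if x.shape[0] < 3:
        raise ValueError("need at least 3 frames to measure curvature")
    x = x.reshape(x.shape[0], -1)

    # Displacement between consecutive frames, normalized to unit length.
    diffs = np.diff(x, axis=0)
    norms = np.linalg.norm(diffs, axis=1, keepdims=True)
    units = diffs / np.maximum(norms, 1e-12)  # guard against identical frames

    # Angle between each pair of consecutive unit displacements.
    cosines = np.sum(units[:-1] * units[1:], axis=1)
    angles = np.arccos(np.clip(cosines, -1.0, 1.0))
    return float(np.degrees(angles.mean()))


# Toy trajectories: points on a line versus a staircase of right-angle turns.
straight = [[t, 2.0 * t, 3.0 * t] for t in range(5)]
staircase = [[0, 0], [1, 0], [1, 1], [2, 1]]
straight_deg = mean_curvature(straight)
staircase_deg = mean_curvature(staircase)
```

Applying such a function to pixel frames and again to a network layer's activations for the same frames would give a before/after comparison; a representation "straightens" the video when the second value is smaller than the first.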
