PhD Thesis · Defended April 13, 2023

Exploring the role of (self-)attention in cognitive and computer vision architecture

Doctoral dissertation at Brown University & ANITI (Université Paul Sabatier), under Prof. Thomas Serre and Prof. Nicholas Asher.

Abstract

What the thesis is about

Complex reasoning depends on two fundamental cognitive abilities: selectively processing information (attention) and retaining it in an accessible state (memory). We analyze both systematically. We start with self-attention from the Transformer, the de facto architecture across modern AI, as a model of attention, and later extend the architecture with memory.

We first study the computational mechanisms involved in solving the Synthetic Visual Reasoning Test (SVRT), analyzing ResNet architectures of varying depth trained on datasets of varying size. This yields a novel, finer taxonomy of the twenty-three SVRT tasks, consistent with the same-different (SD) and spatial-relation (SR) classes of reasoning. We then incorporate self-attention into ResNet50, in the form of feature-based and spatial attention that enrich the feature maps, and find that these attention networks are markedly more efficient at solving the hardest reasoning tasks. These gains partially explain the taxonomy and yield testable predictions about the attentional needs of the tasks.
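To make "self-attention enriching the feature maps" concrete, here is a minimal sketch of the spatial variant only, written in plain Python. It is an illustration under stated assumptions, not the thesis implementation: the function names are hypothetical, queries, keys and values are the features themselves (no learned projections), and the residual addition is one plausible way to "enrich" a feature map.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def spatial_self_attention(features):
    """Scaled dot-product self-attention over spatial positions.

    `features` is a flattened feature map: a list of N position vectors,
    each of length C. Assumption: identity projections stand in for the
    learned query/key/value projections of a real attention layer.
    """
    dim = len(features[0])
    scale = math.sqrt(dim)
    enriched = []
    for query in features:
        # Similarity of this position to every position in the map
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in features]
        weights = softmax(scores)
        # Weighted mix of all position vectors
        mixed = [sum(w * value[c] for w, value in zip(weights, features))
                 for c in range(dim)]
        # Residual connection: add the attended signal to the original feature
        enriched.append([f + m for f, m in zip(query, mixed)])
    return enriched
```

Each output position is its original feature plus a similarity-weighted mix of every position in the map. Feature-based attention would apply the same idea across channels rather than positions.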

Finally, we develop GAMR, a Guided Attention Model for (visual) Reasoning: a cognitive architecture that integrates attention and memory, motivated by the theory of active vision. GAMR solves complex visual reasoning tasks through sequences of attention shifts, guided by an internally generated query, that route task-relevant information into memory. It is sample-efficient, robust, and compositional, and it generalizes zero-shot to entirely novel reasoning tasks.
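The loop described above, in which a query guides an attention shift and the attended content is written to memory, can be sketched in a few lines of plain Python. This is a toy illustration, not GAMR itself. All names are hypothetical, and the query update is left as a caller-supplied function because this page does not specify how the query is generated.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, features):
    """Return the attention-weighted average of the feature vectors."""
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in features]
    weights = softmax(scores)
    dim = len(query)
    return [sum(w * feat[c] for w, feat in zip(weights, features))
            for c in range(dim)]

def guided_attention_loop(features, initial_query, update_query, steps):
    """Run `steps` attention shifts, storing each attended vector in memory.

    `update_query(query, glimpse)` is a hypothetical hook standing in for
    whatever component generates the next query internally.
    """
    memory = []
    query = list(initial_query)
    for _ in range(steps):
        glimpse = attend(query, features)     # attention shift
        memory.append(glimpse)                # route attended content to memory
        query = update_query(query, glimpse)  # generate the next query
    return memory

# Toy update rule (an assumption): steer away from what was just attended.
def steer_away(query, glimpse):
    return [q - g for q, g in zip(query, glimpse)]

memory = guided_attention_loop([[4.0, 0.0], [0.0, 4.0]], [1.0, 0.0], steer_away, steps=2)
```

With this toy update rule, the first step attends mostly to the first feature vector and the second step shifts to the other one, so `memory` ends up holding one entry per attended item.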

Defense committee

Examined by