Research

Publications

Work on visual reasoning, vision-language models, attention & memory architectures, and video coding — peer-reviewed at NeurIPS, ICLR, Neural Computation and more.

149+ citations (Google Scholar)

PhD Thesis — Exploring the role of (self-)attention in cognitive & computer vision

Brown University / ANITI, 2023 · committee from Princeton, DeepMind & CentraleSupélec

journal

2026 1 citation

Beyond the Linear Separability Ceiling: Aligning Representations in Vision-Language Models

Mohit Vaishnav, et al.

Transactions on Machine Learning Research (TMLR)

arXiv

conference

2026

Symbolic Grounding Reveals Representational Bottlenecks in Abstract Visual Reasoning

Mohit Vaishnav, et al.

Conference on Computational Natural Language Learning (CoNLL)

Scholar

workshop

2026

Concept Cues Reshape Human Verification in Bongard-LOGO Visual Reasoning

Mohit Vaishnav, et al.

CVPR Workshop on Knowledge-Intensive Multimodal Reasoning

Scholar

preprint

2025 4 citations

A Cognitive Paradigm Approach to Probe the Perception–Reasoning Interface in VLMs

Mohit Vaishnav, et al.

arXiv preprint arXiv:2501.13620

arXiv

preprint

2025

Not How You Think, It's What You See: Decoupling Perception from Reasoning

Mohit Vaishnav, et al.

Working paper

Scholar

conference

2023 24 citations

GAMR: A Guided Attention Model for (Visual) Reasoning

Mohit Vaishnav, Thomas Serre

International Conference on Learning Representations (ICLR)

A novel module for visual reasoning that instantiates an active-vision theory: it solves complex visual reasoning problems dynamically through sequences of attention shifts that select task-relevant visual information and route it into memory. GAMR learns visual routines robustly and sample-efficiently and generalizes zero-shot to novel reasoning tasks.
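To make "sequences of attention shifts that route information into memory" concrete, here is a minimal NumPy sketch of such a loop. It is an illustration under stated assumptions, not GAMR's published architecture: the controller update, the weight matrices `W_query` and `W_update`, and the function name `guided_attention_routine` are all hypothetical stand-ins, and a real model would learn these parameters and add a reasoning module on top of the memory.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def guided_attention_routine(features, n_steps, W_query, W_update):
    """Hypothetical attention-shift loop (not GAMR's exact design).

    features: (N, D) spatial feature vectors, e.g. a flattened H*W feature map.
    W_query:  (D, D) hypothetical map from controller state to attention query.
    W_update: (2*D, D) hypothetical controller update from [state, glimpse].
    Returns the memory bank (n_steps, D) and attention maps (n_steps, N).
    """
    N, D = features.shape
    state = np.zeros(D)
    memory, maps = [], []
    for _ in range(n_steps):
        query = np.tanh(state @ W_query)          # controller decides where to look
        attn = softmax(features @ query / np.sqrt(D))  # (N,) attention over locations
        glimpse = attn @ features                 # (D,) attended content
        memory.append(glimpse)                    # route it into memory
        maps.append(attn)
        # Update the controller state from what was just seen.
        state = np.tanh(np.concatenate([state, glimpse]) @ W_update)
    return np.stack(memory), np.stack(maps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(16, 8))
    mem, maps = guided_attention_routine(
        feats, 4, rng.normal(size=(8, 8)), rng.normal(size=(16, 8))
    )
    print(mem.shape, maps.shape)
```

Because the controller starts from a zero state, the first glimpse is a uniform average; later steps become input-dependent as the state is updated from earlier glimpses.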

Paper

thesis

2023 2 citations

PhD Thesis: Exploring the Role of (Self-)Attention in Cognitive and Computer Vision Architecture

Mohit Vaishnav

Université Paul Sabatier (Toulouse III) / Brown University / ANITI

Investigates the role of attention and memory in complex reasoning. Analyses Transformer self-attention, extends it with memory, refines the taxonomy of SVRT reasoning tasks, and proposes GAMR — a cognitive architecture combining attention and memory inspired by active-vision theory.
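For reference, standard Transformer self-attention computes softmax(QKᵀ/√d_k)V over the input tokens. The sketch below shows that computation, plus one common way to "extend it with memory": letting tokens also attend to a set of extra memory slots appended to the keys and values. The memory variant is an assumption chosen for illustration and is not necessarily the extension studied in the thesis; all function and parameter names are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(X, Wq, Wk, Wv, memory=None):
    """Scaled dot-product self-attention with optional memory slots.

    X:      (N, D) input tokens.
    Wq, Wk: (D, d_k) query/key projections; Wv: (D, d_v) value projection.
    memory: optional (M, D) slots that tokens may also attend to
            (a hypothetical memory extension, for illustration only).
    Returns outputs (N, d_v) and attention weights (N, N) or (N, N + M).
    """
    Q = X @ Wq
    source = X if memory is None else np.concatenate([X, memory], axis=0)
    K = source @ Wk
    V = source @ Wv
    d_k = Q.shape[-1]
    A = softmax(Q @ K.T / np.sqrt(d_k), axis=-1)  # each row sums to 1
    return A @ V, A
```

With `memory=None` this reduces to ordinary self-attention; passing `M` memory slots widens each attention row from `N` to `N + M` entries while leaving the output shape unchanged.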

arXiv

conference

2022 82 citations

A Benchmark for Compositional Visual Reasoning

Aimen Zerroug, Mohit Vaishnav, Julien Colin, Sebastian Musslick, Thomas Serre

NeurIPS — Datasets and Benchmarks Track

Introduces Compositional Visual Relations (CVR), a benchmark designed to drive progress toward more data-efficient visual reasoning. It provides measures of sample efficiency, generalization, and compositionality, and finds that modern models remain far less data-efficient than humans.
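One generic way to summarize sample efficiency is the normalized area under a learning curve: accuracy plotted against the logarithm of the training-set size, rescaled so that chance is 0 and perfect accuracy is 1. The sketch below is a hypothetical illustration of that idea and is not the CVR paper's own metric; the function name and the chance level passed in are assumptions.

```python
import numpy as np


def sample_efficiency_auc(train_sizes, accuracies, chance):
    """Hypothetical sample-efficiency summary (not CVR's published metric).

    Returns the area under accuracy vs. log10(train size), after rescaling
    accuracy so chance -> 0 and perfect -> 1, normalized to lie in [0, 1].
    """
    x = np.log10(np.asarray(train_sizes, dtype=float))
    y = np.asarray(accuracies, dtype=float)
    if x.shape != y.shape or x.size < 2:
        raise ValueError("need matching train_sizes/accuracies with >= 2 points")
    if np.any(np.diff(x) <= 0):
        raise ValueError("train_sizes must be strictly increasing")
    # Rescale so chance performance maps to 0 and perfect accuracy to 1.
    y = np.clip((y - chance) / (1.0 - chance), 0.0, 1.0)
    # Trapezoidal rule, written out to avoid NumPy-version differences.
    area = np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))
    return float(area / (x[-1] - x[0]))
```

A model that reaches high accuracy with fewer examples accumulates more area early in the curve and therefore scores higher, even if two models end at the same final accuracy.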

arXiv

journal

2022 26 citations

Understanding the Computational Demands Underlying Visual Reasoning

Mohit Vaishnav, Remi Cadene, Andrea Alamia, Drew Linsley, Rufin VanRullen, Thomas Serre

Neural Computation (MIT Press)

Characterizes the computational demands of abstract visual reasoning by assessing CNNs on the SVRT challenge, revealing a novel taxonomy of reasoning tasks and showing how spatial versus feature-based attention selectively improves performance on the hardest tasks.
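The distinction between the two attention types can be sketched on a single convolutional feature map of shape (C, H, W). Spatial attention reweights locations with one (H, W) mask; feature-based attention reweights channels with one gate per channel. The sigmoid gating and the parameter names below (`w_spatial`, `W_gate`) are assumptions chosen for illustration, not the modules used in the paper.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def spatial_attention(fmap, w_spatial):
    """Reweight locations. fmap: (C, H, W); w_spatial: (C,) hypothetical 1x1 projection."""
    scores = np.tensordot(w_spatial, fmap, axes=([0], [0]))  # (H, W) score per location
    mask = sigmoid(scores)
    return fmap * mask[None, :, :], mask


def feature_attention(fmap, W_gate):
    """Reweight channels. W_gate: (C, C) hypothetical gating weights."""
    pooled = fmap.mean(axis=(1, 2))   # (C,) global average per channel
    gate = sigmoid(W_gate @ pooled)   # (C,) one gate per feature channel
    return fmap * gate[:, None, None], gate
```

Both functions preserve the feature-map shape; they differ only in whether the multiplicative modulation varies across space (one value per location) or across features (one value per channel).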

Paper

preprint

2022 6 citations

Conviformers: Convolutionally Guided Vision Transformer

Mohit Vaishnav, Thomas Fel, Ivan Felipe Rodríguez, Thomas Serre

arXiv preprint arXiv:2208.08900

A convolutional transformer architecture that handles higher-resolution images without an explosion in memory and compute cost, combined with a PreSizer pre-processing technique; it achieves state-of-the-art results on Herbarium 202x and iNaturalist 2019.
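Why resolution is expensive for a plain Vision Transformer comes down to token arithmetic: with non-overlapping patches, the token count grows with image area and the self-attention matrix grows with the square of the token count. The sketch below works through that arithmetic and shows how a hypothetical convolutional stem that first downsamples by a stride keeps the token count fixed. It illustrates the general cost argument only and is not the Conviformer architecture; the function names are hypothetical.

```python
def vit_token_count(height, width, patch):
    """Number of non-overlapping patch tokens (sizes must divide evenly)."""
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    return (height // patch) * (width // patch)


def attention_matrix_entries(n_tokens):
    """Self-attention compares every token with every other token."""
    return n_tokens ** 2


def tokens_after_conv_stem(height, width, stride, patch):
    """Tokens if a hypothetical convolutional stem first downsamples by `stride`."""
    return vit_token_count(height // stride, width // stride, patch)


if __name__ == "__main__":
    for side in (224, 448):
        n = vit_token_count(side, side, 16)
        print(side, n, attention_matrix_entries(n))
    print(tokens_after_conv_stem(448, 448, stride=2, patch=16))
```

At patch size 16, a 224×224 image yields 196 tokens (38,416 attention entries), while 448×448 yields 784 tokens (614,656 entries), a 16× increase from doubling the side length; downsampling 448×448 by 2 before patching brings the count back to 196.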

arXiv

conference

2022