Research

Publications

Work on visual reasoning, vision-language models, attention & memory architectures, and video coding — peer-reviewed at NeurIPS, ICLR, Neural Computation and more.

16

Publications

149+

Citations

PhD Thesis — Exploring the role of (self-)attention in cognitive & computer vision

Brown University / ANITI, 2023 · committee from Princeton, DeepMind & CentraleSupélec

journal
2026 1 citation

Beyond the Linear Separability Ceiling: Aligning Representations in Vision-Language Models

Mohit Vaishnav, et al.

Transactions on Machine Learning Research (TMLR)

conference
2026

Symbolic Grounding Reveals Representational Bottlenecks in Abstract Visual Reasoning

Mohit Vaishnav, et al.

Conference on Computational Natural Language Learning (CoNLL)

workshop
2026

Concept Cues Reshape Human Verification in Bongard-LOGO Visual Reasoning

Mohit Vaishnav, et al.

CVPR Workshop on Knowledge-Intensive Multimodal Reasoning

preprint
2025 4 citations

A Cognitive Paradigm Approach to Probe the Perception–Reasoning Interface in VLMs

Mohit Vaishnav, et al.

arXiv preprint arXiv:2501.13620

preprint
2025

Not How You Think, It's What You See: Decoupling Perception from Reasoning

Mohit Vaishnav, et al.

Working paper

conference
2023 24 citations

GAMR: A Guided Attention Model for (Visual) Reasoning

Mohit Vaishnav, Thomas Serre

International Conference on Learning Representations (ICLR)

A novel module for visual reasoning that instantiates an active-vision theory — solving complex visual reasoning dynamically via sequences of attention shifts that route task-relevant information into memory. GAMR learns visual routines robustly and sample-efficiently, and generalizes zero-shot to novel reasoning tasks.

thesis
2023 2 citations

PhD Thesis: Exploring the Role of (Self-)Attention in Cognitive and Computer Vision Architecture

Mohit Vaishnav

Université Paul Sabatier (Toulouse III) / Brown University / ANITI

Investigates the role of attention and memory in complex reasoning. Analyses Transformer self-attention, extends it with memory, refines the taxonomy of SVRT reasoning tasks, and proposes GAMR — a cognitive architecture combining attention and memory inspired by active-vision theory.

conference
2022 82 citations

A Benchmark for Compositional Visual Reasoning

Aimen Zerroug, Mohit Vaishnav, Julien Colin, Sebastian Musslick, Thomas Serre

NeurIPS — Datasets and Benchmarks Track

Introduces Compositional Visual Relations (CVR), a benchmark driving progress toward more data-efficient visual reasoning. Provides measures of sample efficiency, generalization and compositionality, and finds modern models remain far less data-efficient than humans.

journal
2022 26 citations

Understanding the Computational Demands Underlying Visual Reasoning

Mohit Vaishnav, Remi Cadene, Andrea Alamia, Drew Linsley, Rufin VanRullen, Thomas Serre

Neural Computation (MIT Press)

Characterizes the computational demands of abstract visual reasoning by assessing CNNs on the SVRT challenge, revealing a novel taxonomy of reasoning tasks and showing how spatial vs. feature-based attention selectively helps the hardest tasks.

preprint
2022 6 citations

Conviformers: Convolutionally Guided Vision Transformer

Mohit Vaishnav, Thomas Fel, Ivan Felipe Rodríguez, Thomas Serre

arXiv preprint arXiv:2208.08900

A convolutional transformer architecture that handles higher-resolution images without exploding memory/compute, with a PreSizer pre-processing technique — achieving state-of-the-art on Herbarium 202x and iNaturalist 2019.

conference
2022

Using Artificial Intelligence to Identify Fossil Angiosperm Leaves at Family Level

Mohit Vaishnav, et al.

Geological Society of America — Abstracts with Programs

conference
2014 1 citation

Bin Classification Using Temporal Gradient Estimation for Lossless Video Coding

Mohit Vaishnav, Anil Kumar Tiwari

Data Compression Conference (DCC), IEEE

conference
2014

Temporal Stationarity Based Prediction Method for Lossless Video Coding

Mohit Vaishnav, Dinesh Kumar Chobey, Anil Kumar Tiwari

Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP), ACM

conference
2014

Residue Coding Technique for Video Compression

Mohit Vaishnav, Binny Tewani, Anil Kumar Tiwari

Data Compression Conference (DCC), IEEE

conference
2013 1 citation

An Optimal Switched Adaptive Prediction Method for Lossless Video Coding

Dinesh Kumar Chobey, Mohit Vaishnav, Anil Kumar Tiwari

Data Compression Conference (DCC), IEEE

conference
2011 2 citations

A Novel Computationally Efficient Motion Compensation Method Based on Pixel by Pixel Prediction

Mohit Vaishnav, Ashwani Sharma, Anil Kumar Tiwari

Data Compression Conference (DCC), IEEE Computer Society