Beyond the Linear Separability Ceiling: Aligning Representations in Vision-Language Models
Mohit Vaishnav, et al.
Transactions on Machine Learning Research (TMLR)
Research
Work on visual reasoning, vision-language models, attention & memory architectures, and video coding — peer-reviewed at NeurIPS, ICLR, Neural Computation and more.
PhD Thesis — Exploring the role of (self-)attention in cognitive & computer vision
Brown University / ANITI, 2023 · committee from Princeton, DeepMind & CentraleSupélec
Mohit Vaishnav, et al.
Transactions on Machine Learning Research (TMLR)
Mohit Vaishnav, et al.
Conference on Computational Natural Language Learning (CoNLL)
Mohit Vaishnav, et al.
CVPR Workshop on Knowledge-Intensive Multimodal Reasoning
Mohit Vaishnav, et al.
arXiv preprint arXiv:2501.13620
Mohit Vaishnav, et al.
Working paper
Mohit Vaishnav, Thomas Serre
International Conference on Learning Representations (ICLR)
A novel module for visual reasoning that instantiates an active-vision theory. It solves complex visual reasoning problems dynamically through sequences of attention shifts that route task-relevant information into memory. GAMR learns visual routines in a robust and sample-efficient manner and generalizes zero-shot to novel reasoning tasks.
Mohit Vaishnav
Université Paul Sabatier (Toulouse III) / Brown University / ANITI
Investigates the role of attention and memory in complex reasoning. Analyses Transformer self-attention, extends it with memory, refines the taxonomy of SVRT reasoning tasks, and proposes GAMR — a cognitive architecture combining attention and memory inspired by active-vision theory.
Aimen Zerroug, Mohit Vaishnav, Julien Colin, Sebastian Musslick, Thomas Serre
NeurIPS — Datasets and Benchmarks Track
Introduces Compositional Visual Relations (CVR), a benchmark designed to drive progress toward more data-efficient visual reasoning. Provides measures of sample efficiency, generalization and compositionality, and finds that modern models remain far less data-efficient than humans.
Mohit Vaishnav, Remi Cadene, Andrea Alamia, Drew Linsley, Rufin VanRullen, Thomas Serre
Neural Computation (MIT Press)
Characterizes the computational demands of abstract visual reasoning by assessing CNNs on the Synthetic Visual Reasoning Test (SVRT) challenge, revealing a novel taxonomy of reasoning tasks and showing that adding spatial and feature-based attention selectively helps on the hardest tasks.
Mohit Vaishnav, Thomas Fel, Ivan Felipe Rodríguez, Thomas Serre
arXiv preprint arXiv:2208.08900
A convolutional transformer architecture that handles higher-resolution images without a blow-up in memory and compute costs, paired with PreSizer, a pre-processing technique. Together they achieve state-of-the-art results on Herbarium 202x and iNaturalist 2019.
Mohit Vaishnav, et al.
Geological Society of America — Abstracts with Programs
Mohit Vaishnav, Anil Kumar Tiwari
Data Compression Conference (DCC), IEEE
Mohit Vaishnav, Dinesh Kumar Chobey, Anil Kumar Tiwari
Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP), ACM
Mohit Vaishnav, Binny Tewani, Anil Kumar Tiwari
Data Compression Conference (DCC), IEEE
Dinesh Kumar Chobey, Mohit Vaishnav, Anil Kumar Tiwari
Data Compression Conference (DCC), IEEE
Mohit Vaishnav, Ashwani Sharma, Anil Kumar Tiwari
Data Compression Conference (DCC), IEEE Computer Society