Reasoning, Data-efficiency and Alignment in VLMs

Title: Reasoning, Data-Efficiency and Alignment in Vision-Language Models
Speaker: Dr. Aishwarya Agrawal | Assistant Professor, Department of Computer Science & Operations Research, University of Montreal | Canada CIFAR AI Chair | Core academic member of Mila - Quebec AI Institute
Date: Thursday, Nov 27, 2025
Time: 4:00 PM
Location: Zoom only
Zoom Meeting ID: 960 3678 4364
Zoom Passcode: 416523

Summary:

Over the last decade, we have made tremendous progress in vision-language research. Multimodal large language models can generate long captions about images and answer complex questions about them. Text-to-image models can generate beautiful images depicting complex concepts. Despite this progress, vision-language models still fail embarrassingly on several tasks, such as counting objects in images, understanding spatial relationships between objects, and representing different cultures accurately. Moreover, these models are trained on millions of samples, which makes it challenging for academic labs to train them. In this talk, I will discuss the following open challenges and present the contributions my group has made toward tackling each one:

How do we improve the complex reasoning capabilities of image editing models?
How do we improve the data-efficiency of multimodal large language models?
How do we assess the degree of alignment of text-to-image models with human expectations?

Biography:

Aishwarya Agrawal is an Assistant Professor in the Department of Computer Science and Operations Research at the University of Montreal. She is a Canada CIFAR AI Chair and a core academic member of Mila -- Quebec AI Institute. She also spends one day a week at Google DeepMind as a Research Scientist. From Aug 2019 to Dec 2020, Aishwarya was a full-time Research Scientist at DeepMind. Aishwarya completed her PhD in Aug 2019 at Georgia Tech, working with Dhruv Batra and Devi Parikh. Aishwarya's research focuses on multimodal AI, specifically vision-language research, spanning themes such as image-to-text and text-to-image generative models, visio-linguistic representation learning, compositional and fine-grained reasoning, parameter- and data-efficient learning, robust automatic evaluation, geo-diverse cultural understanding, and reliable, explainable and safe multimodal AI. Aishwarya is a recipient of the 2025 Mark Everingham Prize, a Canada CIFAR AI Chair Award, a Young Alumni Excellence Award from IIT Gandhinagar (her alma mater), a Georgia Tech Sigma Xi Best Ph.D. Thesis Award, a Georgia Tech College of Computing Dissertation Award, a Google Fellowship (declined), a Facebook Fellowship (declined) and an NVIDIA Graduate Fellowship. Aishwarya was one of the two runners-up for the 2019 AAAI / ACM SIGAI Dissertation Award. Aishwarya was also selected for Rising Stars in EECS 2018.

Website: Aishwarya Agrawal's Personal Website
