Hannah Y. Gao

hanngao@mit.edu

Scholar Title

MIT EECS | Mason Undergraduate Research and Innovation Scholar

Research Title

Enabling Visual Thinking and Reasoning in Multimodal LLMs

Cohort

2025–2026

Department

Electrical Engineering and Computer Science

Research Areas

AI and Machine Learning

Supervisor

Antonio Torralba

torralba@csail.mit.edu

Abstract

When solving complex or multi-step problems, humans often benefit from visual aids such as sketched diagrams or graphs. Because visual diagramming can be a powerful tool for advanced reasoning, we aim to equip multimodal models with the ability to reason through free-form sketching. To enable these models to create natural and versatile sequence-by-sequence sketches, I will help develop a dataset for teaching the desired sketching behavior, design a pipeline for incorporating sketching into the model’s reasoning process, and train or fine-tune a multimodal model to perform visual reasoning. This model is intended to serve as a new baseline for future work in visual chain-of-thought reasoning.

Quote

I am participating in SuperUROP because I have really enjoyed my AI classes and AI-related research projects at MIT, and I want to keep building on the skills I developed there through an immersive research experience. I am excited to learn more about state-of-the-art multimodal models and get a better sense of a career in academia.
