Hannah Y. Gao

Research Title

Enabling Visual Thinking and Reasoning in Multimodal LLMs

Cohort

2025–2026

Department

Electrical Engineering and Computer Science

Research Areas

  • AI and Machine Learning

Supervisor

Torralba, Antonio

Abstract

When solving complex or multi-step problems, humans often rely on visual aids for reasoning, such as sketched diagrams or graphs. Because visual diagramming can be a powerful tool for advanced reasoning, we aim to equip multimodal models with the ability to reason through free-form sketching. To enable these models to create natural and versatile sequence-by-sequence sketches, I will help develop a dataset that teaches the desired sketching behavior, design a pipeline for incorporating sketching into the model’s reasoning process, and train or fine-tune a multimodal model to perform visual reasoning. This model will serve as a new baseline for future work in visual chain-of-thought reasoning.

Quote

I am participating in SuperUROP because I have really enjoyed my AI classes and AI-related research projects at MIT, and I want to keep building on the skills I developed there through an immersive research experience. I am excited to learn more about state-of-the-art multimodal models and get a better sense of a career in academia.
