Hannah Y. Gao
Enabling Visual Thinking and Reasoning in Multimodal LLMs
2025–2026
Electrical Engineering and Computer Science
- AI and Machine Learning
Torralba, Antonio
When solving complex or multi-step problems, humans often benefit from reasoning with visual aids, such as sketched diagrams or graphs. Because visual diagramming can be a powerful tool for advanced reasoning, we aim to equip multimodal models with the ability to reason through free-form sketching. To enable multimodal models to create natural and versatile step-by-step sketches, I will help develop a dataset for teaching multimodal models the desired sketching behavior, design a pipeline for incorporating sketching into the model’s reasoning process, and train or fine-tune a multimodal model to perform visual reasoning. This model will serve as a new baseline for future work on visual chain-of-thought reasoning.
I am participating in SuperUROP because I have greatly enjoyed my AI classes and AI-related research projects at MIT, and I want to build on the skills I developed in those experiences through an immersive research project. I am excited to learn more about state-of-the-art multimodal models and to get a better sense of what a career in academia involves.
