Cheuk Hei Chu
MIT MGAIC | MIT Generative AI Impact Research and Innovation Scholar
Mechanistic Interpretability of Modern Vision-Language Models
2025–2026
Electrical Engineering and Computer Science; Mathematics
- Generative AI
- AI and Machine Learning
- Graphics and Vision
Isola, Phillip
Multimodal foundation models like CLIP have demonstrated powerful generalization capabilities across vision, language, and retrieval tasks. However, how these models represent concepts internally, and how information flows through components such as attention heads, MLPs, and residual streams, remains insufficiently understood.
Mechanistic interpretability seeks to move beyond input-output probing, aiming to reverse-engineer the internal computations of neural networks to understand how and why they produce certain behaviors. This project applies those principles to multimodal models, with the goal of understanding how interpretable features and semantic directions are stored across modalities, and how multimodal information flows through model components.
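As a minimal, hypothetical sketch of where such an analysis might start (not the project's actual methodology), the snippet below uses Hugging Face's CLIP implementation to (1) record residual-stream activations from each vision-encoder layer with forward hooks and (2) form a crude semantic direction in the joint embedding space as the difference of two text embeddings; the model name, prompts, and placeholder image are illustrative choices.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Illustrative checkpoint; any CLIP variant with the same architecture would work.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

# (1) Hook every vision-encoder layer to record its output hidden states,
# i.e. the residual stream after that layer's attention and MLP updates.
activations = {}

def make_hook(name):
    def hook(module, inputs, output):
        activations[name] = output[0].detach()  # (batch, tokens, hidden_dim)
    return hook

handles = [
    layer.register_forward_hook(make_hook(f"vision_layer_{i}"))
    for i, layer in enumerate(model.vision_model.encoder.layers)
]

image = Image.new("RGB", (224, 224), color="white")  # placeholder image
inputs = processor(
    text=["a photo of a dog", "a photo of a cat"],
    images=image, return_tensors="pt", padding=True,
)

with torch.no_grad():
    outputs = model(**inputs)

for h in handles:
    h.remove()

# (2) A simple "semantic direction": the normalized difference between two
# text embeddings, against which the image embedding can be scored.
text_emb = outputs.text_embeds / outputs.text_embeds.norm(dim=-1, keepdim=True)
image_emb = outputs.image_embeds / outputs.image_embeds.norm(dim=-1, keepdim=True)
direction = text_emb[0] - text_emb[1]   # "dog" minus "cat"
direction = direction / direction.norm()
score = image_emb @ direction           # projection of the image onto the direction

print({k: tuple(v.shape) for k, v in activations.items()})
print("alignment with dog-vs-cat direction:", score.item())
```

The captured per-layer activations are the kind of raw material that interpretability analyses (probing, activation patching, direction-finding) typically operate on; the embedding-difference direction is only a first approximation of the semantic directions the project aims to study.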
I am participating in SuperUROP because I want to gain experience conducting serious research in computer vision. I took 6.7960 (Deep Learning) and 6.8300 (Advances in Computer Vision) last year and was inspired by the cutting-edge advances researchers have made over the past decade. I am excited to learn more and to contribute to an active area of research.
