
Hengzhi Li
Undergraduate Research and Innovation Scholar
Towards Socially-Intelligent Nonverbal Foundation Models
2024–2025
Electrical Engineering and Computer Science
- AI and Society
Paul Liang
Socially intelligent AI that can understand and interact seamlessly with humans is increasingly important as AI becomes more closely integrated into people's daily lives. However, current work in artificial social reasoning relies on language-only or language-dominant approaches to benchmarking and training models, resulting in systems that are improving at verbal communication but struggle with nonverbal social understanding. To address this limitation, we tap into a novel source of data rich in nonverbal social interaction: mime videos. Mime is the art of expression through gesture and movement without spoken words, which presents unique challenges and opportunities for interpreting nonverbal social communication. We contribute MimeQA, a video question-answering benchmark built by sourcing videos from YouTube and subjecting them to rigorous annotation and verification. Using MimeQA, we evaluate state-of-the-art video large language models (vLLMs) and reveal their limitations in understanding imagined objects and subtle nonverbal interactions. We hope to inspire future work on foundation models that embody true social intelligence, capable of interpreting nonverbal human interactions.
I am participating in SuperUROP to gain thorough research experience. I have done several UROPs in the past, mostly joining existing projects. SuperUROP allows me to explore a research question independently from start to finish, with the guidance of expert mentors and peers. I am excited to gain deeper insight into conducting research, build on existing work in the field, and hopefully push the boundaries a bit further!