Lingjie Mei
MIT | IBM-Watson Undergraduate Research and Innovation Scholar
Zero-Shot Visual Concept Captioning
2019–2020
EECS
- Artificial Intelligence & Machine Learning
Joshua Tenenbaum
Humans combine language and vision to identify and compose new concepts they have not previously learned. Many complex concepts are more easily explained as a composition of previously learned concepts. Here we aim to describe (caption) a new visual concept in terms of previously learned concepts without further training (zero-shot). This capability is a major component of childhood development.
From a cognitive science point of view, cognitive development proceeds in distinct stages, each employing a specific set of knowledge and reasoning modules. Children first develop intuitive psychology and intuitive physics: a theory of how objects move and interact. In the second stage, they receive language instruction from their parents and learn to tell apart different classes of concepts from visual cues. In the third stage, they describe what they see using language. Our task is to bridge the gap between the second and third stages of cognitive development, applying the compositionality of language to the concepts learned in the second stage.
“I am participating in SuperUROP to get a better view of how key modules and functions of the human mind are built, in terms of artificial intelligence. Through my SuperUROP project, I hope to make a technical contribution to the research domain and gain a better understanding of the right way to bridge vision and language.”