MIT EECS | CS+HASS Undergraduate Research and Innovation Scholar
Investigating the Conceptual Limitations of Diffusion Models and Their Philosophical Significance
Electrical Engineering and Computer Science
Antonio Torralba & Bradford Skow
Recent advances in computer vision have led to the creation of impressive text-guided image generators, which can create complex photorealistic imagery from simple text prompts; however, these models have been shown to display deficiencies in their understanding of basic relational and causal prompts. This paper explores which kinds of properties and concepts DALL-E, Midjourney, and Stable Diffusion succeed and fail at. We will validate our findings using an online survey experiment, in which human raters will be asked to distinguish between subtly different prompts that relate to the concept or property being tested. Lastly, we will examine what the philosophy of aesthetics can tell us about why DALL-E succeeds with certain concepts but fails with others.
I am very excited to be directing my own research project with the help of advisors. I wanted to do something that bridged my interest in the humanities and AI. Super UROP gave me the chance to work with a philosopher, computer scientist, and cognitive scientist on a novel interdisciplinary project! It’s exciting to see how the fields relate and to find out whether diffusion models can shed light on the philosophy of aesthetics and vice versa.