Arthur De Los Santos
MIT MGAIC | MIT Generative AI Impact Research and Innovation Scholar
Physical AI: Robust Vision-Language Navigation
2025–2026
Electrical Engineering and Computer Science
- Generative AI
- AI and Machine Learning
- Robotics
Rus, Daniela L.
Vision-Language Navigation (VLN) links natural language and robotic navigation, enabling agents to follow instructions in complex environments. Existing frameworks like MiniNav achieve this efficiently by fusing frozen vision-language features with lightweight policy heads, but they rely on static plans that fail to adapt to scene changes or ambiguous intent. We propose a minimalist VLN system for real-world deployment that integrates two components: real-time formal reasoning modules that validate the feasibility and goal adherence of each plan step, and interactive feedback mechanisms that request clarification or suggest alternatives when confidence is low. This design preserves computational efficiency while enabling adaptive, cooperative disambiguation in dynamic settings, paving the way for scalable, trustworthy human-robot interaction.
I chose to participate in a SuperUROP to enhance my problem-solving skills and apply them to cutting-edge research in my field. My two prior ML-based UROPs and my work at Telexistence this past summer (perception for autonomous warehouse robots) have prepared me for this research. I aim to deepen my knowledge of physical AI and the full research cycle, and I am most excited to see our work realized in a physical system.
