Arthur De Los Santos
MIT MGAIC | MIT Generative AI Impact Research and Innovation Scholar
Physical AI: Robust Vision-Language Navigation
2025–2026
Electrical Engineering and Computer Science
- Generative AI
- AI and Machine Learning
- Robotics
Daniela L. Rus
Vision-Language Navigation (VLN) links natural language and robotic navigation, enabling agents to follow instructions in complex environments. Existing frameworks like MiniNav achieve this efficiently by fusing frozen vision-language features with lightweight policy heads, but they rely on static plans that fail to adapt to scene changes or ambiguous intent. We propose a minimalist VLN system for real-world deployment that integrates two components: real-time formal reasoning modules that validate the feasibility and goal adherence of each plan step, and interactive feedback mechanisms that request clarification or suggest alternatives when confidence is low. This design preserves computational efficiency while enabling adaptive, cooperative disambiguation in dynamic settings, paving the way for scalable, trustworthy human-robot interaction.
I chose to participate in a SuperUROP to enhance my problem-solving skills and apply them to cutting-edge research in my field. My two prior ML-based UROPs and my work at Telexistence this past summer (perception for autonomous warehouse robots) have prepared me for this research. I aim to deepen my knowledge of physical AI and the full research cycle, and I am most excited to see our work realized in a physical system.
