Rahul Yesantharao
MIT EECS Undergraduate Research and Innovation Scholar
Python Data Processing Framework with Native Execution Speed
2020–2021
EECS
- Computer Systems
Tim Kraska
Modern data processing systems for Python have to treat Python user-defined functions (UDFs) as black boxes because Python’s dynamic typing model makes it difficult to statically analyze and compile them. As a result, these UDFs run slowly because of interpreter overhead, and they act as optimization barriers within larger processing pipelines. We developed Tuplex, a system that addresses these issues and allows for data processing in Python at native execution speed. I am focusing on improving logical optimizations, developing more complete operator support within Tuplex, and furthering its distributed execution capabilities.
Throughout my time at MIT, I have been drawn to computer systems and inference as I have learned more about them. My previous UROP helped me explore my interest in systems engineering and made me eager to gain as much experience with it as possible. Through this SuperUROP, I am excited to learn more about full system design and how theory can be implemented in real, useful systems.