MIT EECS - Quanta Computer Undergraduate Research and Innovation Scholar
Arabic Spoken Language Processing
This project involves research in Arabic spoken language understanding to support conversational human-machine interaction through a mobile geographical application that handles tourist-type queries. The project will involve several crowdsourcing scenarios: collecting Arabic speech and text queries, labeling data to support training of a stochastic semantic tagging model, and deploying the system. Once sufficient data have been collected, we will develop appropriate linguistic features for semantic tagging with a conditional random field (CRF) model, and measure classification accuracy on held-out test data. Finally, we will integrate the semantic tagger into a prototype web-based geographical browser and evaluate its performance with new users in a mobile environment.
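As a rough illustration of the tagging and evaluation steps described above, the sketch below builds per-token feature dictionaries of the kind a CRF tagger typically consumes and computes token-level accuracy on held-out data. The feature set, the function names (`token_features`, `sentence_features`, `token_accuracy`), the example query, and the tag labels are all assumptions made for illustration; they are not the project's actual features or label scheme, and no CRF training is shown.

```python
from typing import Dict, List


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Return a feature dict for the token at position i.

    Hypothetical feature set: the token itself, short prefix and suffix
    (a crude stand-in for Arabic clitics and affixes), a digit flag, and
    the neighboring tokens as context.
    """
    tok = tokens[i]
    return {
        "token": tok,
        "prefix2": tok[:2],
        "suffix2": tok[-2:],
        "is_digit": tok.isdigit(),
        "prev": tokens[i - 1] if i > 0 else "<BOS>",
        "next": tokens[i + 1] if i < len(tokens) - 1 else "<EOS>",
    }


def sentence_features(tokens: List[str]) -> List[Dict[str, object]]:
    """Feature dicts for every token in one query."""
    return [token_features(tokens, i) for i in range(len(tokens))]


def token_accuracy(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Fraction of tokens whose predicted tag matches the gold tag."""
    total = 0
    correct = 0
    for gold_seq, pred_seq in zip(gold, pred):
        if len(gold_seq) != len(pred_seq):
            raise ValueError("gold and predicted sequences differ in length")
        for g, p in zip(gold_seq, pred_seq):
            total += 1
            correct += int(g == p)
    return correct / total if total else 0.0


# Hypothetical tourist-type query: "a restaurant near the pyramids".
query = ["مطعم", "قريب", "من", "الأهرام"]
features = sentence_features(query)

# Hypothetical gold and predicted tag sequences for a held-out query.
gold_tags = [["CATEGORY", "RELATION", "RELATION", "LANDMARK"]]
pred_tags = [["CATEGORY", "RELATION", "OTHER", "LANDMARK"]]
accuracy = token_accuracy(gold_tags, pred_tags)
```

Feature dictionaries in this shape can be passed to a CRF toolkit for training, and the same accuracy function can then be applied to its predictions on the held-out set.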
I started working with Dr. Glass in February on deploying an Arabic semantic tagger, beginning with collecting training data through crowdsourcing. I created an Arabic crowdsourcing platform that was deployed on Amazon Mechanical Turk and on an independent SLS crowdsourcing platform. My extensive web development experience helped me work with those crowdsourcing technologies. Last semester I was also a listener in Dr. Glass’s class on Automatic Speech Recognition.