Expanding the START Question Answering System with Dependency Parsing and Statistical Methods

START is a web-based natural language question answering system developed at the CSAIL InfoLab. Aiming for high precision, START is largely rule-based, and it suffers from both low recall and the need for human annotation to build its knowledge base. This project explores the use of dependency parsers to parse natural-language text into START's ternary expressions and to build the knowledge base automatically. Other natural language processing and machine learning methods will be examined as well. Matching algorithms will be developed to link the appropriate information in the knowledge base to the user's question. By fitting modern high-recall technologies into the START framework, we aim to design a high-precision QA system that is capable of indexing various data sources automatically.
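To make the idea of turning a dependency parse into ternary expressions and then matching a question against them more concrete, here is a minimal sketch. It is an illustration under stated assumptions, not START's actual representation or matching algorithm: the `Token`, `Ternary`, `extract_ternaries`, and `match` names are hypothetical, the parse is written out by hand instead of coming from a real parser, the dependency labels `nsubj` and `obj` follow Universal Dependencies conventions, and words are only lowercased, not lemmatized.

```python
from typing import List, NamedTuple, Optional


class Token(NamedTuple):
    """One word of a dependency parse (hypothetical structure for this sketch)."""
    index: int   # 1-based position in the sentence
    text: str
    head: int    # index of the head token, 0 for the root
    deprel: str  # dependency label, e.g. "nsubj", "obj", "root"


class Ternary(NamedTuple):
    """A (subject, relation, object) triple; None acts as a wildcard in queries."""
    subject: Optional[str]
    relation: Optional[str]
    object: Optional[str]


def extract_ternaries(tokens: List[Token]) -> List[Ternary]:
    """Emit one triple for each head word that has both an nsubj and an obj dependent."""
    triples = []
    for head in tokens:
        subjects = [t for t in tokens if t.head == head.index and t.deprel == "nsubj"]
        objects = [t for t in tokens if t.head == head.index and t.deprel == "obj"]
        for s in subjects:
            for o in objects:
                triples.append(Ternary(s.text.lower(), head.text.lower(), o.text.lower()))
    return triples


def match(query: Ternary, knowledge_base: List[Ternary]) -> List[Ternary]:
    """Return every stored triple that agrees with the query on all non-wildcard slots."""
    return [
        fact for fact in knowledge_base
        if all(q is None or q == v for q, v in zip(query, fact))
    ]


# Hand-written parse of "Marconi invented radio" (assumed parser output).
sentence = [
    Token(1, "Marconi", 2, "nsubj"),
    Token(2, "invented", 0, "root"),
    Token(3, "radio", 2, "obj"),
]
kb = extract_ternaries(sentence)

# "Who invented radio?" becomes a triple with a wildcard subject.
answers = match(Ternary(None, "invented", "radio"), kb)
```

In this toy setting the wildcard slot plays the role of the question word, so the answer is read off the matching triple's subject. A real system would also need lemmatization, handling of passives and modifiers, and a way to rank competing matches.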
I will be improving the START question answering system using a statistical parser and other machine learning techniques. My interest has always been in NLP, the intersection of computer science and linguistics. Question answering is a fun NLP problem because it is technically challenging and has the potential to change how we interact with information. Through this project, I hope to become both a better researcher and a better engineer.