David Pacheco
MIT EECS | Undergraduate Research and Innovation Scholar
Linguistic Analysis of Wikipedia for Question Answering
2018–2019
EECS
- Natural Language and Speech Processing
Boris Katz
Wikipedia is currently one of the largest sources of information, if not the largest, presented both as unrestricted text and as linguistic values associated with certain attributes. Much of the important classifying data in a Wikipedia article can be found in its first sentence and in its infobox. This tagged data can be very useful for question answering, so our goal is to create a seamless question-answering system that integrates Wikipedia infoboxes. The steps needed to achieve this include: 1) creating a robust database retrieval and storage system; 2) deciding what is relevant to a particular question by decomposing it; and 3) allowing our system to understand multiple references to the same object.
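The three steps above can be illustrated with a toy sketch. It is not the project's actual system: the in-memory dictionaries stand in for a real database, the keyword table stands in for real question decomposition, and every name here (`INFOBOXES`, `ALIASES`, `ATTRIBUTE_KEYWORDS`, `decompose`, `answer`) is hypothetical and chosen only for illustration.

```python
import re

# Step 1 (assumed stand-in for a database): infobox attributes keyed by article title.
# The entries are a small hand-written sample, not data pulled from Wikipedia.
INFOBOXES = {
    "Massachusetts Institute of Technology": {
        "established": "1861",
        "motto": "Mens et Manus",
    },
}

# Step 3 (assumed): several surface forms that refer to the same object.
ALIASES = {
    "mit": "Massachusetts Institute of Technology",
    "massachusetts institute of technology": "Massachusetts Institute of Technology",
}

# Step 2 (assumed): question words mapped to the infobox attribute they ask about.
ATTRIBUTE_KEYWORDS = {
    "founded": "established",
    "established": "established",
    "motto": "motto",
}


def decompose(question):
    """Split a question into an (entity, attribute) pair; either may be None."""
    text = question.lower().rstrip("?")
    tokens = re.findall(r"[a-z0-9]+", text)
    attribute = next(
        (ATTRIBUTE_KEYWORDS[t] for t in tokens if t in ATTRIBUTE_KEYWORDS), None
    )
    entity = None
    # Try longer aliases first so a full name wins over an abbreviation.
    for alias in sorted(ALIASES, key=len, reverse=True):
        if re.search(r"\b" + re.escape(alias) + r"\b", text):
            entity = ALIASES[alias]
            break
    return entity, attribute


def answer(question):
    """Look up the requested attribute in the entity's infobox, or return None."""
    entity, attribute = decompose(question)
    if entity is None or attribute is None:
        return None
    return INFOBOXES.get(entity, {}).get(attribute)


if __name__ == "__main__":
    print(answer("When was MIT founded?"))
    print(answer("What is the motto of the Massachusetts Institute of Technology?"))
```

Both questions resolve to the same infobox even though one uses the abbreviation and the other the full name, which is the behavior step 3 asks for. A question with no recognized entity or attribute returns `None` rather than a guess.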
This SuperUROP experience will show me what it is like to carry out a research project over an entire year. I’ve taken 6.031 (Elements of Software Construction) and 6.806 (Advanced Natural Language Processing), so I’m excited to put what I’ve learned to the test. I also plan to pursue a Master of Engineering (MEng) degree, and SuperUROP will help prepare me for writing a research paper.