David Pacheco

dpacheco@mit.edu

Scholar Title

MIT EECS | Undergraduate Research and Innovation Scholar

Research Title

Linguistic Analysis of Wikipedia for Question Answering

Cohort

2018–2019

Department

EECS

Research Areas

Natural Language and Speech Processing

Supervisor

Boris Katz

boris@mit.edu

Abstract

Wikipedia is currently one of the largest, if not the largest, sources of information, presented both as unrestricted text and as values associated with specific attributes. Much of the most important and distinguishing data in an article can be found in its first sentence and in its infoboxes. This structured data is valuable for question answering, so our goal is to create a seamless question-answering system that integrates Wikipedia infoboxes. The steps toward this goal include: 1) creating a robust database storage and retrieval system; 2) determining, through question decomposition, what information is relevant to a particular question; and 3) enabling the system to recognize multiple references to the same entity.
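The three steps can be illustrated with a toy sketch. It is not the project's actual system. Everything in it is hypothetical: the `InfoboxStore` class, the single "What is the <attribute> of <entity>?" question template used as a stand-in for real decomposition, and the hand-entered alias table standing in for reference resolution.

```python
import re
from typing import Optional


class InfoboxStore:
    """Hypothetical in-memory store of infobox (entity, attribute) -> value pairs."""

    def __init__(self) -> None:
        self._facts: dict[tuple[str, str], str] = {}
        self._aliases: dict[str, str] = {}  # lowercased alias -> canonical entity name

    def add_fact(self, entity: str, attribute: str, value: str) -> None:
        self._facts[(entity, attribute.lower())] = value
        # Register the canonical name as its own alias so lookups are case-insensitive.
        self._aliases[entity.lower()] = entity

    def add_alias(self, alias: str, entity: str) -> None:
        self._aliases[alias.lower()] = entity

    def resolve(self, name: str) -> str:
        # Map any known reference to its canonical entity; fall back to the name itself.
        return self._aliases.get(name.lower(), name)

    def lookup(self, entity: str, attribute: str) -> Optional[str]:
        return self._facts.get((self.resolve(entity), attribute.lower()))


# Toy decomposition: split "What is the <attribute> of <entity>?" into its two parts.
QUESTION_PATTERN = re.compile(r"what is the (.+?) of (?:the )?(.+?)\??$", re.IGNORECASE)


def answer(store: InfoboxStore, question: str) -> Optional[str]:
    match = QUESTION_PATTERN.match(question.strip())
    if match is None:
        return None  # question shape not covered by this toy template
    attribute, entity = match.groups()
    return store.lookup(entity, attribute)


store = InfoboxStore()
store.add_fact("France", "capital", "Paris")
store.add_alias("French Republic", "France")
print(answer(store, "What is the capital of the French Republic?"))
```

A real system would replace the regular expression with genuine question analysis and learn aliases from article text rather than entering them by hand.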

Quote

This SuperUROP experience will let me learn what it is like to carry out a research project over an entire year. I’ve taken 6.031 (Elements of Software Construction) and 6.806 (Advanced Natural Language Processing), so I’m excited to put what I’ve learned to the test. I also plan to pursue a Master of Engineering (MEng) degree, and SuperUROP will certainly prepare me for writing a research paper.
