MIT EECS — Duke Energy Undergraduate Research and Innovation Scholar
Cross-Lingual Sharing for Part of Speech Induction and Typological Prediction
Regina A. Barzilay
The goal of unsupervised part-of-speech induction is to learn the parts of speech of words in a language for which only raw, unlabeled text is available. Part-of-speech induction typically produces a set of word clusters that roughly correspond to part-of-speech categories, but current methods cannot assign a specific part-of-speech label (such as noun or verb) to each cluster. Our goal is to learn labeled categories by learning properties of part-of-speech categories from other languages for which labeled data is available. These labels will make possible tasks that methods producing only unlabeled clusters cannot perform, such as predicting typological properties of a language from raw text (for example, whether adjectives usually precede or follow nouns).
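Once the induced clusters carry category labels, some typological properties can be estimated by simple counting over the tagged text. The sketch below is a hypothetical illustration, not the project's method. It assumes the raw text has already been tagged with universal-style labels such as `ADJ` and `NOUN`, and it guesses a language's dominant adjective-noun order by comparing how often the two tags appear adjacent in each order. The function name `adjective_noun_order` and the toy sentences are invented for this example.

```python
from collections import Counter
from typing import List, Tuple

# A tagged sentence is a list of (word, tag) pairs. The tag names ("ADJ",
# "NOUN") follow a universal-style tagset; this is an assumption, not the
# project's actual label set.
TaggedSentence = List[Tuple[str, str]]


def adjective_noun_order(sentences: List[TaggedSentence]) -> str:
    """Hypothetical helper: guess dominant adjective/noun order.

    Returns "ADJ-NOUN" if adjectives more often directly precede nouns,
    "NOUN-ADJ" if they more often directly follow them, and "unknown"
    on a tie (including when neither pattern occurs).
    """
    counts = Counter()
    for sentence in sentences:
        tags = [tag for _, tag in sentence]
        # Count only directly adjacent ADJ/NOUN tag pairs.
        for left, right in zip(tags, tags[1:]):
            if (left, right) in {("ADJ", "NOUN"), ("NOUN", "ADJ")}:
                counts[f"{left}-{right}"] += 1
    adj_noun = counts["ADJ-NOUN"]
    noun_adj = counts["NOUN-ADJ"]
    if adj_noun > noun_adj:
        return "ADJ-NOUN"
    if noun_adj > adj_noun:
        return "NOUN-ADJ"
    return "unknown"


# Toy usage with hand-tagged, Spanish-like sentences.
spanish_like = [
    [("la", "DET"), ("casa", "NOUN"), ("blanca", "ADJ")],
    [("un", "DET"), ("perro", "NOUN"), ("grande", "ADJ")],
]
print(adjective_noun_order(spanish_like))  # NOUN-ADJ
```

A real system would need far more text and would have to cope with tagging errors, but the point is that this kind of query only becomes possible once clusters are mapped to named categories.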
I have spent the past year doing natural language processing research through a UROP, and I will continue working with the same advisor for my SuperUROP. The SuperUROP will give me the chance to work on a long-term project and to learn about new topics. In particular, I will gain experience with semi-supervised machine learning, a topic I find very interesting.