Research Project Title:
Large-Scale Clinical Text Annotation
Abstract: Electronic health records (EHRs) are the standard format for collecting clinical data and a valuable potential dataset for artificial intelligence (AI) analysis. However, this data is largely unstructured and unlabeled, which makes natural language processing (NLP) techniques difficult to apply. The goal of this project is to build a platform for crowdsourcing a new structured, annotated EHR dataset. Using current NLP techniques, the platform will automatically label mentions of clinical terms in the text. I will iterate with clinicians to develop an interactive, machine learning-based interface that supports efficient yet unbiased data collection; the collected labels will in turn train the platform's own labeling models. The resulting dataset is intended to support a wide range of downstream clinical and NLP research.
"I am participating in SuperUROP to experience advanced research and apply the academic skills I've gained in undergraduate classes, while producing concrete impact in a problem space I'm passionate about. I have previous UROP experience in the Media Lab and have taken graduate-level machine learning (ML) and human-computer interaction (HCI) courses. I hope to deepen my understanding of ML, HCI, and health care while enabling NLP research on clinical text."