Shulammite Lim

shulim@mit.edu

Scholar Title

MIT EECS | Philips Undergraduate Research and Innovation Scholar

Research Title

Detecting Racial Bias in Clinical Text De-Identification

Cohort

2021–2022

Department

EECS

Research Areas

Natural Language and Speech Processing

Supervisor

Roger Mark

rgmark@mit.edu

Abstract

Sharing patient data for research often relies on deidentification, the process of removing identifiers to protect patient privacy. Deidentification is challenging because target identifiers are heterogeneous and annotated data for developing models is scarce. My project has two foci: to create a model combining LCP's machine learning and pattern matching approaches to deidentification of clinical text, and to assess the performance of this model on conversations between patients and caregivers. First, I would combine the deidentification techniques into an open-source package. Next, I would aim to measure and potentially improve the performance of the model based on ground truth labels set by HIPAA guidelines. Doing so could open an area of clinical text for wider circulation in research.

Quote

Through this SuperUROP, I aim to gain experience with machine learning applied in a clinical context. I took 6.036 (Intro to Machine Learning) and 6.802 (Computational Systems Biology), and I enjoyed selecting and developing models to address intriguing research questions. I’m excited to build on these learnings with a research project, during which I hope to gain more fluency with machine learning approaches and research with clinical text.
