Research Project Title:
Developing an Ensemble Model for De-Identification of Clinical Text
Abstract: Sharing patient data for research often relies on deidentification, the process of removing identifiers to protect patient privacy. Deidentification is challenging because the target identifiers are heterogeneous and annotated data for developing models is scarce. My project has two foci: to create a model that combines machine learning and pattern-matching approaches to deidentification of clinical text, and to assess the performance of this model on conversations between patients and caregivers. First, I would combine the deidentification techniques into an open-source package. Next, I would aim to measure and potentially improve the model's performance against ground-truth labels defined according to HIPAA guidelines. Doing so could open an area of clinical text for wider circulation in research.
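As a rough illustration of what combining pattern matching with a learned model could look like, the sketch below merges spans from several detectors and masks them. Everything in it is an assumption made for illustration: the two regular expressions, the labels, the `toy_model_spans` stand-in (a real system would call a trained tagger there), and the rule that overlapping spans are unioned. It is not the project's actual package or method.

```python
import re
from typing import Callable, List, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, label)

# Hypothetical patterns, chosen only for illustration.
PATTERNS = {
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
}


def pattern_spans(text: str) -> List[Span]:
    """Pattern-matching detector: one span per regex match."""
    spans = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), label))
    return spans


def toy_model_spans(text: str) -> List[Span]:
    """Stand-in for a trained tagger (assumption): flags the word after a title."""
    title = re.compile(r"\b(?:Dr|Mr|Ms|Mrs)\.\s+([A-Z][a-z]+)")
    return [(m.start(1), m.end(1), "NAME") for m in title.finditer(text)]


def merge_spans(spans: List[Span]) -> List[Span]:
    """Union overlapping spans, keeping the label of the earliest-starting one."""
    merged: List[Span] = []
    for start, end, label in sorted(spans):
        if merged and start < merged[-1][1]:
            prev_start, prev_end, prev_label = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), prev_label)
        else:
            merged.append((start, end, label))
    return merged


def deidentify(text: str, detectors: List[Callable[[str], List[Span]]]) -> str:
    """Run every detector, merge their spans, and replace each span with its label."""
    spans = merge_spans([span for detect in detectors for span in detect(text)])
    pieces = []
    cursor = 0
    for start, end, label in spans:
        pieces.append(text[cursor:start])
        pieces.append(f"[{label}]")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


if __name__ == "__main__":
    note = "Dr. Smith called 617-555-0123 on 3/14/2021."
    print(deidentify(note, [pattern_spans, toy_model_spans]))
```

Taking the union of detector outputs favors recall, which is usually the priority when the cost of a missed identifier is a privacy breach; a different combination rule (for example, voting) would trade that off differently.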
Through this SuperUROP, I aim to gain experience with machine learning applied in a clinical context. I took 6.036 (Intro to Machine Learning) and 6.802 (Computational Systems Biology), and I enjoyed selecting and developing models to address intriguing research questions. I'm excited to build on this coursework with a research project, during which I hope to gain more fluency with machine learning approaches and with research on clinical text.