Ishika S. Shah
Eric and Wendy Schmidt Center Funded Research and Innovation Scholar
Integrating Gene Expression Datasets Using Machine Learning
2022–2023
Electrical Engineering and Computer Science
- Artificial Intelligence for Healthcare and Life Sciences
Caroline Uhler
Breast cancer is the second most commonly diagnosed cancer worldwide. However, little is currently known about how it progresses from the pre-invasive to the invasive stage. The goal of this project is to understand the mechanism of this progression and to find reliable clinical markers that identify which cases will progress to the invasive stage. We will do so by developing models trained on imaging data. Specifically, we plan to build an autoencoder that learns unsupervised representations of the individual cells in the images. We aim to design the model so that its latent representation consists of existing hand-crafted features together with learned features orthogonal to them, which could identify new markers of tumor progression. We will also compare the latent representations of cells to determine which cell types are present.
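The abstract does not specify the architecture or training objective. The following is a minimal sketch of one way to express the idea of a latent space split into "known" and "new" features, under stated assumptions. It assumes a linear encoder and decoder (a real model for cell images would more likely be convolutional), and it uses a hypothetical loss, `composite_loss`, with hypothetical weights `lam_feat` and `lam_orth`. The loss combines three terms: reconstruction error, a term tying the first `n_known` latent dimensions to the hand-crafted features, and a cross-covariance penalty that pushes the remaining learned dimensions toward orthogonality with them. It only evaluates the loss. Training (for example by gradient descent in an autodiff framework) is not shown.

```python
import numpy as np


def composite_loss(x, handcrafted, W_enc, W_dec, n_known, lam_feat=1.0, lam_orth=1.0):
    """Hypothetical autoencoder loss with a split latent space (illustrative only).

    x:           (n_cells, n_pixels) flattened cell images
    handcrafted: (n_cells, n_known) existing hand-crafted features per cell
    W_enc:       (n_pixels, latent_dim) linear encoder (assumption)
    W_dec:       (latent_dim, n_pixels) linear decoder (assumption)
    """
    z = x @ W_enc                 # latent code for each cell
    x_hat = z @ W_dec             # reconstructed images
    z_known = z[:, :n_known]      # dimensions meant to match hand-crafted features
    z_new = z[:, n_known:]        # free dimensions for potential new markers

    # Mean squared reconstruction error.
    recon = np.mean((x - x_hat) ** 2)
    # Encourage the "known" dimensions to reproduce the hand-crafted features.
    feat = np.mean((z_known - handcrafted) ** 2)
    # Orthogonality penalty: squared Frobenius norm of the cross-covariance
    # between the centered known and new latent dimensions.
    zk = z_known - z_known.mean(axis=0)
    zn = z_new - z_new.mean(axis=0)
    cross_cov = zk.T @ zn / x.shape[0]
    orth = np.sum(cross_cov ** 2)

    total = recon + lam_feat * feat + lam_orth * orth
    return total, {"recon": recon, "feat": feat, "orth": orth}


if __name__ == "__main__":
    # Random stand-in data; no real images or features are used here.
    rng = np.random.default_rng(0)
    x = rng.normal(size=(100, 64))            # 100 cells, 8x8 pixels each
    handcrafted = rng.normal(size=(100, 3))   # e.g. 3 hand-crafted features
    W_enc = rng.normal(scale=0.1, size=(64, 8))
    W_dec = rng.normal(scale=0.1, size=(8, 64))
    total, parts = composite_loss(x, handcrafted, W_enc, W_dec, n_known=3)
    print(total, parts)
```

The cross-covariance penalty is zero when the learned dimensions are linearly uncorrelated with the hand-crafted ones across cells. This is one simple reading of "orthogonal" here. Stricter alternatives, such as penalizing statistical dependence rather than only correlation, are also possible.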
Through this SuperUROP project, I want to gain more experience in the field of computational biology. I am excited to apply my machine learning knowledge (from 6.036 and previous research) and my math background to my interest in the life sciences. I also hope to learn, through SuperUROP, what it takes to carry out a longer-term research project.