Greycen Ren
MIT MGAIC | MIT Generative AI Impact Research and Innovation Scholar
Structural Elucidation Through Self-Supervised Learning of Molecular and Spectral Representations
2025–2026
Electrical Engineering and Computer Science; Mathematics
- Generative AI
Coley, Connor W.
Complete metabolic studies require the ability to systematically label and quantify the entire collection of molecules within a sample. Current labeling methodologies based on liquid chromatography-tandem mass spectrometry (LC-MS/MS) require comparison to a reference library, which provides poor coverage of chemical space. This project aims to advance the computational metabolomics pipeline by developing a bimodal foundation model. By leveraging vast corpora of unlabeled data, we aim to learn rich spectral and molecular representations through self-supervised learning. Then, using labeled spectrum-molecule pairs as supervision, we hope to align the latent spaces of the two modalities, enabling robust molecule-to-spectrum fragmentation prediction and spectrum-to-molecule generation.
I am participating in SuperUROP to gain a more intensive research experience at the intersection of machine learning and the biomedical sciences. Though I have explored machine learning through my coursework, my background is primarily in biochemistry. This project excites me as an opportunity for computational approaches to make a meaningful contribution to modern problems in biological research, such as those in metabolomics.
