Arthur X. Liang
MIT EECS | Takeda Undergraduate Research and Innovation Scholar
Substructure-Aware Protein Representation Learning for Reasoning over Proteins with Large Language Models
2024–2025
Electrical Engineering and Computer Science
- AI for Healthcare and Life Sciences
Manolis Kellis
We focus on harnessing the high-level reasoning capabilities of large language models to accelerate scientific discovery in proteins. In particular, we train large language models to consume protein embeddings produced by state-of-the-art protein sequence and structure encoders. By equipping a large language model to reason directly over this richer representation of protein function, we obtain a model that combines the flexibility and generalizability of a natural-language interface with a grounding in fundamental biology derived from protein sequence and structure rather than arbitrary gene names. With this model, scientists will be able to answer complex queries over the protein space and engage in hypothesis generation.
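One common way to let a language model attend over encoder embeddings, as described above, is to project them into the LLM's token-embedding space and prepend them as "soft tokens" ahead of the text prompt. The sketch below illustrates only that wiring; the dimensions, names, and random data are hypothetical and not taken from the actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not the project's real sizes)
PROT_DIM = 1280   # per-residue embedding size from a protein encoder
LLM_DIM = 4096    # hidden size of the language model

# A learned linear projection would map protein embeddings into LLM space;
# here it is just a random matrix standing in for trained weights.
W = rng.normal(scale=0.02, size=(PROT_DIM, LLM_DIM))

def project_protein(prot_emb: np.ndarray) -> np.ndarray:
    """Map per-residue protein embeddings (L, PROT_DIM) to soft tokens (L, LLM_DIM)."""
    return prot_emb @ W

def build_llm_input(prot_emb: np.ndarray, text_token_embs: np.ndarray) -> np.ndarray:
    """Prepend projected protein soft tokens to the text-token embeddings."""
    soft_tokens = project_protein(prot_emb)
    return np.concatenate([soft_tokens, text_token_embs], axis=0)

# A 200-residue protein followed by a 16-token text query
prot = rng.normal(size=(200, PROT_DIM))
text = rng.normal(size=(16, LLM_DIM))
seq = build_llm_input(prot, text)
print(seq.shape)  # (216, 4096)
```

In a trained system the projection (and possibly the LLM) would be optimized end to end, so the soft tokens carry sequence- and structure-level information the model can reason over alongside ordinary text.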
I am participating in SuperUROP to deepen my research experience and contribute to cutting-edge advancements in AI. With my background in machine learning, particularly in representation learning and computational neuroscience, I hope to explore new methodologies, collaborate with experts, and refine my problem-solving skills.