Arthur X. Liang

artliang@mit.edu

Scholar Title

MIT EECS | Takeda Undergraduate Research and Innovation Scholar

Research Title

Substructure-aware Protein Representation Learning For Reasoning Over Proteins With Large Language Models

Cohort

2024–2025

Department

Electrical Engineering and Computer Science

Research Areas

AI for Healthcare and Life Sciences

Supervisor

Manolis Kellis

manoli@mit.edu

Abstract

We focus on harnessing the high-level reasoning capabilities of large language models to accelerate scientific discovery about proteins. In particular, we train large language models to use protein embeddings generated by state-of-the-art protein sequence and structure encoders. By giving a large language model the ability to reason directly over this richer representation of protein function, we produce a model that combines the flexibility and generalizability of a natural language interface with a grounding in fundamental biology that comes from protein sequence and structure rather than arbitrary gene names. With this model, scientists will be able to answer complex queries over the protein space and engage in hypothesis generation.

Quote

I am participating in SuperUROP to deepen my research experience and contribute to cutting-edge advancements in AI. With my background in machine learning, particularly in representation learning and computational neuroscience, I hope to explore new methodologies, collaborate with experts, and refine my problem-solving skills.
