Arthur X. Liang

Research Title

Reasoning Over Proteins With Large Language Models

Cohort

2024–2025

Department

Electrical Engineering and Computer Science

Research Areas

  • AI for Healthcare and Life Sciences

Supervisor

Manolis Kellis

Abstract

We focus on harnessing the high-level reasoning capabilities of large language models to accelerate scientific discovery about proteins. In particular, we train large language models to use protein embeddings generated by state-of-the-art protein sequence and structure encoders. By giving a large language model the ability to reason directly over this richer representation of protein function, we produce a model that combines the flexibility and generalizability of a natural language interface with a grounding in fundamental biology that comes from protein sequence and structure rather than from arbitrary gene names. With this model, scientists will be able to answer complex queries over the protein space and engage in hypothesis generation.

Quote

I am participating in SuperUROP to deepen my research experience and contribute to cutting-edge advancements in AI. With my background in machine learning, particularly in representation learning and computational neuroscience, I hope to explore new methodologies, collaborate with experts, and refine my problem-solving skills.
