Thomas Cobley
Undergraduate Research and Innovation Scholar
Knowledge-Enhanced Protein Language Models for Therapeutics
2022–2023
Electrical Engineering and Computer Science
- Artificial Intelligence for Healthcare and Life Sciences
Manolis Kellis
The vast majority of common diseases are caused by complex combinations of genetic and environmental factors that interact through genomic pathways, which makes these diseases difficult to treat effectively. A better understanding of these pathways would open the door to improved identification of therapeutics, potentially transforming countless lives.
In this work, we aim to improve understanding of disease-causing pathways by applying self-supervised deep learning techniques, which have driven remarkable progress in many areas of AI in recent years. Because they can learn from massive unlabelled datasets, these techniques can generate compact, semantically meaningful representations (embeddings) of input protein sequences, and these embeddings support a wide range of downstream tasks.
We build on existing protein language models by integrating additional biological knowledge into the learned representations, with a focus on their usefulness for therapeutic science.