Research Project Title:
Knowledge-Enhanced Protein Language Models for Therapeutics
Abstract: The vast majority of common diseases are caused by complex combinations of genetic and environmental factors, which interact in genomic pathways in ways that make these diseases difficult to treat effectively. A better understanding of these pathways would open the door to improved identification of therapeutics, potentially transforming countless lives.
In this work, we aim to improve understanding of disease-causing pathways by applying self-supervised deep learning techniques, which have driven rapid progress in many areas of AI in recent years. Because they can learn from massive unlabelled datasets, these techniques provide a mechanism for generating compact, semantically meaningful representations (embeddings) of input protein sequences that are useful for a wide range of downstream tasks.
We build on existing work by integrating additional biological information into the learned representations, with a focus on their utility for therapeutic science.