Undergraduate Research and Innovation Scholar
Predicting DNA Functional Elements using Genome Topological Features and Deep Learning
David K. Gifford
The prediction of transcription factor binding sequences in DNA is an important problem in computational biology with implications for many biological phenomena. Current approaches use ChIP-seq data to learn a representation of a given transcription factor's binding affinity. One such method is the k-mer motif and alignment clustering (KMAC) algorithm, which produces k-mer set motif (KSM) representations. However, ChIP-seq is relatively expensive because a new dataset has to be generated for each transcription factor. To reduce this cost, we propose adapting the KMAC/KSM approach to DNase-seq data, which can be treated as multiplexed ChIP-seq data, so that binding site representations for multiple transcription factors can be produced from a single DNase-seq experiment.
I am participating in SuperUROP to get more hands-on experience in a research setting. I am a Course 6 student, but I have also done a lot of work related to biology, including many terms of wet lab research. This computational biology project looks like an exciting way to marry these two interests.