Math-Bio seminar: "Fast, scalable prediction of deleterious noncoding variants from genomic data"

Mon, 01/23/2017 - 16:00 - 17:00
Adam Siepel, Cold Spring Harbor Laboratory

Across many species, a large fraction of genetic variants that influence phenotypes of interest is located outside of protein-coding genes, yet existing methods for identifying such variants have poor predictive power. Here, we introduce a new computational method, called LINSIGHT, that substantially improves the prediction of noncoding nucleotide sites at which mutations are likely to have deleterious fitness consequences, and which therefore are likely to be phenotypically important. LINSIGHT combines a simple neural network for functional genomic data with a probabilistic model of molecular evolution. The method is fast and highly scalable, enabling it to exploit the “Big Data” available in modern genomics. It can be fitted to data by maximum likelihood using an online stochastic gradient ascent algorithm, with gradients computed efficiently by back-propagation. We show that LINSIGHT outperforms the best available methods in identifying human noncoding variants associated with inherited diseases. In addition, we apply LINSIGHT to an atlas of human enhancers and show that the fitness consequences at enhancers depend on cell type, tissue specificity, and constraints at associated promoters.
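The abstract names the fitting strategy: maximum likelihood, online stochastic gradient ascent, and gradients computed by back-propagation. A toy Python sketch can illustrate that strategy, but everything model-specific in it is a hypothetical assumption and not LINSIGHT's actual likelihood. Each site has a feature vector x that stands in for functional genomic data. A single sigmoid unit, the simplest possible network, maps x to rho, the probability that the site is under strong purifying selection. A variant is then observed at the site (y = 1) with probability (1 - rho) * q, where q is a made-up, fixed neutral rate. The names sigmoid, softplus, site_loglik_and_grad, total_loglik, simulate, and fit_online are illustrative only.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softplus(z):
    # log(1 + exp(z)) computed without overflow
    if z > 0:
        return z + math.log1p(math.exp(-z))
    return math.log1p(math.exp(z))


def site_loglik_and_grad(w, b, x, y, q):
    """Log-likelihood of one site plus its gradient w.r.t. (w, b).

    Hypothetical toy model (not LINSIGHT's): rho = sigmoid(w.x + b) is the
    probability the site is under strong selection; a variant is observed
    (y = 1) with probability (1 - rho) * q for a fixed neutral rate q.
    """
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    rho = sigmoid(z)
    if y == 1:
        # log P(y=1) = log q + log(1 - rho) = log q - softplus(z)
        ll = math.log(q) - softplus(z)
        dll_dz = -rho
    else:
        p = q * sigmoid(-z)  # P(y=1) = q * (1 - rho)
        ll = math.log1p(-p)
        # Chain rule: dll/dp = -1/(1-p), dp/dz = -q * rho * (1 - rho)
        dll_dz = q * rho * (1.0 - rho) / (1.0 - p)
    # Back-propagate through the linear layer z = w.x + b
    grad_w = [dll_dz * xi for xi in x]
    return ll, grad_w, dll_dz


def total_loglik(data, w, b, q):
    # Sum of per-site log-likelihoods (sites treated as independent)
    return sum(site_loglik_and_grad(w, b, x, y, q)[0] for x, y in data)


def simulate(n_sites, true_w, true_b, q, seed=1):
    # Synthetic sites drawn from the same toy model
    rng = random.Random(seed)
    data = []
    for _ in range(n_sites):
        x = [rng.gauss(0.0, 1.0) for _ in true_w]
        rho = sigmoid(sum(wi * xi for wi, xi in zip(true_w, x)) + true_b)
        y = 1 if rng.random() < (1.0 - rho) * q else 0
        data.append((x, y))
    return data


def fit_online(data, n_features, q, lr=0.05, epochs=5, seed=0):
    # Online stochastic gradient ascent: one site per parameter update
    rng = random.Random(seed)
    w = [0.0] * n_features
    b = 0.0
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = data[i]
            _, grad_w, grad_b = site_loglik_and_grad(w, b, x, y, q)
            # Ascent: step uphill on the log-likelihood
            w = [wi + lr * gi for wi, gi in zip(w, grad_w)]
            b += lr * grad_b
    return w, b


if __name__ == "__main__":
    q = 0.3
    data = simulate(2000, [1.5, -1.0], 0.0, q)
    w, b = fit_online(data, 2, q)
    print("fitted weights:", w, "bias:", b)
```

Because each update touches a single site, memory use does not grow with the number of sites, which is what makes an online scheme attractive for genome-scale data. A real application would replace the toy likelihood with a proper model of molecular evolution and the single sigmoid unit with whatever network the method actually uses.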

318 Carolyn Lynch Laboratory