Math-Bio seminar: "Covariate-corrected biclustering methods for gene-expression and GWAS data"

Mon, 01/30/2017 - 16:00 - 17:00
Adi Rangan, New York University

A common goal in data analysis is to sift through a large matrix and detect any significant submatrices (i.e., biclusters) that have low numerical rank. To give an example from genomics, one might imagine a data matrix of several genetic measurements taken across many patients. In this context a ‘bicluster’ would correspond to a subset of genetic measurements that are correlated across a subset of the patients. While some biclusters might extend across most (or all) of the patients, a bicluster may also involve only a small subset of patients. Detecting such biclusters is a first step towards unraveling the physiological mechanisms underlying the heterogeneity within a patient population.
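
To make the notion of a low-rank submatrix concrete, here is a minimal illustrative sketch in Python with NumPy. It is not the algorithm presented in the talk: it only plants a noisy rank-1 block inside a Gaussian matrix and measures how much of a block's energy sits in its leading singular value. The function names (`make_planted_matrix`, `top_singular_fraction`), the matrix sizes, and the noise level are all assumptions chosen for illustration.

```python
import numpy as np


def make_planted_matrix(n_rows=200, n_cols=300, bic_rows=40, bic_cols=50,
                        noise=0.1, seed=0):
    """Return a Gaussian matrix with one noisy rank-1 block planted in it.

    Hypothetical toy setup: rows play the role of patients, columns the role
    of genetic measurements. The planted rows/columns are returned as well.
    """
    rng = np.random.default_rng(seed)
    data = rng.standard_normal((n_rows, n_cols))

    # Choose which rows and columns form the bicluster.
    rows = rng.choice(n_rows, size=bic_rows, replace=False)
    cols = rng.choice(n_cols, size=bic_cols, replace=False)

    # Overwrite that submatrix with a rank-1 signal plus a little noise.
    u = rng.standard_normal(bic_rows)
    v = rng.standard_normal(bic_cols)
    block = np.outer(u, v) + noise * rng.standard_normal((bic_rows, bic_cols))
    data[np.ix_(rows, cols)] = block
    return data, rows, cols


def top_singular_fraction(block):
    """Fraction of the squared Frobenius norm carried by the top singular value.

    Values near 1 indicate a block that is numerically close to rank 1.
    """
    s = np.linalg.svd(block, compute_uv=False)
    return float(s[0] ** 2 / np.sum(s ** 2))


if __name__ == "__main__":
    data, rows, cols = make_planted_matrix()
    planted = data[np.ix_(rows, cols)]
    print("planted block:", top_singular_fraction(planted))
    print("whole matrix: ", top_singular_fraction(data))
```

In this toy setting the planted block is dominated by a single singular value while the full matrix is not, which is why the bicluster has to be searched for rather than read off a global decomposition. Here the rows and columns of the block are known in advance; in the biclustering problem they are not.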

We present a simple algorithm for tackling this biclustering problem, i.e., for detecting low-rank submatrices within a much larger data matrix. An important feature of our method is that it can easily be modified to account for many considerations that commonly arise in practice. For example, our algorithm can be used to find biclusters that manifest only within a ‘case’ population without manifesting within a ‘control’ population. Moreover, our algorithm can correct for categorical and continuous covariates, as well as for sparsity within the data. We illustrate these practical features with two examples: the first drawn from gene-expression analysis and the second drawn from a much larger genome-wide association study (GWAS).

318 Carolyn Lynch Laboratory