# Math-Bio seminar: "Covariate-corrected biclustering methods for gene-expression and GWAS data"

A common goal in data analysis is to sift through a large matrix and
detect any significant submatrices (i.e., biclusters) that have a low
numerical rank. To give an example from genomics, one might imagine a
data matrix involving several genetic measurements taken across many
patients. In this context a ‘bicluster’ would correspond to a subset of
genetic measurements that are correlated across a subset of the
patients. While some biclusters might extend across most (or all) of the
patients, it is also possible for biclusters to involve only a small
subset of patients. Detecting biclusters such as these provides a first
step towards unraveling the physiological mechanisms underlying the
heterogeneity within a patient population.
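
To make the notion of a low-rank submatrix concrete, the following is a minimal sketch, not the method presented in the talk. It plants a noisy rank-1 block inside a Gaussian noise matrix and compares how much of the block's energy sits in its top singular value against a pure-noise block of the same shape. The matrix sizes, the noise level, and the helper names `planted_bicluster` and `top_singular_fraction` are all assumptions chosen for illustration.

```python
import numpy as np

def planted_bicluster(n_patients=200, n_genes=300, rows=40, cols=50, seed=0):
    """Return a noise matrix with one noisy rank-1 block planted in it (illustrative)."""
    rng = np.random.default_rng(seed)
    data = rng.standard_normal((n_patients, n_genes))
    # Pick the patients (rows) and measurements (columns) that form the bicluster.
    row_idx = rng.choice(n_patients, size=rows, replace=False)
    col_idx = rng.choice(n_genes, size=cols, replace=False)
    # Rank-1 signal plus a small amount of noise (0.1 is an assumed noise level).
    u = rng.standard_normal((rows, 1))
    v = rng.standard_normal((1, cols))
    data[np.ix_(row_idx, col_idx)] = u @ v + 0.1 * rng.standard_normal((rows, cols))
    return data, row_idx, col_idx

def top_singular_fraction(block):
    """Fraction of the block's squared Frobenius norm carried by its top singular value."""
    s = np.linalg.svd(block, compute_uv=False)
    return float(s[0] ** 2 / np.sum(s ** 2))

data, row_idx, col_idx = planted_bicluster()
inside = top_singular_fraction(data[np.ix_(row_idx, col_idx)])
# Reference value: a pure-noise block of the same shape.
noise_block = np.random.default_rng(1).standard_normal((len(row_idx), len(col_idx)))
background = top_singular_fraction(noise_block)
print(f"bicluster: {inside:.2f}, noise block: {background:.2f}")
```

In this sketch the bicluster's rows and columns are known in advance; the biclustering problem is the harder task of finding them when they are not.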

We present a simple algorithm for tackling this biclustering problem,
i.e., for detecting low-rank submatrices within a much larger data
matrix. An important feature of our method is that it can easily be
modified to account for many considerations that commonly arise in
practice. For example, our
algorithm can be used to find biclusters that manifest only within a
‘case’ population without manifesting within a ‘control’ population.
Moreover, our algorithm can correct for categorical and continuous
covariates, as well as sparsity within the data. We illustrate these
practical features with two examples: the first drawn from
gene-expression analysis and the second drawn from a much larger
genome-wide association study (GWAS).
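
As a generic illustration of what correcting for covariates can mean, the sketch below regresses one categorical and one continuous covariate out of a data matrix by ordinary least squares and keeps the residuals. This is a standard stand-in technique, not the covariate correction used by the algorithm in the talk, whose details the abstract does not give. The function name `residualize` and the simulated `age` and `batch` covariates are hypothetical.

```python
import numpy as np

def residualize(data, continuous, categorical):
    """Remove least-squares effects of the covariates from each column of data."""
    levels = np.unique(categorical)
    # One-hot encode the categorical covariate; its columns sum to 1,
    # so they also play the role of an intercept.
    one_hot = (categorical[:, None] == levels[None, :]).astype(float)
    design = np.column_stack([one_hot, continuous])
    coef, *_ = np.linalg.lstsq(design, data, rcond=None)
    return data - design @ coef

# Simulated example (all values are made up for illustration).
rng = np.random.default_rng(0)
n_patients, n_genes = 100, 20
age = rng.normal(50.0, 10.0, size=n_patients)    # continuous covariate
batch = rng.integers(0, 3, size=n_patients)      # categorical covariate
expr = (0.05 * age[:, None]                      # age effect
        + batch[:, None].astype(float)           # batch effect
        + rng.standard_normal((n_patients, n_genes)))

resid = residualize(expr, age, batch)
```

After this step the residuals are uncorrelated with `age` and have zero mean within each `batch` level, so any structure that remains cannot be a linear effect of those covariates.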