Research Areas

Inferring meta-pathway activity from genomic data

Coordinated Gene Activity in Pattern Sets (CoGAPS) is a novel pattern identification algorithm implemented in an R/Bioconductor package of the same name. This algorithm identifies sets of genes, called meta-pathways, with concurrent changes in high-throughput data. CoGAPS also provides a continuous measure of the extent to which each meta-pathway is active in specific samples. This meta-pathway activity can distinguish cancer subtypes and reveal biomarkers and the dynamics of biological processes.

Accounting for inter-tumor heterogeneity in genomics

Genomic and epigenetic landscapes of tumors have higher variability than those of normal samples, and this variability increases in tumors with worsening prognosis. Statistical analyses that compare the variability of genomic measurements for each gene in tumor samples relative to normal samples can prioritize molecular alterations in individual tumors. Expression Variation Analysis (EVA) quantifies differential variability of pathways and splice variants.
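
The idea of comparing variability between tumor and normal samples can be illustrated with a minimal sketch. This is not the EVA algorithm itself; it assumes a much simpler statistic, the log2 ratio of sample variances for a single gene. The function name and the expression values are hypothetical and made up for illustration only.

```python
from math import log2
from statistics import variance


def log2_variance_ratio(tumor, normal):
    """Return log2(var(tumor) / var(normal)) for one gene's expression values.

    Positive values mean the gene is more variable in tumor than in normal.
    This simple variance ratio is an assumption for illustration, not EVA.
    """
    return log2(variance(tumor) / variance(normal))


# Hypothetical expression values for a single gene (not real data).
normal = [5.0, 5.1, 4.9, 5.0, 5.2]
tumor = [3.0, 7.5, 5.0, 9.0, 1.5]

ratio = log2_variance_ratio(tumor, normal)
print(f"log2 variance ratio (tumor vs. normal): {ratio:.2f}")
```

Ranking genes by such a ratio gives one crude way to flag candidates whose expression is unusually heterogeneous across tumors.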

Integrated genomics analysis for cancer

High-throughput genomics data enables unprecedented inference of the molecular drivers and biomarkers that distinguish cancer subtypes and therapeutic response. Analysis across diverse omics platforms is essential to find genetic and epigenetic drivers of cancer phenotypes. Research has focused on head and neck squamous cell carcinoma genomics, with additional pan-cancer analyses of high-throughput data spanning primary tumors, model organisms, and cell lines.

Algorithms to standardize high-throughput data

Robust data preprocessing techniques are essential to providing high-throughput data of sufficient quality for subsequent analyses. Often, however, such algorithms also remove inter-sample variability, making pattern detection impossible. The permuted surrogate variable analysis (pSVA) batch correction algorithm and the FunNorm preprocessing algorithm remove technical artifacts from high-throughput data while preserving signal for pattern detection.

Mathematical models of cellular signaling networks

In addition to molecular data, mathematical models can predict the state of signaling pathways based upon their network structure. Stochastic models are essential to capture the partially penetrant phenotypes during fate decisions in development. Network models of coupled oscillators and switches also demonstrate that dynamics must be inferred from their context in the broader network instead of from their structure in isolated motifs.

Numerical weather prediction

In weather forecasting, data assimilation schemes that regularly integrate mathematical models and measurements of a dynamical system improve predictions of future states. Algorithms that incorporate indirect satellite observations into a flow-dependent data assimilation scheme, the local ensemble transform Kalman filter (LETKF), further improve forecasting accuracy.
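
The core of data assimilation, blending a model forecast with a measurement according to their uncertainties, can be sketched with a scalar Kalman analysis step. This is not the LETKF, which works on ensembles in local regions of a high-dimensional state; it is the simplest one-variable case, and the function name and numbers below are assumptions chosen for illustration.

```python
def kalman_update(forecast, forecast_var, obs, obs_var):
    """Combine a scalar forecast and observation, weighting by their variances.

    Returns the analysis (updated estimate) and its variance.
    """
    # Kalman gain: the fraction of the forecast-observation gap to correct.
    gain = forecast_var / (forecast_var + obs_var)
    analysis = forecast + gain * (obs - forecast)
    # The analysis variance is never larger than the forecast variance.
    analysis_var = (1.0 - gain) * forecast_var
    return analysis, analysis_var


# Hypothetical values: a temperature forecast and a more precise measurement.
analysis, analysis_var = kalman_update(
    forecast=20.0, forecast_var=4.0, obs=22.0, obs_var=1.0
)
print(f"analysis = {analysis:.2f}, variance = {analysis_var:.2f}")
```

Because the observation here is assumed to be more certain than the forecast, the analysis lands closer to the observation, and its variance is smaller than either input variance.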