HONGZHE LI, PH.D.
Professor of Biostatistics and Statistics
University of Pennsylvania, Perelman School of Medicine
THURSDAY, MARCH 28, 2013
10:00 a.m.–11:00 a.m.
Clinical Research Building, Room 692
With the development of next-generation sequencing technologies, researchers can now study microbiome composition by direct sequencing, which produces short sequence reads for each microbiome sample. We first introduce a model-based method to quantify bacterial compositions. We then discuss issues in associating microbiome compositions with environmental covariates or clinical outcomes, including (1) identifying the biological or environmental factors that are associated with bacterial compositions, and (2) identifying the bacterial taxa that are associated with clinical outcomes. Statistical models for these problems must account for the high-dimensional, sparse, and compositional nature of the data. In addition, the known phylogenetic tree relating the bacterial species provides prior information that can be incorporated into the analysis. In this talk, I will present several statistical methods we have developed for analyzing bacterial compositional data, including kernel-based regression, sparse Dirichlet-multinomial regression, compositional data regression, and the construction of bacterial taxa networks from compositional data. I will demonstrate these methods on a data set that links the human gut microbiome to dietary intake, in order to identify the micronutrients that are associated with the gut microbiome and the bacteria that are associated with body mass index.
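What "compositional" means in practice can be illustrated with a small sketch. Because sequencing depth differs between samples, read counts are usually normalized to relative abundances that sum to one per sample, and a log-ratio transform is a common way to map such constrained data into unconstrained space. The sketch below assumes NumPy, uses made-up counts, and applies a centered log-ratio (clr) transform with a hypothetical pseudocount of 0.5 to handle the many zeros typical of sparse microbiome tables. It is a generic preprocessing step, not necessarily the method presented in the talk.

```python
import numpy as np

def closure(counts):
    """Convert a samples-by-taxa count matrix to relative abundances (rows sum to 1)."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)

def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of each sample's composition.

    A pseudocount is added first because log(0) is undefined and
    microbiome count tables are typically sparse. The default of 0.5
    is an illustrative assumption, not a recommended value.
    """
    comp = closure(np.asarray(counts, dtype=float) + pseudocount)
    log_comp = np.log(comp)
    # Subtract each sample's mean log-abundance (the log of its geometric mean).
    return log_comp - log_comp.mean(axis=1, keepdims=True)

# Hypothetical read counts: 3 samples x 4 taxa, with some zeros.
counts = np.array([
    [120, 0, 30, 50],
    [10, 5, 0, 85],
    [60, 20, 20, 0],
])

props = closure(counts)  # each row sums to 1 (up to floating-point rounding)
z = clr(counts)          # each row sums to 0 (up to floating-point rounding)
print(props.round(3))
print(z.round(3))
```

The zero-sum property of the clr values reflects the fact that only ratios between taxa carry information in compositional data, which is why regression models for such data typically work with log-ratios rather than raw proportions.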