Tingting Zhan, PhD

March 7, 2024 Webinar

Estimation and Model Selection for Finite Mixtures of Tukey’s g-&-h Distribution

Abstract: 

A finite mixture of densities is a popular statistical model that is especially meaningful when the population of interest may include distinct subpopulations. This work is motivated by the analysis of protein expression levels quantified using immunofluorescence immunohistochemistry assays of human tissues. The distributions of cellular protein expression levels in a tissue often exhibit multimodality, skewness, and heavy tails, but there is substantial variability between distributions in different tissues from different subjects, and some of these mixture distributions include components consistent with the assumption of a normal distribution. To accommodate such diversity, we propose a mixture of 4-parameter Tukey’s g-&-h distributions for fitting finite mixtures with Gaussian and non-Gaussian components. Tukey’s g-&-h distribution is a flexible model that allows variable degrees of skewness and kurtosis in mixture components and includes the normal distribution as a particular case. Since the likelihood of Tukey’s g-&-h mixtures does not have a closed analytical form, we propose a Quantile Least Mahalanobis Distance (QLMD) estimator for the parameters of such mixtures. QLMD is an indirect estimator that minimizes the Mahalanobis distance between the sample and model-based quantiles, and its asymptotic properties follow from the general theory of indirect estimation. We have developed a stepwise algorithm to select a parsimonious Tukey’s g-&-h mixture model and implemented all proposed methods in the R package QuantileGH, available on CRAN. A simulation study was conducted to evaluate the performance of the Tukey’s g-&-h mixtures and to compare it with the performance of mixtures of skew-normal or skew-t distributions. The Tukey’s g-&-h mixtures were applied to model cellular expression of Cyclin D1 protein in breast cancer tissues, and the resulting parameter estimates were evaluated as predictors of progression-free survival.
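
For readers unfamiliar with the distribution, the following is a minimal sketch of the quantile function of a single Tukey’s g-&-h component, using the standard textbook definition: a standard normal value z is transformed to A + B * ((exp(g*z) - 1) / g) * exp(h*z^2 / 2). The parameter names A (location), B (scale), g (skewness) and h (tail heaviness) follow common convention and are assumptions here, as are the function names, which are hypothetical. This is not the QLMD estimator and not the API of the QuantileGH package; it only illustrates why model-based quantiles are easy to compute even though the density has no closed form.

```python
import math
from statistics import NormalDist


def gh_transform(z, g=0.0, h=0.0):
    """Tukey's g-&-h transformation of a standard normal value z.

    g controls skewness, h >= 0 controls tail heaviness.
    With g = 0 and h = 0 the transformation is the identity.
    """
    # expm1 keeps precision when g * z is close to zero;
    # the g -> 0 limit of (exp(g*z) - 1) / g is z itself.
    skew_part = z if g == 0.0 else math.expm1(g * z) / g
    return skew_part * math.exp(h * z * z / 2.0)


def gh_quantile(p, A=0.0, B=1.0, g=0.0, h=0.0):
    """Quantile of a 4-parameter g-&-h distribution at probability p.

    Valid for h >= 0, where the transformation is monotone in z,
    so the p-th quantile is the transformed normal p-th quantile.
    """
    z = NormalDist().inv_cdf(p)
    return A + B * gh_transform(z, g, h)


# Illustrative (made-up) parameter values, not estimates from the study.
probs = [0.05, 0.25, 0.50, 0.75, 0.95]
normal_q = [gh_quantile(p) for p in probs]                # g = h = 0: normal
skewed_q = [gh_quantile(p, g=0.5, h=0.1) for p in probs]  # right-skewed, heavy-tailed
```

Setting g = h = 0 recovers the normal distribution, which is the special case mentioned in the abstract. A quantile-based estimator such as QLMD compares quantiles of this kind with sample quantiles; the distance measure and weighting used by the authors are not reproduced here.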

Short Bio:

Dr. Zhan is an Assistant Professor at Thomas Jefferson University’s Division of Biostatistics and Bioinformatics, Department of Pharmacology, Physiology and Cancer Biology. She works on the design and statistical analysis of clinical, translational, and basic science studies. She has made contributions to robust estimation of non-Gaussian distributions in mixed effects models for multilevel clustered data. Her current research interests focus on developing statistical software for quantification and spatial analysis of cancer biomarkers and the tumor immune microenvironment using immunofluorescence-based immunohistochemistry.