Probabilistic Clustering vs EM Algorithm for Gaussian Mixture Model

By Antonio Moretti posted 08-22-2013 00:03

Hi all,

I'm working on a project that examines the distribution of test scores, which is skewed with a heavy tail. I want to identify predictor variables that may explain this skewness, but first I want to account for the heterogeneity (within-sample variability) in the scores. I'm using an Expectation-Maximization (EM) algorithm, via the normalmixEM function, to estimate the parameters of the component normal distributions, which appear to fit the data well.

One approach I've read about is to run a univariate cluster analysis on the test scores to identify cluster means, and then compare the cluster statistics with the parameters estimated by the mixture model. The advantage of the EM algorithm is that it gives each observation a probability of belonging to each component distribution, rather than simply assigning it to one cluster.

Is there an R function, or any example code, for probabilistic clustering as an alternative to the kmeans function? Or is there example code that calculates the probability of an observation belonging to each of the distributions, given the value of the variable? Thanks!

Best,
Antonio
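For the second part of the question, the probability that an observation x belongs to component k follows from Bayes' rule: with mixing weights π_k, means μ_k and standard deviations σ_k, P(k | x) = π_k φ(x; μ_k, σ_k) / Σ_j π_j φ(x; μ_j, σ_j), where φ is the normal density. These are the quantities EM computes in its E-step, and the object returned by mixtools' `normalmixEM` stores them in its `posterior` component. The pure-Python sketch below is a minimal univariate EM written only to make the idea concrete: the function names are hypothetical, starting values are supplied by the user, and there is no protection against a component collapsing onto a single point. Taking the largest probability in each row gives a hard assignment comparable to k-means cluster labels.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def posterior_probs(x, weights, mus, sigmas):
    """E-step for one observation: P(component k | x) by Bayes' rule."""
    dens = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas)]
    total = sum(dens)
    return [d / total for d in dens]

def fit_gmm_em(xs, weights, mus, sigmas, max_iter=500, tol=1e-10):
    """Hypothetical helper: univariate Gaussian mixture fitted by EM.

    Returns (weights, mus, sigmas, posterior), where posterior[i][k] is the
    estimated probability that xs[i] came from component k.
    """
    n, k = len(xs), len(weights)
    prev_ll = -math.inf
    for _ in range(max_iter):
        # E-step: soft memberships under the current parameters.
        post = [posterior_probs(x, weights, mus, sigmas) for x in xs]
        # M-step: membership-weighted proportions, means and std deviations.
        nk = [sum(p[j] for p in post) for j in range(k)]
        weights = [nk[j] / n for j in range(k)]
        mus = [sum(p[j] * x for p, x in zip(post, xs)) / nk[j] for j in range(k)]
        sigmas = [
            math.sqrt(sum(p[j] * (x - mus[j]) ** 2 for p, x in zip(post, xs)) / nk[j])
            for j in range(k)
        ]
        # Stop once the log-likelihood no longer improves meaningfully.
        ll = sum(
            math.log(sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas)))
            for x in xs
        )
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    post = [posterior_probs(x, weights, mus, sigmas) for x in xs]
    return weights, mus, sigmas, post

# Made-up toy scores with two well-separated groups.
scores = [1.0, 1.2, 0.8, 1.1, 0.9, 10.0, 10.2, 9.8, 10.1, 9.9]
w, mu, sd, post = fit_gmm_em(scores, [0.5, 0.5], [0.0, 11.0], [1.0, 1.0])
# Hard labels: the most probable component for each observation.
hard = [max(range(len(p)), key=lambda j: p[j]) for p in post]
print(mu, hard)
```

With real, overlapping test-score components the rows of `post` would typically be far less clear-cut than in this toy example, which is exactly the information a hard k-means assignment discards.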

Comments

07-02-2015 12:30

Since your distribution is skewed, I suggest trying a quantile regression model in addition to the mixture model.
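To illustrate the suggestion: quantile regression fits a line for a chosen quantile τ by minimizing the check (pinball) loss instead of squared error, so it can show how the upper tail of the scores depends on a predictor rather than only the mean. The pure-Python sketch below (function names are hypothetical) computes an exact simple-regression fit by brute force, relying on the fact that some optimal line passes through at least two observations. It is O(n³) and intended only for small illustrations; in R, the `rq` function from the quantreg package is the usual tool.

```python
from itertools import combinations

def check_loss(residual, tau):
    """Pinball loss: tau * r for r >= 0 and (tau - 1) * r for r < 0."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))

def quantile_regression_line(xs, ys, tau):
    """Hypothetical helper: exact simple linear quantile regression.

    Tries every line through a pair of observations with distinct x and
    keeps the one with the smallest total check loss.
    """
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # a vertical line is not a valid fit
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        intercept = ys[i] - slope * xs[i]
        loss = sum(check_loss(y - (intercept + slope * x), tau) for x, y in zip(xs, ys))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]

# Made-up data: four points on y = x plus one large value in the tail.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.0, 1.0, 2.0, 3.0, 14.0]
print(quantile_regression_line(x, y, 0.5))
print(quantile_regression_line(x, y, 0.9))
```

Here the median fit (τ = 0.5) stays on the four collinear points with slope 1, while the τ = 0.9 fit tilts toward the large value (slope 3.5), which is the kind of tail-specific effect a mean-based model would blur.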