Discussion: View Thread

Is it appropriate to use F-test / ANOVA to compare means within cluster?

  • 1.  Is it appropriate to use F-test / ANOVA to compare means within cluster?

    Posted 01-08-2012 18:16

    My group often conducts consumer studies to collect liking ratings for our products. The liking ratings are subjective and tend to be polarizing. Suppose liking ratings from 300 consumers are collected for 12 products, and cluster analysis is used to segment the consumers into clusters that are more homogeneous than the total sample (consumers in the same cluster have similar liking patterns toward the 12 products).

    Is it appropriate to use an F-test / ANOVA to compare mean product liking ratings within each cluster? If not, what test should be used? Thanks for your help.

    -------------------------------------------
    Chun-Yen Cochrane
    The Coca-Cola Company
    -------------------------------------------

  • 2.  RE:Is it appropriate to use F-test / ANOVA to compare means within cluster?

    Posted 01-09-2012 03:37
    Dear Chun-Yen,

    Significance testing by any method on the variables used to create the segments would not be meaningful: we know beforehand that the differences among the clusters on these variables cannot be due to sampling error alone, because the clustering algorithm created them. I no longer use SAS, but the FASTCLUS procedure used to report "pseudo F" ratios and specifically covered this issue in its documentation.
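    For readers unfamiliar with the pseudo F, here is a minimal Python sketch, assuming the Calinski-Harabasz form: between-cluster dispersion over within-cluster dispersion, each divided by its degrees of freedom. That this matches what FASTCLUS reports is an assumption here; check the SAS documentation. The function name `pseudo_f` is hypothetical. It is a descriptive index for comparing clustering solutions, not a statistic to look up in F tables, precisely because the clusters were chosen to make it large.

```python
from collections import defaultdict


def pseudo_f(rows, labels):
    """Calinski-Harabasz pseudo F: (B / (k - 1)) / (W / (n - k)).

    rows   -- equal-length numeric vectors, e.g. 12 liking scores per consumer
    labels -- the cluster label assigned to each row
    """
    n, p = len(rows), len(rows[0])
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    k = len(groups)
    if not 1 < k < n:
        raise ValueError("need at least 2 clusters and fewer clusters than rows")

    grand_mean = [sum(row[j] for row in rows) / n for j in range(p)]
    between = within = 0.0
    for members in groups.values():
        centroid = [sum(row[j] for row in members) / len(members) for j in range(p)]
        # B: size-weighted squared distance from each centroid to the grand mean
        between += len(members) * sum((c - g) ** 2 for c, g in zip(centroid, grand_mean))
        # W: squared distance from each member to its own cluster centroid
        within += sum(sum((x - c) ** 2 for x, c in zip(row, centroid)) for row in members)
    return (between / (k - 1)) / (within / (n - k))


# Toy one-variable example: two well-separated pairs of points
print(pseudo_f([[0], [1], [10], [11]], [0, 0, 1, 1]))  # 200.0
```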

    Hope this is helpful.

    Regards,
    Kevin

    -------------------------------------------
    Kevin Gray
    Cannon Gray LLC
    -------------------------------------------

  • 3.  RE:Is it appropriate to use F-test / ANOVA to compare means within cluster?

    Posted 01-09-2012 06:15
    Why would you want to use a test on the variables that you used to do the clustering?

    It might be interesting to do tests on OTHER variables (e.g., are there more men in cluster 1? What is the average age in each cluster?), but doing a test on the liking scores is just a way of saying your clustering worked, and there are better methods of doing that (see books and papers on cluster analysis).
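    As a sketch of the "more men in cluster 1?" kind of check, here is Pearson's chi-square statistic for a cluster-by-gender crosstab in plain Python. The counts are made up purely to show the arithmetic, and the function name is hypothetical. A p-value would come from a chi-square distribution on (rows - 1)(columns - 1) degrees of freedom, for example via `scipy.stats.chi2_contingency` if SciPy is available.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of cluster and gender
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts: rows = clusters 1 and 2, columns = (men, women)
counts = [[10, 20],
          [20, 10]]
stat, dof = pearson_chi_square(counts)
print(f"chi-square = {stat:.3f} on {dof} df")
```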

    -------------------------------------------
    Peter Flom
    -------------------------------------------

  • 4.  RE:Is it appropriate to use F-test / ANOVA to compare means within cluster?

    Posted 01-09-2012 09:18
    It does not sound legitimate at all. Imagine that you collect 300 "results" from a single population having mean zero and variance one. Now divide them into two "clusters" by putting the negative results in one cluster and the positive results in the other. If you do an ANOVA you will get a highly significant difference between the two "clusters", but we already know that we actually have a single population with no "real" clustering.
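    This thought experiment is easy to reproduce. A minimal standard-library Python sketch, assuming normally distributed results; the seed and the function name `one_way_f` are arbitrary illustrative choices, not from the thread:

```python
import random
import statistics


def one_way_f(groups):
    """Classical one-way ANOVA F statistic for a list of samples."""
    values = [x for g in groups for x in g]
    n, k = len(values), len(groups)
    grand_mean = statistics.fmean(values)
    means = [statistics.fmean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    return (ss_between / (k - 1)) / (ss_within / (n - k))


rng = random.Random(2012)  # arbitrary fixed seed, for reproducibility only
results = [rng.gauss(0.0, 1.0) for _ in range(300)]  # ONE population: mean 0, variance 1

# Form two "clusters" by sign, exactly as in the thought experiment
negatives = [x for x in results if x < 0]
positives = [x for x in results if x >= 0]

f_stat = one_way_f([negatives, positives])
print(f"F = {f_stat:.1f} on (1, {len(results) - 2}) df")
```

    The 5% critical value of F(1, 298) is roughly 3.9, and the F from the sign split lands far above it, even though there is only one population.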

    -------------------------------------------
    Emil M Friedman, PhD
    emil.friedman@alum.mit.edu (forwards to day job)
    emilfrie@alumni.princeton.edu (home)
    http://www.statisticalconsulting.org
    -------------------------------------------

  • 5.  RE:Is it appropriate to use F-test / ANOVA to compare means within cluster?

    Posted 01-09-2012 09:23
    While I agree with all who have said it doesn't make sense to compare the means between clusters, the question seemed to be about comparing means _within_ clusters, at least as I read it. I'm not sure what the goal of the analysis is in this case. Could you elaborate on the goal of the analysis?

    -------------------------------------------
    Robert Podolsky
    Georgia Health Sciences University
    -------------------------------------------

  • 6.  RE:Is it appropriate to use F-test / ANOVA to compare means within cluster?

    Posted 01-09-2012 09:44

    Thank you very much for all the replies. 

    The cluster analysis was based on the liking 1 (for Product 1) through liking 12 (for Product 12) scores from each consumer. A product can be most liked by one person but least liked by another. The consumer clusters are usually driven by maximally differentiated products. We are interested in finding where the differences (reflected in liking scores) lie within each cluster. For example, below are the mean liking scores for the 12 products for Cluster 1. Are the mean scores significantly different from one another? Your help is greatly appreciated.

    7.34, 6.61, 6.38, 6.35, 6.05, 6.04, 5.78, 5.71, 5.49, 5.12, 4.38, 3.29

    -------------------------------------------
    Chun-Yen Cochrane
    The Coca-Cola Company
    -------------------------------------------

  • 7.  RE:Is it appropriate to use F-test / ANOVA to compare means within cluster?

    Posted 01-09-2012 10:59
    Do you want to find groups of "subjects" such that they show large differences in their "liking" scores across the 12 products?  Or, do you want to find how the 12 products differ, with the knowledge that the differences among the products will depend on some unknown "type" of subject who is rating the products?

    On another note, it seems like there might be a problem in what you are doing beyond what you ask.  If I understand correctly, any given subject will rank the 12 products.  If item 5 is ranked as a 1 by subject A, then the remaining 11 products cannot have a rank of 1.  Do others see this as problematic?

    -------------------------------------------
    Robert Podolsky
    Georgia Health Sciences University
    -------------------------------------------

  • 8.  RE:Is it appropriate to use F-test / ANOVA to compare means within cluster?

    Posted 01-09-2012 12:00

    As a practical matter, market researchers use the F-test for this all the time.   In doing so, they are doing nothing more ambitious than describing what happened in the cluster algorithm:  "Oh look, Cluster 2 has people who show about equal preference for BrandA and BrandB" and "In Cluster 3 we have respondents who like BrandC better than BrandD".  Clustering is an exploratory technique and the F-test is used (off-label and extremely approximately) to understand what clusters were formed.  

    Importantly, you would not want to stand or fall on the exact F values, p-values, or significance (see previous posters' comments), but rather use them to gauge the (artificial) differences you yourself put in place by creating the clusters. As one of the earlier posters also mentioned, most clustering programs' output will provide more appropriate tests of this kind.

    -------------------------------------------
    Bridget Bly
    -------------------------------------------

  • 9.  RE:Is it appropriate to use F-test / ANOVA to compare means within cluster?

    Posted 01-09-2012 12:42

    I agree that clustering is an exploratory tool, but I would like to know that whatever I am doing leads me to reasonable inferences about true groups and not inferences about groups that are just random occurrences. It seems to me that if you want to know how preference differs across the products, with the knowledge that there are latent groups that differ with regard to their preference, then one should use an approach that tries to find the differences based on the existence of latent groups (e.g., latent group modeling).

    -------------------------------------------
    Robert Podolsky
    Georgia Health Sciences University
    -------------------------------------------

  • 10.  RE:Is it appropriate to use F-test / ANOVA to compare means within cluster?

    Posted 01-09-2012 13:05

    I totally agree, RP, and most market researchers have moved to latent class segmentation techniques for exactly those reasons (and others).  But most also explore differences of many variables which may not have been the basis of the latent classes, but have some (unquantified/unknown) relationship to them.  

    -------------------------------------------
    Bridget Bly
    -------------------------------------------

  • 11.  RE:Is it appropriate to use F-test / ANOVA to compare means within cluster?

    Posted 01-09-2012 13:37
    ANOVA is generally used to make inferences about means and is not exploratory. My impression was that Chun-Yen wanted to do inference. I don't know what is done in market research, but doing the ANOVA F-test on specific clusters after forming the clusters based on similarity of responses would not be appropriate. I mentioned that to him privately, and most others have said the same.

    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------

  • 12.  RE:Is it appropriate to use F-test / ANOVA to compare means within cluster?

    Posted 01-09-2012 13:18

    If you have other information such as age group, gender, or region, then you probably want to study the clusters in conjunction with those other variables.

    If you are interested in studying the relationships among the liking scores for the twelve drink products, correspondence analysis, factor analysis, or multidimensional scaling (MDS) may be good tools.

    -------------------------------------------
    Charlie An
    Oracle Corp
    -------------------------------------------

  • 13.  RE:Is it appropriate to use F-test / ANOVA to compare means within cluster?

    Posted 01-09-2012 20:16

    Comparing the means of clusters using ANOVA would violate basic ANOVA assumptions such as random sampling, normal distribution, additivity, homoscedasticity, etc. Cluster analysis (like factor analysis, principal components analysis, etc.) is an exploratory, hypothesis-generating technique. Observation of the results is expected to lead to the design of inferential, hypothesis-testing methods such as ANOVAs.
    -------------------------------------------
    Edith Zang
    Independent Consultant
    NYCASA
    -------------------------------------------

  • 14.  RE:Is it appropriate to use F-test / ANOVA to compare means within cluster?

    Posted 01-09-2012 13:26

    -------------------------------------------
    Sheela Talwalker
    Sr. Statistician
    T'Walker Consulting
    -------------------------------------------

    I have no idea how you performed the cluster analysis to begin with, but it is definitely incorrect to do a simple ANOVA and use an F-test to compare the means of the 12 products within each cluster. Here you are not randomly assigning each individual to give an opinion on one product. These observations are not independent; they are correlated, because each individual furnishes opinions on all 12 of the products you want to compare. So you will have 720 observations at hand when there are 60 individuals in a given cluster. You can calculate the 12x12 variance-covariance matrix for the cluster and then use it to compare the correlated means.
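    A minimal Python sketch of comparing two correlated means, assuming a complete consumers-by-products ratings table for one cluster (function names and the tiny example table are hypothetical). The variance of a difference of two correlated means is (s_ii + s_jj - 2 s_ij) / n, which makes the statistic below the same as a paired t-test on per-consumer differences. It makes no adjustment for the 66 pairwise comparisons among 12 products, and the earlier caveat about testing the clustering variables still applies.

```python
import math


def column_means_and_covariance(ratings):
    """Column means and sample covariance matrix (divisor n - 1) of an n x p table."""
    n, p = len(ratings), len(ratings[0])
    means = [sum(row[j] for row in ratings) / n for j in range(p)]
    cov = [[sum((row[a] - means[a]) * (row[b] - means[b]) for row in ratings) / (n - 1)
            for b in range(p)]
           for a in range(p)]
    return means, cov


def correlated_mean_t(ratings, i, j):
    """t statistic (n - 1 df) for mean(product i) - mean(product j), same consumers."""
    n = len(ratings)
    means, cov = column_means_and_covariance(ratings)
    # Var(mean_i - mean_j) = (s_ii + s_jj - 2 * s_ij) / n for correlated columns
    var_diff = (cov[i][i] + cov[j][j] - 2 * cov[i][j]) / n
    return (means[i] - means[j]) / math.sqrt(var_diff), n - 1


# Hypothetical ratings: 4 consumers (rows) x 2 products (columns)
example = [[5, 3], [6, 5], [7, 4], [8, 6]]
t, df = correlated_mean_t(example, 0, 1)
print(f"t = {t:.3f} on {df} df")
```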

    I imagine your scale of observation must be a Likert scale, something like: do not like at all, neutral, like it, like it very much, and so on. It may be a 4-point or a 7-point scale, in which case you will be better off carrying out Friedman's nonparametric two-way ANOVA test. You may refer to chapter 12 of the book Nonparametric Statistical Inference by Gibbons and Chakraborti.
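    A minimal Python sketch of the Friedman statistic, assuming a complete consumers-by-products table (function names and example scores are hypothetical). Ratings are ranked within each consumer, with tied ratings (common on short rating scales) given average ranks, and the statistic uses the tie-corrected form (k - 1) * sum_j (R_j - n(k + 1)/2)^2 / sum_ij (r_ij - (k + 1)/2)^2. The result is compared with a chi-square distribution on k - 1 degrees of freedom; SciPy users can cross-check with `scipy.stats.friedmanchisquare`.

```python
def average_ranks(values):
    """Rank values 1..k, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1  # mean of the 1-based positions start..end
        for pos in range(start, end + 1):
            ranks[order[pos]] = avg
        start = end + 1
    return ranks


def friedman_statistic(ratings):
    """Tie-corrected Friedman chi-square for an n-consumers x k-products table."""
    n, k = len(ratings), len(ratings[0])
    ranked = [average_ranks(row) for row in ratings]  # rank within each consumer
    mid = (k + 1) / 2
    rank_sums = [sum(r[j] for r in ranked) for j in range(k)]
    numerator = (k - 1) * sum((rj - n * mid) ** 2 for rj in rank_sums)
    denominator = sum((r - mid) ** 2 for row in ranked for r in row)
    return numerator / denominator, k - 1


# Hypothetical liking scores: 3 consumers x 3 products, with one tie
scores = [[7, 5, 3], [8, 8, 2], [6, 4, 5]]
q, df = friedman_statistic(scores)
print(f"Friedman chi-square = {q:.3f} on {df} df")
```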

    Sheela Talwalker, Ph.D.

    T'Walker Consulting

  • 15.  RE:Is it appropriate to use F-test / ANOVA to compare means within cluster?

    Posted 01-09-2012 15:05
    ANOVA and discriminant analysis are not appropriate in the usual use of these methods. However, some of the output from these procedures can be useful in selecting the number of clusters to retain and in forming a substantive interpretation of the clusters.

    I have been doing clustering since 1971.  I learned early on not to rely solely on a single proximity measure and agglomeration method combination.

    Non-hierarchical methods like k-means are subject to variations in clusters found due to case ordering.  This is also true for TWOSTEP although not to the same degree.

    Determining the number of clusters to retain is somewhat an art. (no pun intended.)

    In the mid-70s I presented an approach I call "core clusters". Unfortunately, I no longer have copies of those papers.

    1) Run a half-dozen or so different combinations of proximity measure and agglomeration method.
    For example: Ward's method, complete linkage, latent class, TWOSTEP; Euclidean distance, Q correlations on ipsatized data; etc., depending on what software you have available.

    2) Get a ballpark on the range of cluster numbers to retain.
    Unless you have a very large number of cases, you can use some hierarchical methods. To ballpark the number of clusters to interpret: some methods give a measure of quality of fit, and one eyeballs the series of fit measures for a jump.

    If you use the TWOSTEP procedure in SPSS and specify "/print ic" you will get a table of AIC or BIC (whichever you specify) for 1 to the max number of clusters you specify.

    3) Rerun the clusterings, saving the cluster assignments for each value in the ballpark range.

    4) Use crosstabs to find groups of cases (core clusters) that are assigned together by several methods. Create a new variable which shows which core cluster each case is in.

    5) Run a discriminant function analysis. IGNORE all tests of significance. In the classification phase, look at the probabilities of assignment. See which cases are not cleanly assigned to one group or are very far from the centroid. Recode assigned membership into a new core membership variable, giving those cases a value that treats them as ungrouped.

    Repeat step 5, using the assignments from the previous run, until you don't get many changes.

    6) Use the means from the DFA to create profile charts for the clusters.  (a profile chart is sort of a parallel coordinate plot but all variables are on a common metric, commonly z-scores.) Hint: a monitor screen can work as a sort of light table when you superimpose one plot over the other to eyeball the profiles.

    7) Use the profile charts, the F's from the univariate part of the DFA output, and the info on which variables distinguish which groups to form a SUBJECTIVE interpretation of the final clustering.  Some cases may not fit into the solution.

    8) Use the final core cluster variable to see how well it is associated or correlated with variables that were not among those used in the clustering.
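    Step 4 can be sketched in a few lines of Python. This is a strict, hypothetical version in which a core cluster is a cell of the crosstab of all runs that holds at least `min_size` cases. Arthur's "assigned together by several methods" may well be looser (e.g., agreement by most rather than all runs), and the discriminant-analysis refinement of step 5 is not shown. The function name, the cut-off, and the example labels are illustrative assumptions.

```python
from collections import Counter


def core_clusters(assignments, min_size):
    """Strict "core clusters": cases that share a cell of the crosstab of all runs.

    assignments -- one list of cluster labels per clustering run, all the same length
    min_size    -- hypothetical cut-off; smaller cells are left ungrouped
    Returns a core-cluster id for each case, or None if the case is ungrouped.
    """
    # A case's signature is its tuple of labels across runs, i.e. its crosstab cell.
    # Label numbering may differ between runs; only co-membership matters.
    signatures = list(zip(*assignments))
    cell_sizes = Counter(signatures)
    core_id = {}
    for signature, size in cell_sizes.most_common():  # largest cells first
        if size >= min_size:
            core_id[signature] = len(core_id)
    return [core_id.get(signature) for signature in signatures]


# Hypothetical labels from two runs (e.g. Ward and complete linkage) on 7 cases
ward = [0, 0, 0, 1, 1, 1, 2]
complete = [5, 5, 5, 7, 7, 8, 8]
print(core_clusters([ward, complete], min_size=2))
```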

    HTH

    P.S. Do you have a way to double-check whether cases that have a zero within-case standard deviation are really answering your question?
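    Cases with a zero within-case standard deviation gave every product the same rating. They also cannot be ipsatized, since that divides by the case's own standard deviation. A minimal Python sketch for flagging them, assuming each case is a list of liking scores; the function name and tolerance are arbitrary illustrative choices, and what to do with flagged cases remains a judgment call.

```python
import statistics


def flat_responders(ratings, tol=1e-12):
    """Indices of cases whose ratings have (near-)zero within-case standard deviation."""
    return [idx for idx, row in enumerate(ratings) if statistics.pstdev(row) <= tol]


# Hypothetical cases: the first and third rate every product identically
cases = [[5, 5, 5], [1, 2, 3], [7, 7, 7]]
print(flat_responders(cases))
```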

    -------------------------------------------
    Arthur Kendall
    Social Research Consultants
    -------------------------------------------

  • 16.  RE:Is it appropriate to use F-test / ANOVA to compare means within cluster?

    Posted 01-10-2012 22:38

    What prompted this question was that we worked with two market research firms, both staffed with Ph.D. level statisticians.  One firm advocated the use of ANOVA/LSD to separate the product liking means within each consumer cluster, while the other warned us against it.  It is obvious which side this e-group is on.

    Thank you all very much for helping me resolve this conflict of opinion.

    -------------------------------------------
    Chun-Yen Cochrane
    The Coca-Cola Company
    -------------------------------------------