ANOVA and discriminant analysis are not appropriate in the usual use of these methods. However, some of the output from these procedures can be useful in selecting the number of clusters to retain and in forming a substantive interpretation of the clusters.
I have been doing clustering since 1971. I learned early on not to rely solely on a single proximity measure and agglomeration method combination.
Non-hierarchical methods like k-means are subject to variations in clusters found due to case ordering. This is also true for TWOSTEP although not to the same degree.
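To illustrate the case-ordering point, here is a minimal sketch in Python (not SPSS). It assumes a deliberately naive k-means that seeds its centers with the first k cases in the file. That seeding rule, the function name `kmeans_first_k`, and the toy data are assumptions made for illustration; real packages initialize in their own ways.

```python
def kmeans_first_k(values, k, max_iter=100):
    """Naive 1-D k-means seeded with the first k cases (illustrative assumption).

    Assumes the values are distinct so each case can be identified by its value.
    """
    centers = [float(v) for v in values[:k]]
    labels = None
    for _ in range(max_iter):
        # Assign each case to the nearest current center.
        new_labels = [min(range(k), key=lambda j: abs(v - centers[j]))
                      for v in values]
        if new_labels == labels:
            break  # no case changed cluster, so the solution is stable
        labels = new_labels
        # Recompute each center as the mean of its current members.
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
    # Return the partition as a set of frozensets so label numbering is irrelevant.
    return {frozenset(v for v, lab in zip(values, labels) if lab == j)
            for j in range(k)}


# Same six cases, two different file orders.
order_a = kmeans_first_k([0, 1, 10, 11, 20, 21], k=2)
order_b = kmeans_first_k([20, 21, 0, 1, 10, 11], k=2)
print(order_a)
print(order_b)
```

With this seeding rule the first ordering groups 10 and 11 with the high values, while the second groups them with the low values, so the same data yield two different two-cluster solutions.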
Determining the number of clusters to retain is somewhat of an art. (No pun intended.)
In the mid-70's I presented an approach I call "core clusters". Unfortunately, I no longer have copies of those papers.
1) Run a half-dozen or so different combinations of proximity measure and agglomeration method.
For example: WARD, complete linkage, latent class, TWOSTEP; Euclidean distance, Q correlations on ipsatized data, etc., depending on what software you have available.
2) Get a ballpark on the range of cluster numbers to retain.
Unless you have a very large number of cases, you can run some hierarchical methods. To ballpark the number of clusters to interpret, note that some methods give a measure of quality of fit; one eyeballs the series of fit measures for a jump.
If you use the TWOSTEP procedure in SPSS and specify "/print ic" you will get a table of AIC or BIC (whichever you specify) for 1 to the max number of clusters you specify.
3) Rerun the clusterings, saving the cluster assignments for each value in the ballpark range.
4) Use crosstabs to find groups of cases (core clusters) that are assigned together by several methods. Create a new variable that shows which core cluster each case is in.
5) Run a discriminant function analysis (DFA). IGNORE all tests of significance. In the classification phase, look at the probabilities of assignment. See which cases are not cleanly assigned to one group or are very far from the centroid. Recode assigned membership into a new core membership variable, giving those cases a value that treats them as ungrouped.
Repeat #5 using the assignments from the previous run until you don't get many changes.
6) Use the means from the DFA to create profile charts for the clusters. (A profile chart is sort of a parallel coordinate plot, but with all variables on a common metric, commonly z-scores.) Hint: a monitor screen can work as a sort of light table when you superimpose one plot over the other to eyeball the profiles.
7) Use the profile charts, the F's from the univariate part of the DFA output, and the info on which variables distinguish which groups to form a SUBJECTIVE interpretation of the final clustering. Some cases may not fit into the solution.
8) Use the final core cluster variable to see how well it associates or correlates with variables that were not among those used in the clustering.
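The cross-classification in step 4 can be sketched in Python as below. This is only an illustration under stated assumptions: it treats a core cluster as a set of cases that land together under every saved solution (a stricter rule than "several methods"), and the function name `core_clusters`, the `min_size` cutoff, the `UNGROUPED` code, and the example label vectors are all hypothetical.

```python
from collections import defaultdict

UNGROUPED = 0  # hypothetical code for cases that belong to no core cluster


def core_clusters(assignments, min_size=2):
    """Cross-classify saved cluster assignments and number the core clusters.

    assignments: dict mapping a solution name to a list of labels, one per case.
    Returns a list with a core cluster number (or UNGROUPED) for each case.
    """
    names = sorted(assignments)
    n_cases = len(assignments[names[0]])
    # Cases that share a label under every solution fall in the same crosstab cell.
    cells = defaultdict(list)
    for case in range(n_cases):
        key = tuple(assignments[name][case] for name in names)
        cells[key].append(case)
    # Keep cells with enough cases; number them largest first, ties by first case.
    kept = sorted((c for c in cells.values() if len(c) >= min_size),
                  key=lambda c: (-len(c), c[0]))
    core = [UNGROUPED] * n_cases
    for number, cases in enumerate(kept, start=1):
        for case in cases:
            core[case] = number
    return core


# Made-up assignments for eight cases from three hypothetical runs.
saved = {
    "ward":     [1, 1, 1, 2, 2, 2, 3, 3],
    "complete": [2, 2, 2, 1, 1, 3, 3, 3],
    "twostep":  [1, 1, 2, 3, 3, 3, 2, 2],
}
core = core_clusters(saved)
print(core)
```

Cases 2 and 5 (counting from 0) are split off from their neighbours by at least one run, so they come out as ungrouped; the remaining six cases form three core clusters. A looser rule, such as agreement in most but not all runs, would need a different grouping step.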
HTH
P.S. Do you have a way to double check whether cases that have a zero within-case standard deviation are really answering your question?
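As a starting point for that check, a short Python sketch can flag the cases whose answers are identical across all items. The function name `flat_responders` and the example rows are hypothetical; the sketch only finds such cases and says nothing about why they answered that way.

```python
from statistics import pstdev


def flat_responders(rows):
    """Return the indices of cases with zero within-case standard deviation."""
    return [i for i, row in enumerate(rows) if pstdev(row) == 0]


# Made-up item responses: one row per case, one column per item.
responses = [
    [3, 3, 3, 3],
    [1, 4, 2, 5],
    [5, 5, 5, 5],
]
flagged = flat_responders(responses)
print(flagged)
```

Cases flagged this way also cannot be standardized within-case, because that would mean dividing by a standard deviation of zero.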
-------------------------------------------
Arthur Kendall
Social Research Consultants
-------------------------------------------