I believe the consensus is not to categorize (dichotomize) a continuous scale. Let me give a list of reasons not to dichotomize.
- Power - you need to enroll more patients into your trial.
- We throw away interval level information (hence means) and ordinal level information (hence medians).
- The statistical approaches often assume large Ns.
- The statistical approaches limit the types of analyses you can do.
Power
At BEST (if you dichotomize at the median) you throw away information. At BEST, you would need to increase the study's N by 60%. If the dichotomy were at 90:10, you would need to increase N by a factor of four.
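A rough sketch of where figures like these come from, not a sample-size calculation for any real trial. It assumes a normally distributed score and a small effect, in which case the efficiency of the dichotomized analysis relative to the continuous one is approximately phi(z)^2 / (p(1 - p)), with z the cut point on the standard normal scale and p the fraction falling below it. The function name `n_inflation` is made up for illustration.

```python
from statistics import NormalDist


def n_inflation(p_below):
    """Approximate factor by which N must grow after dichotomizing.

    Assumption (not from the post): normal scores, small effect, so that
    relative efficiency = phi(z)^2 / (p * (1 - p)).
    """
    std_normal = NormalDist()          # mean 0, sd 1
    z = std_normal.inv_cdf(p_below)    # cut point on the z scale
    density = std_normal.pdf(z)        # normal density at the cut point
    efficiency = density ** 2 / (p_below * (1 - p_below))
    return 1 / efficiency


for split in (0.5, 0.7, 0.9):
    print(f"{split:.0%} below the cut: multiply N by {n_inflation(split):.2f}")
```

Under these idealized assumptions the median split comes out near 1.57, which is the roughly 60% above. The 90:10 split comes out near 2.9, somewhat below the factor of four quoted above, which may rest on different assumptions.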
Interval Level info
As was pointed out before, what is the difference between a score of 20 and 24? Zero. Between a score of 1 and 19? Zero. Between 19 and 20? Well, they are maximally different. If you dichotomize and use the dichotomy for analysis, how do you summarize the data? If the data are dichotomous, you have no right to present means, SDs, medians, minima, or maxima. You can only summarize the proportions. Again, you are throwing all that information out.
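To make the point concrete, here is a tiny sketch using the scores mentioned above. The cut point of 20 is an assumption implied by the example, and the helper names are made up for illustration.

```python
CUTOFF = 20  # assumed cut point: scores of 20 or more count as "success"


def dichotomize(score, cutoff=CUTOFF):
    """Return 1 for success (score at or above the cutoff), else 0."""
    return 1 if score >= cutoff else 0


def gap_after_split(a, b):
    """Difference between two scores once both are dichotomized."""
    return abs(dichotomize(a) - dichotomize(b))


for a, b in [(20, 24), (1, 19), (19, 20)]:
    print(f"scores {a} and {b}: raw gap {abs(a - b)}, "
          f"gap after dichotomizing {gap_after_split(a, b)}")
```

The 18-point gap between 1 and 19 vanishes, while the 1-point gap between 19 and 20 becomes the largest difference the dichotomy can express.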
Large N
The analogue of the analysis of variance for a dichotomous outcome is logistic regression. One of its key assumptions is something called 'asymptotic normality'. What that means is that its tests and confidence intervals can only be trusted when N is quite large. Logistic regression analyses routinely use hundreds of observations.
Limit of type of analysis
With a dichotomy, one can easily compare your scale and some outcome. However, having two i.v.s is a problem. Take time. Suppose you measured your control scale at three time points. How do you analyze it? If continuous, you would fit a control, time, and control-by-time model with some d.v. Can you do a full two-way model (and allow for correlated errors over time) with a dichotomy? Perhaps, but only with very cumbersome models. It is trivial for continuous data.
Conclusion: I agree that dichotomizing data into success and failure makes interpretation much easier. However, to plan a trial for a dichotomy would necessitate at a minimum a 60% increase in patients. A small study would, at best, need to be doubled in size. If the split into the two groups is not the ideal 50/50, then the increase would need to be much larger. A statistical analysis of a dichotomy also requires a large N. It also makes factorial designs almost impossible to analyze or interpret.
Recommendation: If simplicity of interpretation is desired, then analyze the data as a continuum, but present (descriptive [no p-values or CI]) summary tables with the dichotomy.
-------------------------------------------
Allen Fleishman
Allen Fleishman Biostatistics Inc.
-------------------------------------------