Nicole,
You may also consider using a robust regression method such as Least Median of Squares (LMS) [1]. This would let you analyze the data without transformations and would probably give you reasonable estimates given the nature of your outcome variable.
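To make the idea concrete, here is a minimal sketch of an LMS fit for a single numeric predictor. It is an illustration only, not the procedure from [1] as implemented in any statistical package: it tries the line through every pair of points and keeps the one whose squared residuals have the smallest median. The function name `lms_line` and the data are made up for the example, and a model with several categorical predictors and an interaction would need a proper robust regression routine.

```python
from itertools import combinations
from statistics import median

def lms_line(x, y):
    """Approximate least-median-of-squares fit of y = a + b*x.

    Candidate lines are those passing through each pair of points;
    the winner minimizes the median of the squared residuals.
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # vertical line, slope undefined
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        crit = median((yk - (a + b * xk)) ** 2 for xk, yk in zip(x, y))
        if best is None or crit < best[0]:
            best = (crit, a, b)
    crit, a, b = best
    return a, b, crit

# Hypothetical data: eight points on y = 2 + 0.5x plus two gross outliers.
x = list(range(10))
y = [2 + 0.5 * xi for xi in x]
y[8], y[9] = 30.0, 40.0

intercept, slope, crit = lms_line(x, y)
print(intercept, slope, crit)
```

Because the criterion is a median rather than a sum, the two outliers do not pull the fitted line away from the bulk of the points, which is the property that makes the method attractive when the usual linear regression assumptions look doubtful.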
The other questions you pose, about transformations and about when to treat a variable as continuous, are ones that I have dealt with myself. I use transformations sparingly and only in a few circumstances, because the transformed scale often has no practical interpretation. One exception is the natural log transformation, which I have used with biomarkers because the mean of the transformed variable can be back-transformed onto its original scale of measurement [2-4]. Otherwise, my preference is to use non-parametric tests on the raw data.
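As a rough sketch of that back-transformation: exponentiating the mean of the logged values gives the geometric mean on the original scale, and the same can be done to the confidence limits. The function name, the biomarker values, and the use of 1.96 (a normal approximation rather than a t quantile) are assumptions made for this example.

```python
import math
from statistics import mean, stdev

def back_transformed_mean(values, z=1.96):
    """Geometric mean and approximate 95% CI via the log scale.

    z=1.96 is a normal approximation (an assumption of this sketch);
    a t quantile would be more appropriate for small samples.
    """
    logs = [math.log(v) for v in values]
    m = mean(logs)
    se = stdev(logs) / math.sqrt(len(logs))
    return math.exp(m), (math.exp(m - z * se), math.exp(m + z * se))

# Hypothetical right-skewed biomarker values (not real data).
biomarker = [1.2, 0.8, 3.5, 2.1, 9.4, 1.7, 0.9, 4.2]
gm, (lower, upper) = back_transformed_mean(biomarker)
print(gm, lower, upper)
```

The back-transformed interval is asymmetric around the geometric mean, which is expected: it is symmetric on the log scale, not on the original one.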
Whether the outcome should be treated as a continuous variable depends on what the variable is intended to measure and whether it was intended to be treated as continuous. For example, in autopsy studies of Alzheimer's disease cases, a very common measurement of neurofibrillary tangles is the Braak stage, which goes from 0 to 6 (an ordinal scale). Ostensibly you could treat this as a continuous variable, but a mean of 3.37 is not particularly valuable, since it is not a value that an individual case can have. Treating it as a categorical variable is much more meaningful, as the various stages represent distinct differences in the degree of pathology present.
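A small illustration of the point, using made-up stage values rather than real autopsy data: two groups can share the same mean stage while having completely different stage distributions, which only the categorical summary reveals. The helper name `stage_table` is hypothetical.

```python
from collections import Counter
from statistics import mean

# Hypothetical Braak stages for two made-up groups (not real data).
group_a = [3, 3, 3, 3, 3, 3]
group_b = [0, 0, 0, 6, 6, 6]

def stage_table(stages):
    """Count of cases at each Braak stage, 0 through 6."""
    counts = Counter(stages)
    return {stage: counts.get(stage, 0) for stage in range(7)}

# Identical means, very different pathology profiles.
print(mean(group_a), mean(group_b))
print(stage_table(group_a))
print(stage_table(group_b))
```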
I hope all of this is helpful.
Mike
1. Rousseeuw PJ. Least median of squares regression. Journal of the American Statistical Association 1984;79:871-880.
2. Bland JM, Altman DG. Transformations, means, and confidence intervals. British Medical Journal 1996;312:1079.
3. Malek-Ahmadi M, Patel A, Sabbagh MN. KIF6 719Arg carrier status association with homocysteine and C-reactive protein in amnestic mild cognitive impairment and Alzheimer’s disease patients. International Journal of Alzheimer’s Disease 2013;2013:242303. doi:10.1155/2013/242303.
4. Ravaglia G, Forti P, Maioli F, et al. Apolipoprotein E e4 allele affects risk of hyperhomocysteinemia in the elderly. American Journal of Clinical Nutrition 2006;84:1473-1480.
------------------------------
Mike Malek-Ahmadi
Banner Alzheimer's Institute
Original Message:
Sent: 05-11-2016 10:12
From: Nicole Mack
Subject: To treat as categorical or continuous
Hello All,
I was conducting an analysis in which the main point was to determine whether an interaction was significant. My outcome was a count of how many questions a participant answered correctly out of a set of 7 (range from 0 to 7, although everyone had at least 1 correct answer). All predictors except one were categorical. It was first suggested that I use a linear regression model, but upon reviewing the data it didn't appear that the assumptions were met, despite a sample size of about 500. I then went on to use Poisson regression, as I have been taught that such a model is good for count data, but the model fit statistics suggested that the model wasn't a good fit (Value/DF ~ 0.21) and that the data were in fact underdispersed.
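For readers unfamiliar with the Value/DF statistic, a minimal sketch of one common version follows: the Pearson chi-square divided by its degrees of freedom, computed here for an intercept-only Poisson model where the fitted mean is simply the sample mean. The function name and the scores are made up for illustration; a model with predictors would use each observation's fitted mean instead. Values near 1 are consistent with the Poisson assumption that the variance equals the mean, and values well below 1 indicate underdispersion.

```python
from statistics import mean

def pearson_dispersion(counts, n_params=1):
    """Pearson chi-square / df for an intercept-only Poisson model.

    With no predictors, the fitted Poisson mean is the sample mean
    (this simplification is an assumption of the sketch).
    """
    mu = mean(counts)
    chi2 = sum((y - mu) ** 2 / mu for y in counts)
    df = len(counts) - n_params
    return chi2 / df

# Hypothetical scores out of 7 (not the data from this analysis).
scores = [5, 6, 6, 7, 5, 6, 7, 6, 5, 7]
ratio = pearson_dispersion(scores)
print(ratio)
```

A score capped at 7 cannot spread out as much as a Poisson variable with the same mean, so a ratio below 1 is not surprising for this kind of outcome; for example, a binomial count has variance strictly smaller than its mean.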
My questions are:
1. What other steps should I have taken with the data in order to get a better model (if I am to keep all of the predictors in the model), especially if my main point is to determine if the interaction is significant?
2. Should the dependent variable just have been transformed and then used in a linear regression model, or does that lose interpretability?
3. It seems that treating something as continuous when it is not is the default, but is that always the case?
4. What are some good sources of underdispersion? Also, what does that mean in a general sense?
Thanks for any insight you all are able to provide.
------------------------------
Nicole Mack
------------------------------