I'm helping a colleague and his team of residents and medical students with a cross-sectional chart review study of hemoglobin A1c levels in patients with type 2 diabetes, and how (if at all) they relate to a variety of "social determinants of health": transportation problems, substance use, various measures of wealth/poverty, and so on.
For those unfamiliar with it, hemoglobin A1c is a measure of what percent of one's hemoglobin molecules are glycosylated, that is, have glucose molecules attached to them. It's a permanent bond, persisting for the life of that hemoglobin molecule (about 3 months). So it's a measure of long-term diabetes control: lower hemoglobin A1c = better diabetes control, and vice versa.
By rights, hemoglobin A1c is a percentage: it can't be less than 0 or greater than 100. In clinical reality, values less than about 4 or greater than about 16 are biologically implausible; in 25 years of practice I don't think I've ever seen one outside that range.
I've always been intrigued by the beta distribution and keep my eye out for opportunities to use it. It has a theoretical attraction for proportions/percentages. Any experience with, or thoughts about, modeling a percent outcome variable with the beta distribution, when the plausible range is narrow like this?
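To make the narrow-range question concrete, here is a minimal sketch of one possible approach: rescale A1c from its plausible interval onto (0, 1) and fit a beta distribution to the rescaled values. It is written in Python with only the standard library (not the R workflow mentioned in this post), it uses method-of-moments estimates rather than maximum likelihood, the bounds 4 and 16 are simply the rough limits quoted above, and the A1c values are made up for illustration. None of this is the study's data or method.

```python
from statistics import mean, variance

# Assumed plausible bounds for hemoglobin A1c (percent), taken from the
# "about 4" and "about 16" quoted in the post; they are not estimated.
LOWER, UPPER = 4.0, 16.0


def rescale(a1c, lower=LOWER, upper=UPPER):
    """Map an A1c value from the open interval (lower, upper) onto (0, 1)."""
    if not lower < a1c < upper:
        raise ValueError(f"A1c {a1c} is outside ({lower}, {upper})")
    return (a1c - lower) / (upper - lower)


def beta_method_of_moments(ys):
    """Method-of-moments estimates of the two beta shape parameters."""
    m = mean(ys)
    v = variance(ys)  # sample variance (n - 1 denominator)
    if v >= m * (1 - m):
        raise ValueError("variance too large for a beta distribution")
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common


# Made-up illustrative values, NOT data from the study.
a1c_values = [6.1, 6.8, 7.2, 7.5, 8.0, 8.4, 9.1, 10.3, 11.7, 13.2]

scaled = [rescale(x) for x in a1c_values]
shape1, shape2 = beta_method_of_moments(scaled)

# Mean of the fitted beta, mapped back to the original A1c scale.
fitted_mean = LOWER + (UPPER - LOWER) * shape1 / (shape1 + shape2)
print(f"shape1={shape1:.2f}, shape2={shape2:.2f}, mean A1c={fitted_mean:.2f}")
```

The choice of bounds matters: fitting a beta on the raw 0 to 100 scale (A1c / 100) and fitting it on the rescaled 4 to 16 interval are different models, and any observation at or beyond an assumed bound cannot be handled by the rescaled version.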
The log-normal also occurs to me. In fits done with the fitdistrplus package in R, the observed data seem a tad closer to the log-normal theoretical CDF than to the beta or the normal. (I've attached two quick-and-dirty graphs. These are unconditional plots, of course, with no predictors, so they might not be definitive.)
I'd welcome any thoughts. Thanks.
------------------------------
Christopher Ryan
Agency Statistical Consulting, LLC
------------------------------