Quantile regression turned out to be an excellent approach to this problem. Thanks to those who suggested it. It was also a good learning opportunity for me. I often face non-Gaussian, asymmetric response variables, and quantile regression looks like a good method to have at hand.
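For anyone new to the method, here is a minimal sketch of the idea underneath quantile regression. It is written in Python for illustration, although the thread itself uses R. Quantile regression replaces squared error with the asymmetric "check" (pinball) loss. With no predictors, the constant that minimizes that loss at level tau is the tau-th sample quantile. The HbA1c values and the function names are hypothetical, made up only for this example.

```python
def pinball_loss(residuals, tau):
    """Summed check (pinball) loss: tau*r for r >= 0, (tau - 1)*r for r < 0."""
    return sum(tau * r if r >= 0 else (tau - 1) * r for r in residuals)


def best_constant(y, tau):
    """Observed value q that minimizes the pinball loss of y - q.

    The loss is convex and piecewise linear with kinks at the observed
    values, so some minimizer is always an observed value; searching y is exact.
    """
    return min(y, key=lambda q: pinball_loss([v - q for v in y], tau))


# Hypothetical HbA1c values (%), invented for illustration only.
hba1c = [5.6, 6.1, 6.4, 7.0, 7.2, 8.1, 9.5, 11.3, 14.2]

print(best_constant(hba1c, 0.50))  # the median (50th percentile): 7.2
print(best_constant(hba1c, 0.75))  # 75th percentile under this definition: 9.5
```

Several percentile conventions exist. This one, the minimizer of the check loss, is the one quantile regression uses.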
The most challenging part was explaining it to my investigator colleagues. They're smart physicians but have very little statistical background. They grasp the concept "Does X have any effect on Y?" In fact, most physicians are very familiar with that concept and that language. But as I talked with the team about this analysis, it occurred to me that something is missing from their language. A more complete version, familiar to us, would be "Does a change in X have any effect on the expected value of Y?" Or, slightly modified, " . . . on the mean of Y?"
That imprecision in language and thought causes some trouble, I think. The idea that a predictor could affect some aspect of a distribution other than its mean (its spread, say, or its upper percentiles) was tough for them to grasp.
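The following minimal sketch shows how a predictor can leave the median untouched while shifting an upper percentile. It fits y = a + b*x by exactly minimizing the pinball loss with a brute-force search over lines through pairs of observations. This works because, with an intercept, one predictor, and at least two distinct x values, some optimal linear quantile-regression fit passes through two data points. The search is only practical for tiny datasets. In R, the quantreg package's rq() is the usual tool. The binary predictor, the data, and the function names are all hypothetical. With a binary x, the fit reduces to group-wise percentiles, which makes the result easy to check by hand.

```python
from itertools import combinations


def pinball_loss(residuals, tau):
    """Summed check (pinball) loss: tau*r for r >= 0, (tau - 1)*r for r < 0."""
    return sum(tau * r if r >= 0 else (tau - 1) * r for r in residuals)


def quantile_fit(x, y, tau):
    """Exact linear quantile regression y ~ a + b*x for a tiny dataset.

    Some optimal line passes through two observations with distinct x values,
    so trying every such pair finds it (O(n^3); fine for a toy example only).
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a vertical "line" cannot be a fitted model
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        loss = pinball_loss([yk - (a + b * xk) for xk, yk in zip(x, y)], tau)
        if best is None or loss < best[0]:
            best = (loss, a, b)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]


# Hypothetical data: x = 1 for patients with some risk factor, 0 otherwise.
group0 = [6.0, 6.3, 6.6, 6.9, 7.1, 7.4, 7.8, 8.2, 8.8]
group1 = [6.1, 6.4, 6.7, 6.9, 7.1, 7.9, 9.6, 11.2, 13.5]
x = [0] * len(group0) + [1] * len(group1)
y = group0 + group1

for tau in (0.50, 0.75):
    a, b = quantile_fit(x, y, tau)
    print(f"tau={tau}: intercept={a:.2f}, slope={b:.2f}")
# Median slope is 0.00; 75th-percentile slope is 1.80.
```

Here "Does X affect Y?" has two different answers. X does nothing to the median, yet it raises the 75th percentile by about 1.8 percentage points.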
Childhood growth charts, which many physicians use every day in patient care, seemed a useful analogy. My colleague lamented, "Quantile? I don't know what a quantile is!" I replied, "Oh, yes you do! You just call them percentiles." https://www.cdc.gov/growthcharts/cdc-charts.htm
------------------------------
Christopher Ryan
Agency Statistical Consulting, LLC
------------------------------
Original Message:
Sent: 08-28-2024 16:18
From: Christopher Ryan
Subject: distributions for modeling percent (or proportion) of rather narrow plausible range--specifically, hemoglobin A1c in diabetics
I'm helping a colleague and his team of residents and medical students with a cross-sectional chart review study of hemoglobin A1c levels in patients with type 2 diabetes, and how (if at all) they relate to a variety of "social determinants of health": transportation problems, substance use, various measures of wealth/poverty, and so on.
For those unfamiliar with it, hemoglobin A1c measures what percent of one's hemoglobin molecules are glycated, that is, have glucose molecules attached to them. The bond is permanent, persisting for the life of that hemoglobin molecule (about 3 months). So it's a measure of long-term diabetes control: a lower hemoglobin A1c means better diabetes control, and a higher one means worse.
Strictly speaking, hemoglobin A1c is a percentage: it can't be less than 0 or greater than 100. In clinical reality, values below about 4 or above about 16 are biologically implausible; in 25 years of practice I don't think I've ever seen one outside that range.
I've always been intrigued by the beta distribution and keep an eye out for opportunities to use it. It has theoretical appeal for proportions and percentages. Does anyone have experience with, or thoughts about, modeling a percent outcome variable with the beta distribution when the plausible range is this narrow?
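When the plausible range is narrow, one option for using the beta distribution is to rescale the outcome from the clinical bounds quoted above (4 to 16) onto (0, 1) before fitting. This amounts to a four-parameter beta with fixed endpoints. The following minimal sketch assumes those fixed bounds. It uses method-of-moments estimates only because they need nothing beyond Python's standard library. A likelihood-based fit, or beta regression with the betareg package in R, would be the more usual choice. The values and the function name are hypothetical.

```python
from statistics import mean, variance

# Plausible HbA1c range (%) from the post, treated here as fixed beta endpoints.
LOWER, UPPER = 4.0, 16.0


def beta_moments_fit(values, lower=LOWER, upper=UPPER):
    """Method-of-moments (alpha, beta) for values rescaled from (lower, upper) to (0, 1)."""
    z = [(v - lower) / (upper - lower) for v in values]
    if not all(0.0 < zi < 1.0 for zi in z):
        raise ValueError("every value must lie strictly between lower and upper")
    m = mean(z)
    s2 = variance(z)  # sample variance (n - 1 denominator)
    common = m * (1.0 - m) / s2 - 1.0  # equals alpha + beta
    if common <= 0.0:
        raise ValueError("sample variance too large for a beta distribution")
    return m * common, (1.0 - m) * common


# Hypothetical HbA1c values (%), invented for illustration only.
hba1c = [6.2, 6.8, 7.1, 7.4, 7.9, 8.3, 9.0, 10.4, 12.1]
alpha, beta = beta_moments_fit(hba1c)
print(f"alpha={alpha:.2f}, beta={beta:.2f}")
```

The fitted beta reproduces the sample mean and variance on the rescaled (0, 1) scale. The results depend on the chosen endpoints, so bounds that are too tight will reject real observations.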
The log-normal also occurs to me. Using the fitdistrplus package in R, the observed data seem a tad closer to the theoretical lognormal CDF than to the beta or the normal. (I've attached two quick-and-dirty graphs. These are unconditional plots, of course, with no predictors, so they might not be definitive.)
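To put a number on "a tad closer" alongside the CDF plots, one option is the Kolmogorov-Smirnov distance between the empirical CDF and each fitted CDF. This minimal Python sketch compares maximum-likelihood normal and lognormal fits. The beta is omitted because Python's standard library has no beta CDF. The parameters are estimated from the same data, so the usual KS p-values don't apply, and the statistic serves only as a descriptive comparison. The data and the function names are hypothetical.

```python
import math
from statistics import NormalDist, fmean, pstdev


def ks_statistic(data, cdf):
    """Kolmogorov-Smirnov distance between the empirical CDF of data and cdf."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def compare_normal_lognormal(data):
    """KS distance for normal and lognormal fits (maximum likelihood) to positive data."""
    normal = NormalDist(fmean(data), pstdev(data))  # MLE uses the n-denominator SD
    logs = [math.log(v) for v in data]
    log_scale = NormalDist(fmean(logs), pstdev(logs))  # normal fit on the log scale
    return {
        "normal": ks_statistic(data, normal.cdf),
        "lognormal": ks_statistic(data, lambda v: log_scale.cdf(math.log(v))),
    }


# Hypothetical HbA1c values (%), invented for illustration only.
hba1c = [6.0, 6.3, 6.5, 6.8, 7.0, 7.2, 7.5, 7.9, 8.4, 9.1, 10.2, 12.6]
print(compare_normal_lognormal(hba1c))
```

A smaller distance means a closer fit. As with the plots, this is an unconditional comparison and says nothing about the distribution of residuals once predictors are in the model.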
I'd welcome any thoughts. Thanks.