distributions for modeling percent (or proportion) of rather narrow plausible range--specifically, hemoglobin A1c in diabetics

  • 1.  distributions for modeling percent (or proportion) of rather narrow plausible range--specifically, hemoglobin A1c in diabetics

    Posted 08-28-2024 16:18

    I'm helping a colleague and his team of residents and medical students with a cross-sectional chart review study of hemoglobin A1c levels in patients with type 2 diabetes, and how (if at all) they relate to a variety of "social determinants of health": transportation problems, substance use, various measures of wealth/poverty, and so on.

    For those unfamiliar with it, hemoglobin A1c is a measure of what percent of one's hemoglobin molecules are glycosylated, that is, have glucose molecules hooked to them. It's a permanent bond, persisting for the life of that hemoglobin molecule, which is about 3 months. So it's a measure of long-term diabetes control: lower hemoglobin A1c = better diabetes control, and vice versa.

    By rights, hemoglobin A1c is a percentage: can't be less than zero or greater than 100. In clinical reality, values less than about 4 or greater than about 16 are biologically implausible; in 25 years of practice I don't think I've ever seen one outside that range.

    I've always been intrigued by the beta distribution and keep my eye out for opportunities to use it. It has a theoretical attraction for proportions/percentages. Any experience with, or thoughts about, modeling a percent outcome variable with the beta distribution, when the plausible range is narrow like this?

    The log-normal also occurs to me. Using the fitdistrplus package in R, the observed data seem a tad closer to the lognormal theoretical CDF than to the beta or the normal. (I've attached two quick and dirty graphs. These are unconditional plots, of course, with no predictors, so might not be definitive.)
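
    The comparison above was done with R's fitdistrplus. For readers who want to see the arithmetic behind such a comparison, here is a minimal sketch in Python (standard library only) that compares the maximized log-likelihood and AIC of a normal and a lognormal fit. The A1c values are made up for illustration and are not the study data; the beta is left out because fitting it needs numerical optimization.

```python
import math
import statistics

# Hypothetical A1c values in percent (illustrative only, NOT the study data).
a1c = [5.9, 6.3, 6.8, 7.0, 7.2, 7.5, 7.9, 8.4, 9.1, 10.2, 11.8, 13.5]


def normal_loglik(x):
    """Maximized Gaussian log-likelihood (the MLE uses the population SD)."""
    n = len(x)
    sigma = statistics.pstdev(x)
    return -0.5 * n * (math.log(2 * math.pi * sigma ** 2) + 1)


def lognormal_loglik(x):
    """Maximized lognormal log-likelihood on the original scale.

    This is the Gaussian log-likelihood of log(x) minus sum(log(x)),
    the Jacobian term that makes it comparable with normal_loglik.
    """
    logs = [math.log(v) for v in x]
    return normal_loglik(logs) - sum(logs)


def aic(loglik, n_params=2):
    """Akaike information criterion; both candidates have two parameters."""
    return 2 * n_params - 2 * loglik


aic_normal = aic(normal_loglik(a1c))
aic_lognormal = aic(lognormal_loglik(a1c))
print(f"AIC normal:    {aic_normal:.2f}")
print(f"AIC lognormal: {aic_lognormal:.2f}")
```

    The lower AIC marks the closer-fitting candidate. Like the attached plots, this is an unconditional comparison, so it says nothing about the distribution once predictors are in the model.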

    I'd welcome any thoughts. Thanks.



    ------------------------------
    Christopher Ryan
    Agency Statistical Consulting, LLC
    ------------------------------

    Attachment(s): EmpiricalDensity.pdf (40 KB), ThreeCDFs.pdf (39 KB)


  • 2.  RE: distributions for modeling percent (or proportion) of rather narrow plausible range--specifically, hemoglobin A1c in diabetics

    Posted 08-28-2024 17:17

    Hi Christopher,

    I wonder if you need the Beta here. It can be difficult to work with in modeling, e.g., because observed values of exactly 0 and 1 aren't allowed (though not an issue in your case unless you plan to rescale the data to match the Beta's range). If lognormal works, I'd probably opt for that. 
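
    To make the boundary point concrete, here is a small sketch in Python with made-up values. Dividing A1c by 100 keeps every value strictly inside (0, 1), whereas rescaling to an assumed plausible range of 4 to 16 puts the extremes exactly on 0 and 1. The `squeeze` step is one commonly cited adjustment, (y(n - 1) + 0.5) / n from Smithson and Verkuilen (2006); it is shown as an option, not as something anyone in this thread used.

```python
def to_unit_interval(values, lower, upper):
    """Linearly rescale values so that lower maps to 0 and upper maps to 1."""
    return [(v - lower) / (upper - lower) for v in values]


def squeeze(props):
    """Pull proportions off the 0/1 boundaries: (y * (n - 1) + 0.5) / n."""
    n = len(props)
    return [(p * (n - 1) + 0.5) / n for p in props]


# Hypothetical A1c values in percent, including the assumed range limits.
a1c = [4.0, 6.5, 7.2, 9.8, 16.0]

as_proportion = [v / 100 for v in a1c]      # natural scale: strictly inside (0, 1)
rescaled = to_unit_interval(a1c, 4.0, 16.0)  # endpoints land exactly on 0 and 1
squeezed = squeeze(rescaled)                 # strictly inside (0, 1) again

print(as_proportion)
print(rescaled)
print(squeezed)
```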

    Cheers,
    Vince 



    ------------------------------
    Vincent Staggs, PhD
    Senior Scientist
    IDDI
    ------------------------------



  • 3.  RE: distributions for modeling percent (or proportion) of rather narrow plausible range--specifically, hemoglobin A1c in diabetics

    Posted 08-28-2024 17:20

    I don't know the answer. However, NHANES does collect HbA1c and seems to have some suggestions for the statistical analysis. (I realize you're not using NHANES.)

    Link from CDC here:

    https://wwwn.cdc.gov/Nchs/Nhanes/2015-2016/GHB_I.htm

    and the analytic guidelines

    https://wwwn.cdc.gov/nchs/nhanes/analyticguidelines.aspx 

    and

    https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10148706/



    ------------------------------
    Chris Barker, Ph.D.
    Past Chair
    Statistical Consulting Section
    Consultant and
    Adjunct Associate Professor of Biostatistics
    www.barkerstats.com


    ---
    "In composition you have all the time you want to decide what to say in 15 seconds, in improvisation you have 15 seconds."
    -Steve Lacy
    ------------------------------



  • 4.  RE: distributions for modeling percent (or proportion) of rather narrow plausible range--specifically, hemoglobin A1c in diabetics

    Posted 08-28-2024 17:28
    If this is a medical research project and not a statistics research project, I would recommend against transforming A1c. It is pretty much always presented with means and standard deviations in its natural units, and clinically meaningful changes are defined in terms of its natural units.

    This is tangential, but A1c is useful in large part because it approximates a weighted average of average blood glucose over time, something we've only recently started to be able to measure with CGMs. Given that glucose is the biological measure that I think we should be interested in, the fact that A1c is a percentage feels like a red herring to me.

    If this is a statistics research project and it won't be subject to any medical peer review, transform as you wish. It could be an interesting mathematical experiment even if it's not a great medical experiment.

    Best,

    Dave





  • 5.  RE: distributions for modeling percent (or proportion) of rather narrow plausible range--specifically, hemoglobin A1c in diabetics

    Posted 08-28-2024 19:16
    Christopher,
     
    As an engineer, I am not familiar with any prior distributional modeling of A1c data or if there are biologically preferred variability distributions for parametric interpretation.  However, the Reliability field of engineering has a very large library of potentially applicable distributions that may empirically fit a given data set if there is no preference for mathematical form or other modeling considerations.  In particular, the LogLogistic distribution¹ appears to have the same cumulative distribution function shape and empirical probability density function shape as you have presented in your attachments.  Fitting this distribution may require a mathematical modeling approach if there is not an existing R package which supports this distribution.
     
    1. Meeker, W. Q., and Escobar, L. A., Statistical Methods for Reliability Data, Chapter 4.11 LOGLOGISTIC DISTRIBUTION, p. 89, 1998, John Wiley & Sons.
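
    As a rough illustration of what fitting a log-logistic involves, here is a sketch in Python (standard library only). It writes the distribution in log-location-scale form, meaning log(t) follows a logistic distribution with location mu and scale sigma, and estimates the two parameters by matching sample quartiles. That is a quick starting-value method chosen for brevity, not maximum likelihood, and the data are made up.

```python
import math
import statistics


def loglogistic_cdf(t, mu, sigma):
    """F(t) = 1 / (1 + exp(-(log t - mu) / sigma)), for t > 0."""
    z = (math.log(t) - mu) / sigma
    return 1.0 / (1.0 + math.exp(-z))


def loglogistic_quantile(p, mu, sigma):
    """Inverse of the CDF: exp(mu + sigma * logit(p)), for 0 < p < 1."""
    return math.exp(mu + sigma * math.log(p / (1.0 - p)))


def fit_by_quartiles(x):
    """Quartile-matching estimates (a crude fit, not maximum likelihood).

    The median equals exp(mu), and log(Q3) - log(Q1) = 2 * sigma * log(3)
    because logit(0.75) = log(3) and logit(0.25) = -log(3).
    """
    q1, q2, q3 = statistics.quantiles(x, n=4)
    mu = math.log(q2)
    sigma = (math.log(q3) - math.log(q1)) / (2.0 * math.log(3.0))
    return mu, sigma


# Hypothetical A1c values in percent (illustrative only).
a1c = [5.9, 6.3, 6.8, 7.0, 7.2, 7.5, 7.9, 8.4, 9.1, 10.2, 11.8, 13.5]

mu, sigma = fit_by_quartiles(a1c)
print(f"mu = {mu:.3f}, sigma = {sigma:.3f}")
print(f"fitted 90th percentile = {loglogistic_quantile(0.9, mu, sigma):.2f}")
```

    The quartile estimates could serve as starting values for a proper maximum likelihood fit.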
     
    Tom
     
    Thomas D. Sandry, PhD
    Industrial Statistical Consultant, Retired



    ------------------------------
    Thomas Sandry
    ------------------------------



  • 6.  RE: distributions for modeling percent (or proportion) of rather narrow plausible range--specifically, hemoglobin A1c in diabetics

    Posted 08-28-2024 22:13

    We could find a distribution to fit.  If nothing else, R's logspline package has a very effective semiparametric approach.  But why do you want to fit the distribution?  What is your research question?

    It sounds like relating A1c to other covariates is the main interest.  Besides nonparametric methods, which can be difficult to interpret, you could use quantile regression to fit, say, 10%, 50%, and 90% quantiles as a function of covariates.  I find this appealing because sometimes in biology we don't have nice mean shifts and constant variance as a function of covariates, but rather a subset of cases that shifts while its complement does not, leading to the distribution changing with covariates.  A related possibility is the gamlss family of R packages which allow location, scale, and shape (skewness and kurtosis) to vary smoothly or discretely with predictor variables.  I'm still learning about this.  It offers a large set of distributions parameterized by location, scale, and one or both of skewness and kurtosis.  Distributions are organized by their ranges (unrestricted, >0, and [0, 1]).  But some care in the choice of distribution is still required.  Also, some distribution in that set could probably be applied to the marginal distribution.
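
    To illustrate the quantile regression idea, here is a minimal sketch in Python (standard library only) for the simplest case of one dichotomous predictor. With an intercept and a single 0/1 predictor the model is saturated, so the fitted tau-quantile regression reduces to the tau-quantile of each group: the intercept is the quantile in the "absent" group and the slope is the difference between groups. Each quantile is found by minimizing the check (pinball) loss that quantile regression uses. The data are made up so that only a subset of the "present" group shifts upward; real analyses with several predictors would use dedicated software such as R's quantreg package.

```python
def pinball_loss(residual, tau):
    """Check loss: tau * r for r >= 0, (tau - 1) * r for r < 0."""
    return residual * (tau - (residual < 0))


def sample_quantile(values, tau):
    """A tau-quantile, found as the observed value minimizing total pinball loss."""
    return min(
        sorted(values),
        key=lambda q: sum(pinball_loss(v - q, tau) for v in values),
    )


def quantile_effect(y_absent, y_present, tau):
    """Intercept and slope of a tau-quantile regression on one 0/1 predictor."""
    intercept = sample_quantile(y_absent, tau)
    slope = sample_quantile(y_present, tau) - intercept
    return intercept, slope


# Hypothetical A1c values (percent); in `present`, only the upper tail shifts.
absent = [6.0, 6.2, 6.4, 6.6, 6.8, 7.0, 7.2, 7.4, 7.7, 8.0, 8.5]
present = [6.1, 6.3, 6.5, 6.7, 6.9, 7.1, 7.4, 8.8, 10.4, 11.9, 13.2]

effects = {tau: quantile_effect(absent, present, tau) for tau in (0.1, 0.5, 0.9)}
for tau, (intercept, slope) in effects.items():
    print(f"tau = {tau}: intercept = {intercept:.1f}, slope = {slope:.1f}")

mean_shift = sum(present) / len(present) - sum(absent) / len(absent)
print(f"difference in means = {mean_shift:.2f}")
```

    In this made-up example the 10% and 50% quantiles barely move while the 90% quantile moves a great deal, a pattern that a single difference in means would blur.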



    ------------------------------
    Jim Garrett
    ------------------------------



  • 7.  RE: distributions for modeling percent (or proportion) of rather narrow plausible range--specifically, hemoglobin A1c in diabetics

    Posted 08-29-2024 08:52

    Hi Christopher,

    The recent (2024) 2nd edition of Stroup, Ptukhina and Garai: "Generalized Linear Mixed Models: Modern Concepts, Methods and Applications" presents state-of-the-art discussions on important issues associated with modeling of rates and proportions using a GLMM approach, including implementation and practical issues to contend with. Several of these issues were not obvious until recently and are thus commonly overlooked, e.g., the non-trivial implications of using transformations and normal approximations. Specifically, Chapter 12 gets into the specifics of modeling rates and proportions using both binomial and beta distributions, depending on the case. There is much we are still learning about modeling of non-Gaussian responses, specifically in the context of studies characterized by data architecture.

    Good luck! 

     



    ------------------------------
    Nora M. Bello
    Professor
    The Ohio State University
    ------------------------------



  • 8.  RE: distributions for modeling percent (or proportion) of rather narrow plausible range--specifically, hemoglobin A1c in diabetics

    Posted 08-29-2024 10:12

    This is great Nora, thanks! I will definitely dig into that.



    ------------------------------
    Christopher Ryan
    Agency Statistical Consulting, LLC
    ------------------------------



  • 9.  RE: distributions for modeling percent (or proportion) of rather narrow plausible range--specifically, hemoglobin A1c in diabetics

    Posted 08-29-2024 10:31

    Thanks for the insights everyone!  A lot of good ideas to consider.  Perhaps a less restrictive approach--additive model, gamlss, etc.--is the way to go.

    For good or for ill (probably the latter) most of the social-determinant-of-health predictors are dichotomous, presence vs absence.



    ------------------------------
    Christopher Ryan
    Agency Statistical Consulting, LLC
    ------------------------------



  • 10.  RE: distributions for modeling percent (or proportion) of rather narrow plausible range--specifically, hemoglobin A1c in diabetics

    Posted 09-24-2024 19:23

    Quantile regression turned out to be an excellent approach to this problem. Thanks to those who suggested it. It was also a good learning opportunity for me. I am often faced with non-Gaussian, non-symmetrical response variables, and I can see quantile regression as a good method to have at hand.

    The most challenging part was explaining it to my investigator colleagues. They're smart physicians but have very little statistical background. They grasp the concept "Does X have any effect on Y?" In fact, most physicians are very familiar with that concept and that language. But as I talked with the team about this analysis, it occurred to me that something is missing from their language. A more complete version, familiar to us, would be "Does a change in X have any effect on the expected value of Y?"   Or a slight modification, " . . . on the mean of Y?"  

    That imprecision in language and thought causes some troubles, I think. The idea that a predictor could affect some other aspect of a distribution, other than its mean, was tough for them to grasp.

    Childhood growth charts, which many physicians use every day in patient care, seemed to be a useful analogy. My colleague lamented, "Quantile? I don't know what a quantile is!" I replied, "Oh, yes you do! You just call them percentiles."  https://www.cdc.gov/growthcharts/cdc-charts.htm



    ------------------------------
    Christopher Ryan
    Agency Statistical Consulting, LLC
    ------------------------------