  • 1.  distributions for modeling percent (or proportion) of rather narrow plausible range--specifically, hemoglobin A1c in diabetics

    Posted 14 days ago

    I'm helping a colleague and his team of residents and medical students with a cross-sectional chart review study of hemoglobin A1c levels in patients with type 2 diabetes, and how (if at all) they relate to a variety of "social determinants of health": transportation problems, substance use, various measures of wealth/poverty, and so on.

    For those unfamiliar with it, hemoglobin A1c is a measure of what percent of one's hemoglobin molecules are glycosylated---have glucose molecules hooked to them. It's a permanent bond, persisting for the life of that hemoglobin molecule---about 3 months. So it's a measure of long-term diabetes control: lower hemoglobin A1c = better diabetes control, and vice versa.

    By rights, hemoglobin A1c is a percentage: it can't be less than zero or greater than 100. In clinical reality, values less than about 4 or greater than about 16 are biologically implausible; in 25 years of practice I don't think I've ever seen one outside that range.

    I've always been intrigued by the beta distribution and keep my eye out for opportunities to use it. It has a theoretical attraction for proportions/percentages. Any experience with, or thoughts about, modeling a percent outcome variable with the beta distribution, when the plausible range is narrow like this?

    The log-normal also occurs to me. Using the fitdistrplus package in R, the observed data seem a tad closer to the lognormal theoretical CDF than to the beta or the normal. (I've attached two quick and dirty graphs. These are unconditional plots, of course, with no predictors, so might not be definitive.)
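
    A minimal sketch of this kind of unconditional comparison, written in plain Python rather than fitdistrplus. It fits a normal and a lognormal by maximum likelihood and compares them by AIC instead of by eye on a CDF plot. The A1c values are invented for illustration and the function names are hypothetical, not part of any package.

```python
import math
import statistics


def normal_loglik(xs):
    """Maximized log-likelihood of a normal distribution fitted to xs."""
    mu = statistics.fmean(xs)
    sigma = statistics.pstdev(xs, mu)  # the MLE uses the population SD
    n = len(xs)
    # At the MLE the squared-error term collapses to n / 2.
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - 0.5 * n


def lognormal_loglik(xs):
    """Maximized log-likelihood of a lognormal fitted to xs (all xs > 0)."""
    logs = [math.log(x) for x in xs]
    # Normal likelihood on the log scale, minus the Jacobian term sum(log x).
    return normal_loglik(logs) - sum(logs)


def aic(loglik, n_params=2):
    """Akaike information criterion; both fits here have two parameters."""
    return 2 * n_params - 2 * loglik


# Hypothetical A1c values (percent), invented for illustration only.
a1c = [5.9, 6.4, 6.8, 7.1, 7.3, 7.9, 8.4, 9.2, 10.5, 12.8]
print("normal AIC:   ", round(aic(normal_loglik(a1c)), 2))
print("lognormal AIC:", round(aic(lognormal_loglik(a1c)), 2))
```

    The Jacobian term matters: without it the two log-likelihoods are on different scales and cannot be compared. As with the plots, this says nothing about fit conditional on predictors.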

    I'd welcome any thoughts. Thanks.



    ------------------------------
    Christopher Ryan
    Agency Statistical Consulting, LLC
    ------------------------------

    Attachment(s)

    EmpiricalDensity.pdf (40 KB)
    ThreeCDFs.pdf (39 KB)


  • 2.  RE: distributions for modeling percent (or proportion) of rather narrow plausible range--specifically, hemoglobin A1c in diabetics

    Posted 13 days ago

    Hi Christopher,

    I wonder if you need the Beta here. It can be difficult to work with in modeling, e.g., because observed values of exactly 0 and 1 aren't allowed (though not an issue in your case unless you plan to rescale the data to match the Beta's range). If lognormal works, I'd probably opt for that. 
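
    To make the rescaling point concrete, here is a rough sketch in plain Python. Dividing A1c by 100 keeps every value strictly inside (0, 1), so the boundary problem never arises. Rescaling to an assumed plausible range (4 to 16 here, taken from the original post) can produce exact 0s or 1s, which then need something like the commonly used "squeeze" adjustment shown below. The function names are hypothetical, and the beta parameters come from a simple method-of-moments calculation, not a full maximum likelihood fit.

```python
import statistics


def to_unit_interval(a1c_percent, lo=0.0, hi=100.0):
    """Map an A1c value from [lo, hi] onto [0, 1]."""
    return (a1c_percent - lo) / (hi - lo)


def squeeze(y, n):
    """Pull exact 0s and 1s slightly inside (0, 1); n is the sample size."""
    return (y * (n - 1) + 0.5) / n


def beta_method_of_moments(ys):
    """Method-of-moments estimates (alpha, beta) for values in (0, 1)."""
    m = statistics.fmean(ys)
    v = statistics.pvariance(ys, m)
    common = m * (1 - m) / v - 1
    if common <= 0:
        raise ValueError("variance too large for a beta distribution")
    return m * common, (1 - m) * common


# Hypothetical A1c values (percent), invented for illustration only.
a1c = [5.9, 6.4, 6.8, 7.1, 7.3, 7.9, 8.4, 9.2, 10.5, 12.8]

# Natural scale: percent divided by 100, never exactly 0 or 1.
natural = [to_unit_interval(x) for x in a1c]
print(beta_method_of_moments(natural))

# Assumed plausible range of 4 to 16, squeezed in case a value hits a bound.
ranged = [squeeze(to_unit_interval(x, 4.0, 16.0), len(a1c)) for x in a1c]
print(beta_method_of_moments(ranged))
```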

    Cheers,
    Vince 



    ------------------------------
    Vincent Staggs, PhD
    Senior Scientist
    IDDI
    ------------------------------



  • 3.  RE: distributions for modeling percent (or proportion) of rather narrow plausible range--specifically, hemoglobin A1c in diabetics

    Posted 13 days ago

    I don't know the answer. However, NHANES does collect HbA1c and seems to have some suggestions for the statistical analysis.

    I realize in advance that you're not using NHANES.

    The link from CDC is here:

    https://wwwn.cdc.gov/Nchs/Nhanes/2015-2016/GHB_I.htm

    and the analytic guidelines

    https://wwwn.cdc.gov/nchs/nhanes/analyticguidelines.aspx 

    and

    https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10148706/



    ------------------------------
    Chris Barker, Ph.D.
    Past Chair
    Statistical Consulting Section
    Consultant and
    Adjunct Associate Professor of Biostatistics
    www.barkerstats.com


    ---
    "In composition you have all the time you want to decide what to say in 15 seconds, in improvisation you have 15 seconds."
    -Steve Lacy
    ------------------------------



  • 4.  RE: distributions for modeling percent (or proportion) of rather narrow plausible range--specifically, hemoglobin A1c in diabetics

    Posted 13 days ago
    If this is a medical research project and not a statistics research project, I would recommend against transforming A1c. It is pretty much always presented with means and standard deviations in its natural units, and clinically meaningful changes are defined in terms of its natural units.

    This is tangential, but A1c is useful in large part because it approximates a weighted average of average blood glucose over time, something we've only recently started to be able to measure with CGMs. Given that glucose is the biological measure that I think we should be interested in, the fact that A1c is a percentage feels like a red herring to me.

    If this is a statistics research project and it won't be subject to any medical peer review, transform as you wish. It could be an interesting mathematical experiment even if it's not a great medical experiment.

    Best,

    Dave





  • 5.  RE: distributions for modeling percent (or proportion) of rather narrow plausible range--specifically, hemoglobin A1c in diabetics

    Posted 13 days ago
    Christopher,
     
    As an engineer, I am not familiar with any prior distributional modeling of A1c data or if there are biologically preferred variability distributions for parametric interpretation.  However, the Reliability field of engineering has a very large library of potentially applicable distributions that may empirically fit a given data set if there is no preference for mathematical form or other modeling considerations.  In particular, the LogLogistic distribution¹ appears to have the same cumulative distribution function shape and empirical probability density function shape as you have presented in your attachments.  Fitting this distribution may require a mathematical modeling approach if there is not an existing R package which supports this distribution.
     
    1. Statistical Methods for Reliability Data, Meeker, W. Q., and Escobar, L. A., Chapter 4.11 LOGLOGISTIC DISTRIBUTION, p. 89, 1998, John Wiley & Sons.
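
    For anyone wanting to try the log-logistic by hand, a minimal sketch follows. It assumes the scale/shape parameterization, in which alpha is the median and beta is the shape; equivalently, log(x) is logistic with location log(alpha) and scale 1/beta. The quartile-matching fit is only a quick starting point, not a maximum likelihood estimate, and all function names are hypothetical.

```python
import math


def loglogistic_cdf(x, alpha, beta):
    """CDF of the log-logistic with scale alpha (the median) and shape beta."""
    return 1.0 / (1.0 + (x / alpha) ** (-beta))


def loglogistic_quantile(p, alpha, beta):
    """Inverse CDF: the value x at which loglogistic_cdf(x) equals p."""
    return alpha * (p / (1.0 - p)) ** (1.0 / beta)


def fit_from_quartiles(q1, median, q3):
    """Rough fit by matching the median and the quartile ratio.

    For this distribution Q(0.75) / Q(0.25) = 9 ** (1 / beta), so
    beta = ln(9) / ln(q3 / q1), and alpha is the median.
    """
    alpha = median
    beta = math.log(9.0) / math.log(q3 / q1)
    return alpha, beta


# Hypothetical sample quartiles of A1c (percent), invented for illustration.
alpha_hat, beta_hat = fit_from_quartiles(6.8, 7.6, 9.2)
print(alpha_hat, beta_hat)
print(loglogistic_cdf(10.0, alpha_hat, beta_hat))
```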
     
    Tom
     
    Thomas D. Sandry, PhD
    Industrial Statistical Consultant, Retired



    ------------------------------
    Thomas Sandry
    ------------------------------



  • 6.  RE: distributions for modeling percent (or proportion) of rather narrow plausible range--specifically, hemoglobin A1c in diabetics

    Posted 13 days ago

    We could find a distribution to fit.  If nothing else, R's logspline package has a very effective semiparametric approach.  But why do you want to fit the distribution?  What is your research question?

    It sounds like relating A1c to other covariates is the main interest.  Besides nonparametric methods, which can be difficult to interpret, you could use quantile regression to fit, say, 10%, 50%, and 90% quantiles as a function of covariates.  I find this appealing because sometimes in biology we don't have nice mean shifts and constant variance as a function of covariates, but rather a subset of cases that shifts while its complement does not, leading to the distribution changing with covariates.  A related possibility is the gamlss family of R packages which allow location, scale, and shape (skewness and kurtosis) to vary smoothly or discretely with predictor variables.  I'm still learning about this.  It offers a large set of distributions parameterized by location, scale, and one or both of skewness and kurtosis.  Distributions are organized by their ranges (unrestricted, >0, and [0, 1]).  But some care in the choice of distribution is still required.  Also, some distribution in that set could probably be applied to the marginal distribution.
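
    As a rough illustration of the quantile idea, consider the simplest case of a single presence/absence predictor. There, a linear quantile regression at a given quantile level reduces (up to the choice of sample-quantile convention) to the quantile in the "absent" group as intercept and the difference in group quantiles as slope. The sketch below, in plain Python, computes those differences at the 10%, 50%, and 90% levels and includes the check ("pinball") loss that quantile regression minimizes. The data and function names are invented for illustration; with several covariates one would use a real quantile regression routine instead.

```python
import statistics

TAUS = (0.1, 0.5, 0.9)


def pinball_loss(residual, tau):
    """Check loss minimized by tau-quantile regression, for one residual."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def decile_quantiles(values):
    """Sample 10%, 50%, and 90% quantiles (needs at least two values)."""
    cuts = statistics.quantiles(values, n=10, method="inclusive")
    return {0.1: cuts[0], 0.5: cuts[4], 0.9: cuts[8]}


def quantile_shifts(absent, present):
    """Difference in each quantile between the two predictor groups."""
    q0 = decile_quantiles(absent)
    q1 = decile_quantiles(present)
    return {tau: q1[tau] - q0[tau] for tau in TAUS}


# Hypothetical A1c values (percent) by presence of some barrier,
# invented for illustration only.
a1c_absent = [5.8, 6.1, 6.4, 6.6, 6.9, 7.0, 7.2, 7.5, 8.0, 8.8]
a1c_present = [5.9, 6.2, 6.5, 6.8, 7.1, 7.6, 8.3, 9.4, 10.9, 12.6]

for tau, shift in quantile_shifts(a1c_absent, a1c_present).items():
    print(f"tau={tau}: shift in A1c = {shift:.2f}")
```

    If the shift at the 90% quantile is much larger than at the 10% quantile, that is the "subset of cases that shifts" pattern, which a comparison of means alone would blur.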



    ------------------------------
    Jim Garrett
    ------------------------------



  • 7.  RE: distributions for modeling percent (or proportion) of rather narrow plausible range--specifically, hemoglobin A1c in diabetics

    Posted 13 days ago

    Hi Christopher,

    The recent (2024) 2nd edition of Stroup, Ptukhina and Garai: "Generalized Linear Mixed Models: Modern Concepts, Methods and Applications" presents state-of-the-art discussions on important issues associated with modeling of rates and proportions using a GLMM approach, including implementation and practical issues to contend with. Several of these issues were not obvious until recently and are thus commonly overlooked, e.g., the non-trivial implications of using transformations and normal approximations. In particular, Chapter 12 covers the specifics of modeling rates and proportions using both binomial and beta distributions, depending on the case. There is much we are still learning about modeling of non-Gaussian responses, specifically in the context of studies characterized by data architecture.

    Good luck! 

     



    ------------------------------
    Nora M. Bello
    Professor
    The Ohio State University
    ------------------------------



  • 8.  RE: distributions for modeling percent (or proportion) of rather narrow plausible range--specifically, hemoglobin A1c in diabetics

    Posted 13 days ago

    This is great, Nora, thanks! I will definitely dig into that.



    ------------------------------
    Christopher Ryan
    Agency Statistical Consulting, LLC
    ------------------------------



  • 9.  RE: distributions for modeling percent (or proportion) of rather narrow plausible range--specifically, hemoglobin A1c in diabetics

    Posted 13 days ago

    Thanks for the insights, everyone!  A lot of good ideas to consider.  Perhaps a less restrictive approach (additive model, gamlss, etc.) is the way to go.

    For good or for ill (probably the latter) most of the social-determinant-of-health predictors are dichotomous, presence vs absence.



    ------------------------------
    Christopher Ryan
    Agency Statistical Consulting, LLC
    ------------------------------