ASA Connect

  • 1.  Should we impute missing data while presenting descriptive stat?

    Posted 07-26-2016 01:38

    Most of the proposed methods for missing data imputation are geared toward regression analysis. Should (or can) we impute missing data when the objective is merely to present some descriptive statistics (mean, SD, mode) in the preliminary tables of a manuscript? If yes, which method is appropriate? I am familiar with mean imputation, stochastic imputation, and multiple imputation. Assume that the missing data meet the MCAR or MAR criteria.
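
    To make the trade-off concrete for a descriptive table, here is a minimal Python sketch (standard library only) on a small hypothetical sample, with `None` marking values assumed to be MCAR. The names `mean_impute` and `stochastic_impute` are illustrative, and the stochastic version is deliberately simplified: it draws from a normal distribution with the observed mean and SD rather than from a model conditional on other variables. Mean imputation leaves the mean unchanged but shrinks the SD; the stochastic draws roughly preserve spread but change with the seed, and that run-to-run variation is what multiple imputation is designed to capture.

```python
import random
import statistics

# Hypothetical sample; None marks a missing value (assumed MCAR for illustration).
data = [12.0, 15.0, None, 9.0, 14.0, None, 11.0, 18.0, None, 10.0]

observed = [x for x in data if x is not None]
obs_mean = statistics.mean(observed)
obs_sd = statistics.stdev(observed)

def mean_impute(values, fill):
    """Replace every missing value with the same fill value (here, the observed mean)."""
    return [fill if x is None else x for x in values]

def stochastic_impute(values, mu, sigma, rng):
    """Replace each missing value with a random draw from Normal(mu, sigma).

    Simplified for illustration; in practice the draws usually come from a
    model conditional on other observed variables.
    """
    return [rng.gauss(mu, sigma) if x is None else x for x in values]

rng = random.Random(2016)  # fixed seed so the sketch is reproducible
mean_filled = mean_impute(data, obs_mean)
stoch_filled = stochastic_impute(data, obs_mean, obs_sd, rng)

print(f"observed only: n={len(observed)}, mean={obs_mean:.2f}, SD={obs_sd:.2f}")
print(f"mean-imputed : n={len(mean_filled)}, mean={statistics.mean(mean_filled):.2f}, "
      f"SD={statistics.stdev(mean_filled):.2f}")
print(f"stochastic   : n={len(stoch_filled)}, mean={statistics.mean(stoch_filled):.2f}, "
      f"SD={statistics.stdev(stoch_filled):.2f}")
```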

    Thank you in advance,

    Mamun

    ------------------------------
    Md Abdullah Mamun
    PhD Student
    UNTHSC
    ------------------------------


  • 2.  RE: Should we impute missing data while presenting descriptive stat?

    Posted 07-28-2016 09:11

    Mamun - 

    Perhaps it's just semantics, but it seems to me that if you give descriptive statistics for a data set, and there are missing data, then this would actually require inference.

    Regression is one important area.  


    Also, even if the data are missing completely at random, one case where you would need to impute is when you show totals by group, not just means.
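
    A minimal Python sketch of the totals point, using hypothetical records and assuming MCAR within each group: the observed-case mean is a reasonable estimate of the group mean, but the observed-case sum undercounts the group total. Filling each missing value with its group's observed mean (one simple choice among many) is equivalent to scaling the observed mean by the group size. The function name `group_summaries` is illustrative, and the sketch assumes every group has at least one observed value.

```python
from collections import defaultdict

# Hypothetical records: (group, value); None marks a missing value.
records = [
    ("A", 10.0), ("A", 12.0), ("A", None), ("A", 14.0),
    ("B", 20.0), ("B", None), ("B", None), ("B", 22.0),
]

def group_summaries(rows):
    """Per group: observed-case mean, naive observed sum, and a total estimated
    by filling each missing value with the group's observed mean."""
    by_group = defaultdict(list)
    for group, value in rows:
        by_group[group].append(value)

    out = {}
    for group, values in by_group.items():
        observed = [v for v in values if v is not None]
        obs_mean = sum(observed) / len(observed)   # assumes >= 1 observed value
        naive_total = sum(observed)                # implicitly treats missing as 0
        imputed_total = obs_mean * len(values)     # same as summing the mean-filled column
        out[group] = (obs_mean, naive_total, imputed_total)
    return out

for group, (m, naive, est) in sorted(group_summaries(records).items()):
    print(f"group {group}: mean={m:.1f}, observed sum={naive:.1f}, estimated total={est:.1f}")
```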

    There are many appropriate methods.

    Perhaps I misunderstood your question? 

    Cheers - Jim

    ------------------------------
    James Knaub
    Lead Mathematical Statistician
    Retired



  • 3.  RE: Should we impute missing data while presenting descriptive stat?

    Posted 07-29-2016 05:09

    A lot depends on the context. When I worked for <proprietary> R&D, I sometimes wrote formal reports saying things like, "This suggests... However, <here I described the caveats>. To resolve these issues we could <describe future experiments>." Then they started worrying about paper trails and lawsuits. My present job is in a quality department of a regulated industry, where I would be worried about putting something like that in an email.

    ------------------------------
    Emil M Friedman, PhD
    emilfriedman@gmail.com
    http://www.statisticalconsulting.org



  • 4.  RE: Should we impute missing data while presenting descriptive stat?

    Posted 07-29-2016 08:55

    I would prefer to see the 'actual' data in the demographics or descriptive table, including how many values are missing [say, n=500, but for some variables it's n=475]. That shows exactly what is missing, and to what extent, before you apply any assumptions about the missing data and how to impute it.
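
    A descriptive table along those lines, summarizing each variable on its observed values and reporting the available n and the missing count beside it, could be sketched in Python as follows. The variables, values, and the helper `describe_with_missing` are hypothetical; the sketch assumes each variable has at least two observed values so that an SD can be computed.

```python
import statistics

# Hypothetical dataset: each key is a variable; None marks a missing value.
dataset = {
    "age":    [34, 51, None, 45, 29, 60],
    "weight": [70.5, None, 82.0, None, 65.0, 77.5],
    "smoker": [0, 1, 0, 0, None, 1],
}

def describe_with_missing(columns):
    """Summarize each variable on its observed values only, and report
    the observed n and the missing count next to the summary."""
    rows = []
    for name, values in columns.items():
        observed = [v for v in values if v is not None]
        rows.append({
            "variable": name,
            "n_total": len(values),
            "n_observed": len(observed),
            "n_missing": len(values) - len(observed),
            "mean": statistics.mean(observed),
            "sd": statistics.stdev(observed),
        })
    return rows

print(f"{'variable':<8} {'n':>3} {'missing':>7} {'mean':>7} {'SD':>6}")
for row in describe_with_missing(dataset):
    print(f"{row['variable']:<8} {row['n_observed']:>3} {row['n_missing']:>7} "
          f"{row['mean']:>7.2f} {row['sd']:>6.2f}")
```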

    As others have written, context is also important, both for whether imputation is appropriate and for which method to use. If you later impute values so that you can retain more data in a model, for example, you can then describe the imputation and why you chose a particular method.

    For example, when I run recursive partitioning in R, there are default assumptions for filling in missing data. I usually use those and then try others (mode, etc.) to see how sensitive the results are to whether and how I've imputed.
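
    The sensitivity check described above can be illustrated without any modelling library. The Python sketch below does not reproduce R's recursive partitioning defaults; it simply fills one hypothetical variable with its observed mean, median, or mode and reports how a simple summary, the share of values at or above a hypothetical cutoff, moves under each rule, alongside the complete-case value.

```python
import statistics

# Hypothetical variable with missing values (None).
values = [3, 4, 4, 4, 5, 7, 9, None, None, 12]
CUTOFF = 5  # hypothetical threshold of interest

observed = [v for v in values if v is not None]
fill_rules = {
    "mean": statistics.mean(observed),
    "median": statistics.median(observed),
    "mode": statistics.mode(observed),
}

def share_at_or_above(xs, cutoff):
    """Proportion of values at or above the cutoff."""
    return sum(1 for x in xs if x >= cutoff) / len(xs)

results = {}
for rule, fill in fill_rules.items():
    filled = [fill if v is None else v for v in values]
    results[rule] = share_at_or_above(filled, CUTOFF)

results["complete cases"] = share_at_or_above(observed, CUTOFF)

for rule, share in results.items():
    print(f"{rule:<15} share >= {CUTOFF}: {share:.2f}")
```

    With these hypothetical numbers, the mean fill (6) raises the share above the complete-case value, while the median and mode fills lower it; a spread like that is worth reporting.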

    Good luck!

    Rachael

    ------------------------------
    Rachael DiSantostefano



  • 5.  RE: Should we impute missing data while presenting descriptive stat?

    Posted 08-01-2016 08:09

    I agree with the comment about presenting at least one descriptive table that portrays the actual data and includes a count of the missing values for each variable. It helps the reader understand the extent and nature of the missingness and why imputation was applied.

    ------------------------------
    Eugenie Coakley, MA, MPH, PStat
    Senior Consultant/Statistician
    John Snow, Inc.



  • 6.  RE: Should we impute missing data while presenting descriptive stat?

    Posted 08-01-2016 09:03

    I agree with Rachael and Eugenie. A table that describes the data as is, prior to imputation, along with the missing frequencies, is important. Down the road, you do not want anyone to wonder whether you were covering up a serious missingness problem. Depending on the imputation method, demonstrating conditions such as missing at random may be important.
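
    One common way to look at this, sketched below in Python under stated assumptions, is to compare a fully observed covariate between rows where another variable is missing and rows where it is observed. The data are hypothetical, the helper names are illustrative, and the permutation test simply shuffles which rows are labelled missing. A clear difference is evidence against MCAR and is consistent with MAR given that covariate; no check on the observed data alone can rule out missing not at random (MNAR).

```python
import random
import statistics

# Hypothetical data: age is always observed; income has missing values (None).
age    = [25, 31, 42, 55, 61, 38, 47, 66, 29, 58]
income = [30, 42, None, 51, None, 45, 50, None, 33, None]

def missingness_mean_diff(covariate, target):
    """Mean covariate where target is missing minus mean where target is observed."""
    miss = [c for c, t in zip(covariate, target) if t is None]
    obs = [c for c, t in zip(covariate, target) if t is not None]
    return statistics.mean(miss) - statistics.mean(obs)

def permutation_p_value(covariate, target, n_perm=5000, seed=1):
    """Two-sided permutation p-value for the difference above, obtained by
    randomly reassigning which rows are labelled missing."""
    rng = random.Random(seed)
    observed_diff = abs(missingness_mean_diff(covariate, target))
    labels = [t is None for t in target]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        fake_target = [None if is_missing else 0 for is_missing in labels]
        if abs(missingness_mean_diff(covariate, fake_target)) >= observed_diff:
            hits += 1
    return (hits + 1) / (n_perm + 1)

diff = missingness_mean_diff(age, income)
p = permutation_p_value(age, income)
print(f"mean age (income missing) - mean age (income observed) = {diff:.2f}")
print(f"permutation p-value = {p:.3f}")
```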

    ------------------------------
    Nora Galambos
    Senior Data Scientist
    Stony Brook University



  • 7.  RE: Should we impute missing data while presenting descriptive stat?

    Posted 08-02-2016 14:04

    Yes, if it is feasible. As far as I can determine, in this situation you can easily impute only the independent (X) variables. I tried to impute variables on Y/X using a regression imputation equation, but it is time-consuming and inexact.





  • 8.  RE: Should we impute missing data while presenting descriptive stat?

    Posted 08-03-2016 08:11

    Imputation, and statistical results in general, by nature are never "exact".  Imputation is really just another form of (or an addition to) modelling.  It has advantages (increased sample size) and disadvantages (e.g. increased variability) when used in conjunction with a statistical model.
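
    Multiple imputation makes that extra variability explicit. Under Rubin's rules, the m completed-data estimates are averaged, and their total variance is T = W + (1 + 1/m) B, where W is the average within-imputation variance (squared standard error) and B is the between-imputation variance of the estimates. A minimal Python sketch, with hypothetical inputs and an illustrative function name:

```python
import statistics

def pool_rubin(estimates, within_variances):
    """Combine m point estimates and their squared standard errors
    with Rubin's rules; returns (pooled estimate, total variance)."""
    m = len(estimates)
    if m < 2 or m != len(within_variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = statistics.mean(estimates)          # pooled point estimate
    w = statistics.mean(within_variances)       # average within-imputation variance
    b = statistics.variance(estimates)          # between-imputation variance (m - 1 denominator)
    total = w + (1 + 1 / m) * b
    return q_bar, total

# Hypothetical results from m = 5 imputed datasets: means and squared SEs.
est = [12.6, 12.9, 12.4, 13.1, 12.7]
var = [0.90, 0.95, 0.88, 0.97, 0.92]
q, t = pool_rubin(est, var)
print(f"pooled mean = {q:.2f}, total variance = {t:.3f}, SE = {t ** 0.5:.3f}")
```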

    I see little purpose in imputing data for descriptive statistics. If one of my clients asked me to do this, I'd ask them why. Odds are they would be hoping to make the descriptive statistic more useful somehow (quite possibly wanting to draw inference from it), which would lead to a teaching moment on just how a descriptive statistic is (or, more likely, is not) useful.

    ------------------------------
    Joseph Nolan
    Associate Professor of Statistics
    Director, Burkardt Consulting Center
    Northern Kentucky University
    Department of Mathematics & Statistics



  • 9.  RE: Should we impute missing data while presenting descriptive stat?

    Posted 08-03-2016 12:05

    I do appreciate all the comments. I can see the pros and cons of imputing data for descriptive statistics in a paper. Still, I believe that the original data from the full sample, which more closely represent the population, let us give the reader a fuller picture of the data (Is it skewed? What are the outliers? How much dispersion is there?), and that picture better informs the inferential statistics that follow.
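
    Describing shape directly from the observed values might look like the minimal Python sketch below. The data and the helper `shape_summary` are hypothetical; skewness is the moment-based (population) version, quartiles use the `statistics.quantiles` default method, and outliers are flagged with Tukey's 1.5 IQR fences, which is one convention among several.

```python
import statistics

# Hypothetical observed values (missing entries already excluded, nothing imputed).
observed = [2.0, 3.0, 3.0, 4.0, 4.0, 5.0, 5.0, 6.0, 7.0, 30.0]

def shape_summary(xs):
    """Describe dispersion, skewness, and Tukey-fence outliers of observed data."""
    mean = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    # Moment-based (population) skewness: the average cubed z-score.
    skew = sum(((x - mean) / sd) ** 3 for x in xs) / len(xs)
    q1, _, q3 = statistics.quantiles(xs, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in xs if x < low or x > high]
    return {"sd": sd, "skewness": skew, "iqr": iqr, "outliers": outliers}

print(shape_summary(observed))
```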

    ------------------------------
    Julie Tackett



  • 10.  RE: Should we impute missing data while presenting descriptive stat?

    Posted 08-04-2016 14:49

    In general, I would not. I think the most important reason is that the existence and extent of missing data are important descriptive facts about the phenomena being observed and/or your observation methods, and hence are relevant to any effective description. Whether a large portion (or only a small portion) of the data is missing is important to know and bears on the data's interpretability.

    I agree that, because imputation methods require assumptions, they go beyond simply describing the data observed. In addition, they may result in underestimating, perhaps substantially, variances and CIs, and hence in overestimating precision.
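
    The precision point can be illustrated with a minimal Python sketch on a hypothetical sample: a naive normal-approximation interval (mean ± 1.96 SD/√n, used here for simplicity rather than a t interval) is computed once on the observed values and once on mean-imputed data treated as if it were fully observed. The imputed version is narrower because the SD shrinks while n grows, even though no new information has been added.

```python
import math
import statistics

# Hypothetical sample with missing values (None).
sample = [4.2, 5.1, None, 6.3, 5.8, None, 4.9, 7.0, None, 5.5]
Z = 1.96  # normal critical value, a simplifying assumption instead of a t quantile

def mean_ci(xs, z=Z):
    """Naive normal-approximation CI for the mean: mean +/- z * SD / sqrt(n)."""
    m = statistics.mean(xs)
    half = z * statistics.stdev(xs) / math.sqrt(len(xs))
    return m - half, m + half

observed = [x for x in sample if x is not None]
fill = statistics.mean(observed)
mean_imputed = [fill if x is None else x for x in sample]

lo_obs, hi_obs = mean_ci(observed)
lo_imp, hi_imp = mean_ci(mean_imputed)
print(f"observed-only CI width: {hi_obs - lo_obs:.3f}")
print(f"mean-imputed CI width : {hi_imp - lo_imp:.3f}")
```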

    There are situations where it is especially important to avoid imputation in one's descriptions. In situations involving heavy tails, for example, outliers such as long-term survivors, uncommon toxicity events, very high-income individuals, and stock-market crashes are often very important to an effective description of the phenomenon being evaluated. Imputation methods tend to under-report or underestimate the impact of such phenomena.
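
    A small Python sketch of the heavy-tail point, assuming hypothetical income data (in thousands), MCAR missingness, and simple mean imputation as one example method: every imputed value sits at the observed mean, so none can be extreme, and when that mean is below the cutoff the share of values above a hypothetical high-income threshold is diluted by the factor n_observed / n relative to the observed share.

```python
# Hypothetical heavy-tailed sample (e.g., incomes in thousands); None = missing.
incomes = [28, 35, 31, None, 42, 39, None, 33, 400, None, 37, 1200]
THRESHOLD = 300  # hypothetical cutoff for "very high income"

observed = [x for x in incomes if x is not None]
fill = sum(observed) / len(observed)          # observed mean, pulled up by the tail
filled = [fill if x is None else x for x in incomes]

def tail_share(xs, cutoff):
    """Proportion of values strictly above the cutoff."""
    return sum(1 for x in xs if x > cutoff) / len(xs)

print(f"fill value used          : {fill:.1f}")
print(f"tail share, observed     : {tail_share(observed, THRESHOLD):.3f}")
print(f"tail share, mean-imputed : {tail_share(filled, THRESHOLD):.3f}")
```

    Note also that the fill value itself is pulled far above the typical observed income by the two extreme values, so the imputed entries misrepresent the center as well as the tail.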

    ------------------------------
    Jonathan Siegel
    Associate Director Clinical Statistics