
  • 1.  Data sleuthing: Invalid pairs of sample size and percentage

    Posted 09-25-2013 14:48
    This message has been cross-posted to the following eGroups: Statistical Programmers and Analysts Section and Statistical Consulting Section.
    -------------------------------------------
    I'd appreciate ideas about how to check whether a reported sample size (N) and a reported percentage of that sample (P) are consistent with each other, assuming that P was obtained by rounding 100S/N to the nearest multiple of R, where S is the size of a subsample; let's assume N and S are integers and 0 < S < N.  I've figured out a brute-force strategy, but for future work I'd like to do this more efficiently or elegantly.  Below I give some context, show an example, and make a few remarks.

    CONTEXT: I'm conducting meta-analyses for a client's systematic review of interventions to improve medication adherence.  About half of the 500+ primary studies reported one or more N-P pairs, often from more than one sample of participants on more than one measure or occasion.  Typically N is the reported number of Treatment participants and P is the reported percentage of those participants classified as "adherent" against some threshold.  By "reported" I mean my client obtained N and P from reports written by third parties whom we often can't readily contact for more data or information (e.g., article, meeting abstract).  Checking these nearly 1,500 N-P pairs is one of many ways I identify potential errors in the original studies or our abstracted data.

    EXAMPLE: Consider a sample with N = 12, P = 73, and R = 1.  I'd flag that N-P pair as a possible error to check.  Here's my brute-force strategy for deciding that: From NP / 100 = 8.76 we infer that S would have to be either 8 or 9, but S = 8 would be reported as P = 66 or P = 67 (i.e., 66.6... rounded down or up), and S = 9 would most likely be reported as P = 75 but maybe also as P = 74 or P = 76 (e.g., 75.00000001 carelessly rounded up).  Because 73 isn't in that set of possible values for P, {66, 67, 74, 75, 76}, probably N or P is an error of reporting, data extraction or entry, etc.  For instance, maybe N = 11 or N = 120.
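
    The brute-force check in this example could be sketched in Python roughly as follows. This is only one reading of the strategy, not a definitive implementation: the function names are hypothetical, and the rule for "careless" rounding (allowing one step of R either side when 100S/N is already an exact multiple of R, otherwise allowing rounding down or up) is an assumption taken from the example above.

```python
from fractions import Fraction
from math import ceil, floor


def plausible_reports(n, s, r):
    """Set of percentages that s out of n might be reported as (hypothetical helper).

    Assumption: a value that is not an exact multiple of r may be rounded
    down or up; an exact multiple may be nudged one step either way.
    """
    exact = Fraction(100 * s, n)
    steps = exact / r
    if steps.denominator == 1:          # 100*s/n is already a multiple of r
        return {exact - r, exact, exact + r}
    return {floor(steps) * r, ceil(steps) * r}


def is_plausible_pair(n, p, r=1):
    """True if the reported pair (n, p) matches some integer s with 0 < s < n."""
    p = Fraction(str(p))                # exact arithmetic avoids float surprises
    r = Fraction(str(r))
    target = n * p / 100                # e.g. 12 * 73 / 100 = 8.76
    for s in {floor(target), ceil(target)}:
        if 0 < s < n and p in plausible_reports(n, s, r):
            return True
    return False


print(is_plausible_pair(12, 73))   # the flagged pair from the example
print(is_plausible_pair(11, 73))   # the "maybe N = 11" alternative
print(is_plausible_pair(120, 73))  # the "maybe N = 120" alternative
```

    Under these assumptions the first call returns False and the other two return True, which agrees with the reasoning in the example.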

    REMARKS: First, the above strategy is easy to program, but I suspect it can be simplified or otherwise improved, such as by exploiting facts about rational numbers or rounding; for instance, if N > 100/R then every multiple of R strictly between 0 and 100 is a valid P.  Second, I realize some valid N-P pairs might seem invalid, such as if P were based on an average of two or more proportions (or logits, etc.) or a model-implied probability estimate.  Third, we usually infer R from how P and its counterparts from other samples or endpoints are reported, but this may not be accurate; for instance, if one sample's "70" is reported near another sample's "68.2," is "70" rounded to the nearest 10, 1, 0.1, or something else (e.g., 5, 2)?
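
    One possible simplification, offered as a sketch rather than a settled answer: if P is 100S/N rounded to the nearest multiple of R, then S must lie between N(P - R/2)/100 and N(P + R/2)/100, so the pair is consistent exactly when that interval contains an integer S with 0 < S < N. The function name below is hypothetical, and treating both endpoints as allowed (so ties may round either way) is an assumption.

```python
from fractions import Fraction
from math import ceil, floor


def consistent_counts(n, p, r=1):
    """List every integer s (0 < s < n) whose percentage rounds to p.

    Assumes round-to-nearest with rounding unit r, ties allowed either way.
    An empty list means the (n, p) pair deserves a second look.
    """
    p = Fraction(str(p))                 # exact arithmetic, no float error
    half = Fraction(str(r)) / 2
    lo = max(1, ceil(n * (p - half) / 100))
    hi = min(n - 1, floor(n * (p + half) / 100))
    return list(range(lo, hi + 1))


print(consistent_counts(12, 73))    # no integer s fits
print(consistent_counts(12, 75))    # s = 9 fits
print(consistent_counts(120, 73))   # s = 87 (a tie at 72.5) and s = 88 fit

# When the rounding unit is uncertain, the same check can be repeated
# over several candidate values of r.
for unit in (10, 5, 2, 1, 0.1):
    print(unit, consistent_counts(12, 70, unit))
```

    This avoids enumerating possible reported values, and the list of matching S values is often useful in its own right for recovering the numerator. It does not address weighted or model-based percentages, which can look inconsistent under any such check.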


    Cheers,

    Adam

    -------------------------------------------
    Adam Hafdahl
    Owner & Principal Consultant
    ARCH Statistical Consulting, LLC
    -------------------------------------------


  • 2.  RE: Data sleuthing: Invalid pairs of sample size and percentage

    Posted 09-25-2013 20:49
    Make sure the data aren't weighted. If there is any weighting (e.g. for demographics) I wouldn't try to figure out the valid references.

    -------------------------------------------
    Michael Kruger
    Information Resources Inc
    -------------------------------------------