Data sleuthing: Invalid pairs of sample size and percentage

1. Data sleuthing: Invalid pairs of sample size and percentage

Adam Hafdahl
Posted 09-25-2013 14:48
This message has been cross-posted to the following eGroups: Statistical Programmers and Analysts Section and Statistical Consulting Section.
-------------------------------------------
I'd appreciate ideas about how to check whether a reported sample size (N) and a reported percentage of that sample size (P) are valid, assuming that P was obtained by taking a subsample's size (S), computing 100S / N, and rounding the result to the nearest R; let's assume N and S are integers and 0 < S < N. I've figured out a brute-force strategy, but for future work I'd like to do this more efficiently or elegantly. Below I give some context, show an example, and make a few remarks.

CONTEXT: I'm conducting meta-analyses for a client's systematic review of interventions to improve medication adherence. About half of the 500+ primary studies reported one or more N-P pairs, often from more than one sample of participants on more than one measure or occasion. Typically N is the reported number of Treatment participants and P is the reported percentage of those participants classified as "adherent" against some threshold. By "reported" I mean my client obtained N and P from reports written by third parties whom we often can't readily contact for more data or information (e.g., article, meeting abstract). Checking these nearly 1,500 N-P pairs is one of many ways I identify potential errors in the original studies or our abstracted data.

EXAMPLE: Consider a sample with N = 12, P = 73, and R = 1. I'd flag that N-P pair as a possible error to check. Here's my brute-force strategy for deciding that: From NP / 100 = 8.76 we infer that S would have to be either 8 or 9, but S = 8 would be reported as P = 66 or P = 67 (i.e., 66.6... rounded down or up), and S = 9 would most likely be reported as P = 75 but maybe also as P = 74 or P = 76 (e.g., 75.00000001 carelessly rounded up). Because 73 isn't in that set of possible values for P, {66, 67, 74, 75, 76}, probably N or P is an error of reporting, data extraction or entry, etc. For instance, maybe N = 11 or N = 120.
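One way to sketch the check in the example is shown below. Rather than testing S = 8 and S = 9 one at a time, it solves for every integer S whose percentage 100S / N lies within a tolerance of P. The function name `candidate_counts` and the `lenient` flag are hypothetical. The lenient tolerance of R (instead of R / 2) is an assumption chosen to reproduce the set {66, 67, 74, 75, 76} above, which allows for values rounded the wrong way. Exact fractions are used so that floating-point error cannot affect borderline cases.

```python
from fractions import Fraction
from math import ceil, floor


def candidate_counts(n, p, r=1, lenient=True):
    """Return every integer S with 0 < S < n for which 100*S/n lies
    within the tolerance of the reported percentage p.

    Tolerance is r when lenient (rounded either way, or carelessly),
    and r/2 when strict (correct rounding to the nearest r).
    An empty list means the (n, p) pair deserves a second look.
    """
    # Going through str() keeps decimals such as 0.1 or 68.2 exact.
    p = Fraction(str(p))
    r = Fraction(str(r))
    slack = r if lenient else r / 2

    # |100*S/n - p| <= slack  <=>  (p - slack)*n/100 <= S <= (p + slack)*n/100
    lowest = max(1, ceil((p - slack) * n / 100))
    highest = min(n - 1, floor((p + slack) * n / 100))
    return list(range(lowest, highest + 1))


print(candidate_counts(12, 73))   # [] : no S fits, so flag the pair
print(candidate_counts(11, 73))   # [8]
print(candidate_counts(120, 73))  # [87, 88]
```

An empty result flags the pair for checking; a non-empty result lists the subsample sizes that could have produced P. Whether ties and careless rounding should count is a judgment call, which is why the tolerance is left as a switch.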

REMARKS: First, the above strategy is easy to program, but I suspect it can be simplified or otherwise improved, such as by exploiting facts about rational numbers or rounding; for instance, if N > 100/R then consecutive values of 100S / N are less than R apart, so every multiple of R strictly between 0 and 100 is a valid P. Second, I realize some valid N-P pairs might seem invalid, such as if P were based on an average of two or more proportions (or logits, etc.) or a model-implied probability estimate. Third, we usually infer R from how P and its counterparts from other samples or endpoints are reported, but this may not be accurate; for instance, if one sample's "70" is reported near another sample's "68.2," is "70" rounded to the nearest 10, 1, 0.1, or something else (e.g., 5, 2)?
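The first and third remarks can be explored with a small enumeration, sketched here under stated assumptions: the function name `reportable_percentages` is hypothetical, rounding is to the nearest multiple of R with ties rounded up, and 0 < S < N as in the original question. It lists every P that a sample of size N could yield, which makes it easy to see when N is large enough that no P can be ruled out.

```python
from fractions import Fraction
from math import floor


def reportable_percentages(n, r=1):
    """List every percentage obtainable as 100*S/n rounded to the
    nearest multiple of r, for integer S with 0 < S < n.

    Assumes ties round up; other tie rules would shift a few values.
    """
    r = Fraction(str(r))  # exact even for increments such as 0.1
    found = set()
    for s in range(1, n):
        exact = Fraction(100 * s, n)
        # Round to the nearest multiple of r (half rounds up).
        found.add(floor(exact / r + Fraction(1, 2)) * r)
    return sorted(found)


# With N = 12 and R = 1 only eleven percentages are possible.
print([int(p) for p in reportable_percentages(12)])
# [8, 17, 25, 33, 42, 50, 58, 67, 75, 83, 92]

# With N = 101 > 100/R, every whole percentage from 1 to 99 occurs,
# so the check cannot flag any P in that range.
print(set(range(1, 100)) <= set(reportable_percentages(101)))  # True
```

For the third remark, the same function can be called with several candidate increments (for example 10, 5, 2, 1, and 0.1) to see which of them are consistent with a reported value such as "70" for a given N.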

Cheers,

Adam

-------------------------------------------
Adam Hafdahl
Owner & Principal Consultant
ARCH Statistical Consulting, LLC
-------------------------------------------
2. RE: Data sleuthing: Invalid pairs of sample size and percentage

Michael Kruger
Posted 09-25-2013 20:49
Make sure the data aren't weighted. If there is any weighting (e.g. for demographics) I wouldn't try to figure out the valid references.

-------------------------------------------
Michael Kruger
Information Resources Inc
-------------------------------------------
