ASA Connect

  • 1.  Sample Size estimation (Discrete case)

    Posted 05-10-2015 17:41


    I have a simple algorithm that classifies a person as diabetic or not diabetic. Its output is strictly binary {1 = Yes, 0 = No}, not a probability such as 0.24, 0.57, or 0.85.

    I am planning to validate this algorithm, which requires choosing an appropriate sample size. Does anybody know the formula for sample size selection in this context, assuming power = 0.9?

    ------------------------------
    Sudhi Upadhyaya

    ------------------------------



  • 2.  RE: Sample Size estimation (Discrete case)

    Posted 05-11-2015 08:23

    If your goal is validation, I would base the sample size on precision, not power. That is, for a given sample size, how precisely can you estimate sensitivity, specificity, etc.? Of course, you then also need to consider the proportion of cases.
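    To make "precision" concrete, here is a minimal sketch, assuming a simple Wald (normal-approximation) 95% confidence interval, of how the half-width of the interval for sensitivity or specificity depends on the number of subjects it is estimated from. The function name `wald_half_width` and the planning values are illustrative assumptions, not part of the original post.

```python
import math
from statistics import NormalDist


def wald_half_width(p_hat: float, n: int, conf: float = 0.95) -> float:
    """Half-width of a Wald (normal-approximation) CI for a proportion.

    p_hat: anticipated sensitivity or specificity (a planning guess)
    n: subjects the proportion is estimated from
       (gold-standard cases for sensitivity, non-cases for specificity)
    """
    # Two-sided critical value, e.g. about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


if __name__ == "__main__":
    # Hypothetical: sensitivity around 0.90, estimated from 100 true cases
    print(round(wald_half_width(0.90, 100), 4))
    # Quadrupling the cases halves the half-width
    print(round(wald_half_width(0.90, 400), 4))
```

    Note that n here counts only the subjects relevant to each measure, which is why the proportion of cases matters. Exact or Wilson intervals are often preferred when the proportion is near 0 or 1.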

    Hope that's helpful.

    ------------------------------
    Douglas Landsittel
    Professor of Medicine, Biostatistics and Clinical and Translational Science
    University of Pittsburgh-School of Medicine
    ------------------------------




  • 3.  RE: Sample Size estimation (Discrete case)

    Posted 05-11-2015 09:28

    Hi Sudhi,

    The maximum variance for a proportion, p_hat, occurs when p_hat=.5.
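    The variance of a sample proportion is p(1 - p)/n, and p(1 - p) peaks at p = 0.5, so plugging in 0.5 gives the most conservative (largest) sample size when nothing is known about the true proportion. A minimal sketch under a Wald-interval assumption; the name `n_for_proportion` and the 0.05 margin are illustrative only.

```python
import math
from statistics import NormalDist


def n_for_proportion(margin: float, p: float = 0.5, conf: float = 0.95) -> int:
    """Smallest n whose Wald CI half-width is at most `margin` for a proportion near p.

    The default p = 0.5 maximizes p * (1 - p), giving the conservative n.
    """
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return math.ceil(z**2 * p * (1 - p) / margin**2)


if __name__ == "__main__":
    # p * (1 - p) over a grid of p values: the maximum occurs at p = 0.5
    grid = [i / 10 for i in range(11)]
    print(max(grid, key=lambda q: q * (1 - q)))
    # Conservative n for a +/- 0.05 margin versus a planning guess of p = 0.9
    print(n_for_proportion(0.05), n_for_proportion(0.05, p=0.9))
```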

    In SAS, Proc Power can be used for binary proportions.  

    Hope this helps to get you started.

    ------------------------------
    Brandy Sinco
    Research Associate
    ------------------------------




  • 4.  RE: Sample Size estimation (Discrete case)

    Posted 05-11-2015 12:16

    There are many ways to validate. I'm guessing here, but I suspect that you want to compare your algorithm, which is simple, cheap, or fast, to a gold standard measure of diabetes. The gold standard is something that has been around for a while and is well trusted by doctors, but it may be a lot more expensive or time-consuming than what you are proposing.

    Establishing validity in this framework is typically done by establishing that your sensitivity and specificity are large enough. You want to select a sample size so that the confidence intervals for sensitivity and specificity are reasonably narrow. A key statistic here is the proportion of patients in your sample that will have diabetes according to your gold standard.
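    One common way to fold that proportion into the calculation, sketched here under stated assumptions (simple random sampling from the target population, Wald intervals, and planning guesses for sensitivity, specificity, and prevalence), is to compute how many gold-standard cases and non-cases each interval needs and then divide by the expected prevalence or its complement. The function name and all inputs are hypothetical; this is not necessarily the method on the page linked below.

```python
import math
from statistics import NormalDist


def total_n_for_se_sp(se: float, sp: float, prevalence: float,
                      margin: float, conf: float = 0.95) -> dict:
    """Precision-based total sample size for a diagnostic accuracy study (sketch).

    Assumes cases make up about `prevalence` of an unselected sample, and
    uses Wald intervals with half-width `margin` for both Se and Sp.
    """
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    cases_needed = math.ceil(z**2 * se * (1 - se) / margin**2)
    noncases_needed = math.ceil(z**2 * sp * (1 - sp) / margin**2)
    # Scale up so the expected numbers of cases and non-cases are both large enough
    n_for_se = math.ceil(cases_needed / prevalence)
    n_for_sp = math.ceil(noncases_needed / (1 - prevalence))
    return {
        "cases_needed": cases_needed,
        "noncases_needed": noncases_needed,
        "total_n": max(n_for_se, n_for_sp),
    }


if __name__ == "__main__":
    # Hypothetical planning values: Se 0.90, Sp 0.85, 10% prevalence, +/- 0.05
    print(total_n_for_se_sp(0.90, 0.85, 0.10, 0.05))
```

    With a low prevalence, the sensitivity requirement usually drives the total, which is why the proportion of diabetics in the sample is so important.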

    Psychologists use terms like "criterion validity" or "predictive validity" in this case, though I am always a bit unclear on their terminology. That's probably more of a limitation on my intellectual capacity than a criticism of their definitions.

    Note that there is no "power" involved in this calculation. The reason for this is that validity is not something that is easily reduced to a simple hypothesis test.

    If you want more details, I talk about sample sizes needed for a study of a diagnostic test at http://www.pmean.com/04/SampleSizeDiagnostic.html

    Establishing good values for sensitivity and specificity is not the only way to validate your algorithm, of course, and if you have a different method to establish validity, share it with us and we'll help you figure out how to justify your sample size.

    ------------------------------
    Stephen Simon
    Independent Statistical Consultant
    P. Mean Consulting
    ------------------------------