ASA Connect

  • 1.  Sample size question

    Posted 02-28-2018 16:09
    This seems like a silly question but I need some aid to wrap my head around the problem--

    Trying to test for the performance of a screening tool to screen a "condition" in the patient population. The condition itself occurs in about 20% of the population. We have a confirmatory procedure to later confirm if the screening tool was effective.

    So, how to obtain the sample size in this case to have a given sensitivity and specificity?

    Thanks in advance.

    Enayet


  • 2.  RE: Sample size question

    Posted 03-01-2018 08:41
    Hi Enayet,

    If your goal is to test an odds ratio, one option is to use SAS Proc Power with the logistic option.
    http://documentation.sas.com/?docsetId=statug&docsetTarget=statug_power_examples15.htm&docsetVersion=14.3&locale=en

    Here's another example of Proc Power from the Michigan SAS Users Group conference in 2012:

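    proc power;
      /* Solve for total N in a logistic regression with one normal predictor */
      logistic
        vardist("x1a") = normal(0, 1)   /* assumed distribution of the predictor */
        testpredictor = "x1a"
        responseprob = 0.5              /* response probability at the predictor mean */
        units = ("x1a" = SD)            /* odds ratio is per 1-SD change in x1a */
        testoddsratio = 1.5
        power = .80
        ntotal = .;                     /* missing value asks PROC POWER to solve for N */
    run;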


    ------------------------------
    Brandy Sinco, BS, MA, MS
    Statistician and Programmer/Analyst
    ------------------------------



  • 3.  RE: Sample size question

    Posted 03-01-2018 12:19
    Perhaps a reasonable approach is to think of it as a regression for a binary outcome:
    The sample size required would depend on the number of candidate predictors (regardless of whether some are dropped later or not) that you are considering to create the screening algorithm, and the number of events in the outcome variable (~20% prevalence). The "limiting sample size" is the smaller of the event and non-event counts, so in this case it would be the positives for the condition (~20 out of 100).
    For typical low signal-to-noise situations, the fitted model is likely to be reliable (in the sense that it would have predictive discrimination on an independent validation sample similar to the apparent discrimination in the "training" sample) when the ratio of the limiting sample size to the number of coefficients is not low, such as at least 10 events per coefficient. A good average requirement is having at least 15 events per coefficient (see Harrell 2015 textbook, pp. 72-73 for more detail. I think there are some online handouts as well).
    So let's say you are considering a set of predictors that translate into 10 potential coefficients, then the number of events needed to fit the model should be somewhere between 10*10=100 and 15*10=150, and therefore assuming ~20% prevalence the target sample size should be somewhere between 500 and 750.
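
    The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the events-per-coefficient rule of thumb as described in this post; the function name is made up, and the inputs (10 coefficients, 20% prevalence, 10 to 15 events per coefficient) are the ones used in the example above.

```python
import math

def epv_sample_size(n_coefficients, prevalence, events_per_coef):
    """Return (events needed, total N) under an events-per-coefficient rule.

    The limiting class is the rarer of events and non-events.
    """
    limiting_fraction = min(prevalence, 1 - prevalence)
    events_needed = n_coefficients * events_per_coef
    # round() guards against floating-point noise before taking the ceiling
    total_n = math.ceil(round(events_needed / limiting_fraction, 9))
    return events_needed, total_n

for epv in (10, 15):
    events, total = epv_sample_size(10, 0.20, epv)
    print(f"{epv} events per coefficient: {events} events, total N about {total}")
```

    With these inputs the totals come out at 500 and 750, matching the range given above.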

    ------------------------------
    Andres Azuero
    UAB
    ------------------------------



  • 4.  RE: Sample size question

    Posted 03-01-2018 12:58
    If I correctly understand the problem posed, when you say "performance" and are examining sensitivity and specificity, the sample size sought can be derived based upon the accuracy of the assessment device, the population proportion and, of course, the nature of the sample assessed. Assuming the sample is both random and representative of the population (which I suspect it may not be), N probably needs to be calculated based on some weighting factor derived from the accuracy of the instrument and how it increases the CI around the proportion in the population parameter. Sorry to say, but I cannot think of an article offhand that would clarify matters for you. Hope this helps anyway.

    ------------------------------
    Gene Fisch, Ph.D.
    Baruch College
    ------------------------------



  • 5.  RE: Sample size question

    Posted 03-01-2018 21:16
    the sensitivity and specificity of your test do not depend on the sample size, but your estimates of their standard errors/confidence intervals do.
    i.e., if a test has 50% sensitivity, it will have that sensitivity regardless of how big your sample is.
    i want to note that in many cases, studies of this sort are done on extremes of the disease spectrum, often in some quota-sampled situation.  the sensitivity and specificity measured in this type of study will NOT be the sens and spec in the whole population, where there are people with middling performance on your test.
    from your post, it may be that you have a continuous measure and are looking to find the optimal cutoff.  sensitivity and specificity ALWAYS trade off.  the optimal levels (or even whether the test is worth doing) will depend on the costs (cost of test, cost of not treating false negatives, cost of whatever you do to false positives), including nonmonetary costs. 
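
    to illustrate the trade-off when a cutoff is applied to a continuous measure, here is a small Python sketch. the scores are hypothetical, invented purely for illustration, and the rule "screen-positive when score >= cutoff" is an assumption, not something stated in the original question.

```python
def sens_spec(diseased_scores, healthy_scores, cutoff):
    """Sensitivity and specificity when score >= cutoff counts as screen-positive."""
    true_pos = sum(s >= cutoff for s in diseased_scores)
    true_neg = sum(s < cutoff for s in healthy_scores)
    return true_pos / len(diseased_scores), true_neg / len(healthy_scores)

# Hypothetical scores (not real data): the two groups overlap in the middle
diseased = [4, 5, 6, 7, 8]
healthy = [1, 2, 3, 4, 5, 6]

for cutoff in (3, 5, 7):
    sens, spec = sens_spec(diseased, healthy, cutoff)
    print(f"cutoff {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

    raising the cutoff lowers sensitivity and raises specificity; which point on that curve is best depends on the costs described above.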

    --
    Ellen Hertzmark






  • 6.  RE: Sample size question

    Posted 03-02-2018 00:47

    In my mind, the phrase "obtain the sample size to have a given sensitivity and specificity" means trying to use sample size to tune the performance of the screening tool until it attains desired values of sensitivity and specificity. Unfortunately, sensitivity and specificity are functions of the screening tool, not the sample size.

     

    However, the screening tool's observed values of sensitivity and specificity are point estimates from independent binomial distributions, and each point estimate comes with a binomial standard error that quantifies the precision of the estimate. So if we changed the task slightly, to "obtain the sample size needed for the sensitivity and specificity to have a given level of precision", then that is an easy problem to solve if one has a reasonable guess on what the screening tool's sensitivity and specificity should be.

     

    The trick is, one needs to calculate two sample sizes and add them together. One sample size will be the number who have the condition being screened for, to control the size of the standard error of the sensitivity estimate. The other sample size will be the number who do not have the condition being screened for, to control the size of the standard error of the specificity estimate.   
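
    As a rough sketch of that two-part calculation in Python: the guessed sensitivity (0.90), guessed specificity (0.85), and target 95% CI half-width (0.05) below are assumptions chosen only for illustration, and the normal-approximation (Wald) formula n = z^2 p(1-p)/d^2 is one common choice, not the only one. The last step, which scales up by the 20% prevalence from the original question, is an added assumption for the case where subjects are enrolled before their true status is known.

```python
import math
from statistics import NormalDist

def n_for_proportion(p, half_width, confidence=0.95):
    """Sample size so a Wald CI for a proportion p has the given half-width."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z**2 * p * (1 - p) / half_width**2)

# Illustrative guesses, not values from the thread
sens_guess, spec_guess = 0.90, 0.85
half_width = 0.05
prevalence = 0.20  # from the original question

n_with = n_for_proportion(sens_guess, half_width)     # subjects with the condition
n_without = n_for_proportion(spec_guess, half_width)  # subjects without the condition
print("with condition:", n_with)
print("without condition:", n_without)
print("total if each group is sampled separately:", n_with + n_without)

# If status is unknown at enrollment, enough subjects must be screened to
# expect both group sizes; round() guards against floating-point noise.
n_cohort = max(
    math.ceil(round(n_with / prevalence, 9)),
    math.ceil(round(n_without / (1 - prevalence), 9)),
)
print("total if enrolling from the population:", n_cohort)
```

    Under these assumptions the rarer group (those with the condition) drives the total when enrolling from the general population.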

