Perhaps a reasonable approach is to think of it as a regression for a binary outcome:
The sample size required would depend on the number of candidate predictors you are considering for the screening algorithm (regardless of whether some are dropped later) and the number of events in the outcome variable (~20% prevalence). The "limiting sample size" is the smaller of the counts of events and non-events, so in this case it would be the positives for the condition (~20 per 100 subjects).
For typical low signal-to-noise situations, the fitted model is likely to be reliable (in the sense that it would show predictive discrimination on an independent validation sample similar to the apparent discrimination in the "training" sample) when the ratio of the limiting sample size to the number of coefficients is not low, e.g., at least 10 events per coefficient. A good average requirement is at least 15 events per coefficient (see the Harrell 2015 textbook, pp. 72-73, for more detail; I think there are some online handouts as well).
So let's say you are considering a set of predictors that translates into 10 potential coefficients. Then the number of events needed to fit the model should be somewhere between 10*10=100 and 15*10=150, and therefore, assuming ~20% prevalence, the target sample size should be somewhere between 100/0.20=500 and 150/0.20=750.
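The arithmetic above can be sketched in a few lines of Python (the function name and inputs are illustrative; 10 coefficients and 20% prevalence are taken from the example):

```python
import math

def required_sample_size(n_coefficients, prevalence, events_per_coef):
    """Total sample size so that the expected number of events
    is at least events_per_coef * n_coefficients."""
    events_needed = events_per_coef * n_coefficients
    return math.ceil(events_needed / prevalence)

# 10 candidate coefficients, ~20% prevalence:
low = required_sample_size(10, 0.20, 10)   # 100 events needed
high = required_sample_size(10, 0.20, 15)  # 150 events needed
print(low, high)  # 500 750
```

Note that this only gives the expected event count; with ~20% prevalence the realized number of events in a given sample will vary, so some pad above the minimum is prudent.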
------------------------------
Andres Azuero
UAB
------------------------------
Original Message:
Sent: 02-28-2018 16:07
From: Enayetur Raheem
Subject: Sample size question
This seems like a silly question, but I need some help wrapping my head around the problem.
We are trying to assess the performance of a screening tool for a "condition" in the patient population. The condition occurs in about 20% of the population. We have a confirmatory procedure to later verify whether the screening tool was effective.
So, how do we obtain the sample size in this case to achieve a given sensitivity and specificity?
Thanks in advance.
Enayet