sample size calculation for prediction

1. sample size calculation for prediction

Daniel Scharfstein
Posted 08-19-2011 15:04
Dear All,

I am new to this e-group. I have been following the sample size discussion re ITT and attrition. Very interesting. The recent NAS report on missing data in clinical trials notes that this is an important area for research.

I was curious to know thoughts on sample size calculations when the aim is to build a prediction model/algorithm.
There seems to be a dearth of literature on this topic, but I may be wrong.

All the best,

Dan

-------------------------------------------
Daniel Scharfstein
Professor of Biostatistics
Director, Graduate Program
Johns Hopkins School of Public Health
-------------------------------------------
2. RE: sample size calculation for prediction

Eric Siegel
Posted 08-23-2011 17:08
Hi, Daniel

If, by prediction, you mean classification, then people associated with Edward R. Dougherty at Texas A&M have published on sample size in this context. Below are two that appeared in 2005 in Bioinformatics:

(1) How many samples are needed to build a classifier: a general sequential approach. Fu WJ, Dougherty ER, Mallick B, Carroll RJ. Bioinformatics. 2005 Jan 1;21(1):63-70. Epub 2004 Aug 5. PMID: 15297303

(2) Optimal number of features as a function of sample size for various classification rules. Hua J, Xiong Z, Lowey J, Suh E, Dougherty ER. Bioinformatics. 2005 Apr 15;21(8):1509-15. Epub 2004 Nov 30. PMID: 15572470

Also, search Dougherty's name in PubMed and browse some of the other titles that come up.

However, if, by prediction, you mean predicting an individual's risk of coming down with something undesirable at a future time, then that's an area in which (a) people are coming to realize that classification algorithms are inadequate to the task, and (b) they've been trying to develop new methods, and indeed new metrics of performance, to get a better handle on individual-risk prediction. Two good papers to look for in this context are as follows:

(1) Use and misuse of the receiver operating characteristic curve in risk prediction. Cook NR. Circulation. 2007 Feb 20;115(7):928-35. PMID: 17309939

(2) Evaluating the added predictive ability of a new marker: from area under the ROC curve to reclassification and beyond. Pencina MJ, D'Agostino RB Sr, D'Agostino RB Jr, Vasan RS. Stat Med. 2008 Jan 30;27(2):157-72; discussion 207-12. PMID: 17569110

Additionally, Margaret Pepe has published commentary on both these papers as well as contributed relevant methodology of her own.

Have fun

-------------------------------------------
Eric Siegel
Biostatistician
Univ of Arkansas for Medical Sciences
-------------------------------------------
