Hi, Daniel
If, by prediction, you mean classification, then people associated with Edward R. Dougherty at Texas A&M have published on sample size in this context. Below are two that appeared in 2005 in Bioinformatics:
(1)
How many samples are needed to build a classifier: a general sequential approach. Fu WJ, Dougherty ER, Mallick B, Carroll RJ.
Bioinformatics. 2005 Jan 1;21(1):63-70. Epub 2004 Aug 5. PMID: 15297303
(2)
Optimal number of features as a function of sample size for various classification rules. Hua J, Xiong Z, Lowey J, Suh E, Dougherty ER.
Bioinformatics. 2005 Apr 15;21(8):1509-15. Epub 2004 Nov 30. PMID: 15572470
Also, search Dougherty's name in PubMed and browse some of the other titles that come up.
However, if, by prediction, you mean predicting an individual's risk of coming down with something undesirable at a future time, then that's an area in which (a) people are coming to realize that classification algorithms are inadequate to the task, and (b) they've been developing new methods, and indeed new metrics of performance, to get a better handle on individual-risk prediction. Two good papers to look for in this context are as follows:
(1)
Use and misuse of the receiver operating characteristic curve in risk prediction. Cook NR.
Circulation. 2007 Feb 20;115(7):928-35. PMID: 17309939
(2)
Evaluating the added predictive ability of a new marker: from area under the ROC curve to reclassification and beyond. Pencina MJ, D'Agostino RB Sr, D'Agostino RB Jr, Vasan RS.
Stat Med. 2008 Jan 30;27(2):157-72; discussion 207-12. PMID: 17569110
Additionally, Margaret Pepe has published commentary on both of these papers and has contributed relevant methodology of her own.
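
To make the contrast between the two kinds of metric concrete, here is a rough Python sketch of my own, not code from either paper. It computes the area under the ROC curve for an "old" and a "new" risk model, and then a category-based net reclassification improvement of the kind discussed in the Pencina paper. The risk cutoffs (10% and 20%), the function names, and the eight-person data set are all made up purely for illustration.

```python
def auc(risks, events):
    """Area under the ROC curve: the chance that a randomly chosen event
    has a higher predicted risk than a randomly chosen non-event
    (ties count as one half)."""
    cases = [r for r, e in zip(risks, events) if e]
    controls = [r for r, e in zip(risks, events) if not e]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def risk_category(risk, cutoffs):
    """Index of the risk category, given ascending cutoffs (hypothetical)."""
    return sum(risk >= c for c in cutoffs)


def nri(old_risks, new_risks, events, cutoffs):
    """Category-based net reclassification improvement:
    (P(up | event) - P(down | event))
      + (P(down | non-event) - P(up | non-event))."""
    up_e = down_e = up_n = down_n = 0
    n_events = sum(events)
    n_nonevents = len(events) - n_events
    for old, new, e in zip(old_risks, new_risks, events):
        move = risk_category(new, cutoffs) - risk_category(old, cutoffs)
        if e:
            up_e += move > 0
            down_e += move < 0
        else:
            up_n += move > 0
            down_n += move < 0
    return (up_e - down_e) / n_events + (down_n - up_n) / n_nonevents


# Toy data, invented for illustration only: 1 = event, 0 = non-event.
events = [1, 1, 1, 0, 0, 0, 0, 0]
old_risks = [0.15, 0.08, 0.25, 0.04, 0.12, 0.07, 0.22, 0.03]
new_risks = [0.22, 0.12, 0.30, 0.03, 0.08, 0.06, 0.21, 0.02]
cutoffs = (0.10, 0.20)  # hypothetical low / intermediate / high boundaries

print("AUC, old model:", auc(old_risks, events))
print("AUC, new model:", auc(new_risks, events))
print("NRI, new vs old:", nri(old_risks, new_risks, events, cutoffs))
```

The AUC depends only on how the model ranks events against non-events, whereas the reclassification measure asks whether individuals move across risk categories in the right direction. A real analysis would also need confidence intervals and a check of calibration, neither of which is attempted here.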
Have fun
-------------------------------------------
Eric Siegel
Biostatistician
Univ of Arkansas for Medical Sciences
-------------------------------------------