I am working on a project where the goal is to use naive Bayes to identify the most probable diagnoses given a set of symptoms. We have a fairly large number of classifiers, as well as symptoms, and I know we will need a substantial amount of data, but I have not found any good references on this topic.
I understand that there are factors to take into account, such as signal-to-noise ratio, model complexity, etc., but I was hoping that somebody might have a good rule of thumb that gives some general idea of how much training data is enough, given a certain number of classifiers or predictors. Any advice would be greatly appreciated, as would any references.
Thank you!
-------------------------------------------
Miranda Kroehl
Colorado School of Public Health
-------------------------------------------