This is almost certainly a simple issue of over-fitting due to model selection that the final inference ignores. The conditions are ripe: your "positive" population is small, and more importantly model selection occurred. Possibly a good deal of it. I note your sentence: "The purpose of the study is to model the odds of the outcome using a variety of factors." So they went fishing. And with a large data set with many factors, they don't have to fish much to get horribly biased results.
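To make the "fishing" point concrete, here is a minimal simulation sketch. Everything in it is hypothetical (500 subjects, 25 positives, binary factors with 30% prevalence, and the function names); none of it comes from the study under review. Every candidate factor is generated independently of the outcome, so every true OR is 1.0, and the code reports the largest estimated OR found after screening 1, 10, or 100 such factors.

```python
import random

def odds_ratio(exposure, outcome):
    """Odds ratio from a 2x2 table; 0.5 is added to each cell to avoid division by zero."""
    a = b = c = d = 0.5
    for e, o in zip(exposure, outcome):
        if e and o:
            a += 1      # exposed, positive
        elif e:
            b += 1      # exposed, negative
        elif o:
            c += 1      # unexposed, positive
        else:
            d += 1      # unexposed, negative
    return (a * d) / (b * c)

def largest_null_or(n_subjects, n_cases, n_factors, rng):
    """Screen n_factors pure-noise binary factors and return the largest estimated OR."""
    outcome = [1] * n_cases + [0] * (n_subjects - n_cases)
    best = 0.0
    for _ in range(n_factors):
        # Exposure is drawn independently of the outcome, so the true OR is 1.0.
        exposure = [rng.random() < 0.3 for _ in range(n_subjects)]
        best = max(best, odds_ratio(exposure, outcome))
    return best

rng = random.Random(2024)
for n_factors in (1, 10, 100):
    runs = sorted(largest_null_or(500, 25, n_factors, rng) for _ in range(21))
    print(n_factors, "candidate factors -> median best OR:", round(runs[10], 2))
```

The more factors are screened, the further the "best" OR drifts from 1.0, even though nothing real is there. This is only the crudest form of selection (pick the maximum); stepwise procedures are more elaborate, but they run into the same problem.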
See Ambroise & McLachlan (2002), "Selection Bias in Gene Extraction on the Basis of Microarray Gene-Expression Data," Proceedings of the National Academy of Sciences, available for free download from pnas.org. Their problem (classification from microarray data) is different, but the principles are the same.
The final inference needs to encompass model selection. As a rule of thumb for article reviews, I would suggest that any article reporting a statistical result in which modeling considered five or more variables should specifically describe how model selection is addressed in inference, and make a convincing case that its methods prevent "selection bias"; otherwise it shouldn't be published. I'm so glad you wrote! You're doing a wonderful job as reviewer!
There are multiple ways to make inferences that account for model selection. One possibility is to avoid model selection altogether: use all variables, and (possibly) apply regularization methods (e.g., the Firth adjustment mentioned earlier) to keep the estimates stable. Another possibility is to apply resampling methods (bootstrapping or cross-validation) to the entire fitting process, variable selection included, so that selection is repeated within every resample. This is complicated because one doesn't get the same set of variables for every iteration... but that's exactly the point. Still another possibility is to use Bayesian Model Averaging, such as with R's BMA package; there are now additional packages, and I couldn't comment on which is best. Anyway, it's the authors who need to figure out the solution.
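The resampling option can be sketched as follows. This is an illustration under my own assumptions, not the authors' analysis: the data are pure noise (40 subjects, 300 candidate variables, labels unrelated to the variables), the classifier is a simple nearest-centroid rule rather than logistic regression, and all function names are made up for the example. The only thing that changes between the two estimates is whether variable selection sees the held-out subject.

```python
import random

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def class_means(X, y, rows, j):
    """Mean of variable j within each class, computed over `rows` only."""
    pos = mean(X[i][j] for i in rows if y[i] == 1)
    neg = mean(X[i][j] for i in rows if y[i] == 0)
    return pos, neg

def select_features(X, y, rows, k):
    """Keep the k variables with the largest class-mean gap among `rows`."""
    def gap(j):
        pos, neg = class_means(X, y, rows, j)
        return abs(pos - neg)
    return sorted(range(len(X[0])), key=gap, reverse=True)[:k]

def predict(X, y, train, i, feats):
    """Nearest-centroid prediction for row i, with centroids estimated from `train`."""
    d_pos = d_neg = 0.0
    for j in feats:
        pos, neg = class_means(X, y, train, j)
        d_pos += (X[i][j] - pos) ** 2
        d_neg += (X[i][j] - neg) ** 2
    return 1 if d_pos < d_neg else 0

def loo_accuracy(X, y, k, select_inside):
    """Leave-one-out accuracy; selection is redone in each fold only if select_inside."""
    rows = list(range(len(X)))
    fixed = select_features(X, y, rows, k)  # chosen using ALL rows, held-out one included
    hits = 0
    for i in rows:
        train = [r for r in rows if r != i]
        feats = select_features(X, y, train, k) if select_inside else fixed
        hits += predict(X, y, train, i, feats) == y[i]
    return hits / len(rows)

def noise_data(n, p, seed):
    """Gaussian noise predictors with alternating labels: no real signal anywhere."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [i % 2 for i in range(n)]
    return X, y

X, y = noise_data(n=40, p=300, seed=1)
print("selection outside the loop:", loo_accuracy(X, y, k=10, select_inside=False))
print("selection inside the loop: ", loo_accuracy(X, y, k=10, select_inside=True))
```

Since the labels carry no information, an honest estimate should land near chance. Selecting once on the full data and then cross-validating only the final fit typically reports accuracy well above chance, which is the selection bias Ambroise & McLachlan describe. The same structure applies to bootstrapping: whatever selection procedure the authors used has to be rerun inside every resample.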
When I was in graduate school, in the days before data mining got big, we would discuss selecting a model and then interpreting the model. There was always an acknowledgement that p-values and confidence intervals were biased due to model selection, but we sort of waved at the issue as we passed by, and comforted ourselves that with a modest degree of prior analysis, the effect would not be severe. Fast forward many years, and people are routinely analyzing massive datasets with tens, hundreds, or thousands of variables, and selection bias can be enormous. I wouldn't be entirely surprised if the OR for your authors' factor, once they properly account for model selection, turns out to be 1.0, i.e., completely null. I can tell you I learned of this issue the hard way, by having a problem where I thought the data supported almost complete separation when in fact there was nothing reproducible there.
The classification and machine learning world has been badly burned by this (just as I was), and the community is now largely sensitized to the issue (I hope!). The biostatistics community is not as broadly sensitized, presumably because it's less common there to see a large number of predictor variables. But I think this example is exactly one of those occasions.
If anyone wants to discuss details, I would suggest spawning a new thread, because that discussion could utterly swamp this one.
Good luck!
------------------------------
Jim Garrett, PhD
Sr. Assoc. Dir. of Biostatistics
Novartis