Michael, Stan, and everyone.
Michael's response highlights the value of ASA Connect to all of us: the ability to present and discuss varied understandings of and approaches to statistics. Understandably, some might consider specific information to be "junk," but there are those of us who haven't encountered that information previously. Especially for statisticians like me who are relatively inexperienced and who work isolated from other statisticians, the collegiality and willingness to share information and insight that is demonstrated by the more experienced statisticians who post to ASA Connect is invaluable.
I would like to make clear that I didn't mean to imply that I had dismissed Stan's comments on the dangers of multiple testing or ignored his obvious breadth and depth of knowledge. I meant to communicate that the authors were not conducting an exploratory analysis and had not conducted multiple testing.
Stan and Michael, thanks again for your responses.
Linda
------------------------------
Linda A. Landon, PhD, ELS
Research Consultant
PhD, Molecular Pharmacology
Graduate Certificate, Applied Statistics
Board-Certified Editor in the Life Sciences
Research Communiqué
Clear, Concise Statistics & Words
LandonPhD@ResearchCommunique.com
573-797-4517
Original Message:
Sent: 12-27-2016 01:21
From: Michael Chernick
Subject: Pseudo R squared
I chose to contact Linda privately for fear that others would be irritated about getting email that they consider junk because the topic doesn't interest them. But since I want to make additional comments and so many of you submitted very thoughtful answers to the group, I am doing so now. The point I want to make is that Linda should not dismiss Stan Young's issue about multiple testing merely because, as Linda said, the study is not exploratory. A non-exploratory study is exactly when P-value adjustment needs to be considered, and it is too commonly ignored.
Stan is an expert on this. I think he is too modest to mention the wonderful book he coauthored with Peter Westfall for Wiley in 1993 on resampling-based p-value adjustment. The bootstrap and permutation approaches to this problem are very helpful especially when methods like Bonferroni are overly conservative. I have advocated it, used it in my work and referred to it in my books on the bootstrap.
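To make the contrast with Bonferroni concrete, here is a minimal Python sketch of a single-step "maxT" permutation adjustment, one of the resampling ideas associated with Westfall and Young. It is an illustration under stated assumptions, not a reproduction of the book's procedures: two groups, several outcomes measured on each subject, and an absolute Welch t statistic per outcome. The function names (`abs_t`, `maxt_adjusted_pvalues`, `bonferroni`) and the defaults are hypothetical choices made for this example.

```python
import random
import statistics


def abs_t(x, y):
    """Absolute Welch two-sample t statistic for two lists of numbers."""
    se = (statistics.variance(x) / len(x) + statistics.variance(y) / len(y)) ** 0.5
    return abs(statistics.mean(x) - statistics.mean(y)) / se


def bonferroni(pvalues):
    """Bonferroni adjustment: multiply each p-value by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def maxt_adjusted_pvalues(rows, labels, n_perm=2000, seed=0):
    """Single-step maxT permutation-adjusted p-values.

    rows   : one list of m outcome values per subject
    labels : 0/1 group membership per subject
    Whole subjects are relabeled together, so the correlation among the
    m outcomes is preserved in the permutation distribution.
    """
    rng = random.Random(seed)
    m = len(rows[0])

    def stats_for(lab):
        out = []
        for j in range(m):
            a = [r[j] for r, g in zip(rows, lab) if g == 0]
            b = [r[j] for r, g in zip(rows, lab) if g == 1]
            out.append(abs_t(a, b))
        return out

    observed = stats_for(labels)
    exceed = [0] * m
    perm = list(labels)
    for _ in range(n_perm):
        rng.shuffle(perm)
        max_stat = max(stats_for(perm))  # largest statistic across all outcomes
        for j in range(m):
            if max_stat >= observed[j]:
                exceed[j] += 1
    # The +1 terms count the observed labeling as one of the permutations.
    return [(c + 1) / (n_perm + 1) for c in exceed]
```

Because each observed statistic is compared with the permutation distribution of the maximum over all outcomes, the adjustment accounts for multiplicity while using the dependence among outcomes, which is why it tends to be less conservative than Bonferroni when the outcomes are correlated.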
------------------------------
Michael Chernick
Original Message:
Sent: 12-26-2016 21:42
From: Dabao Zhang
Subject: Pseudo R squared
Dear Linda,
I think it really depends on what you want to use R-squared to measure when fitting a generalized linear model. Classical R-squared measures the explained variance in fitting homoscedastic linear models, where it coincides with explained information.
I have recently written a paper on defining a variance-function-based R-squared for generalized linear models (including logistic regression and even quasi models); see http://www.tandfonline.com/doi/full/10.1080/00031305.2016.1256839. In my simulation studies, several previously defined measures, with the exception of the K-L-distance-based R-squared, can seriously overstate the fit. So I would recommend the K-L-distance-based R-squared for logistic regression. In general, the variance-function-based R-squared performs similarly to the K-L-distance-based R-squared, but the latter does not work for quasi models (where only the mean and variance functions are known). So, if you want to calculate R-squared for quasi models, you can download my package, available at https://cran.r-project.org/web/packages/rsq/index.html.
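As a rough illustration only: for a binary outcome, a commonly used K-L-distance-based R-squared (the measure usually attributed to Cameron and Windmeijer) reduces to one minus the ratio of the model deviance to the null deviance. The Python sketch below assumes that definition and assumes the fitted probabilities come from a logistic regression fitted elsewhere. It is not code from the paper or from the rsq package, and the function names are hypothetical.

```python
import math


def bernoulli_deviance(y, p):
    """Deviance of 0/1 outcomes y under probabilities p (each strictly between 0 and 1)."""
    loglik = 0.0
    for yi, pi in zip(y, p):
        loglik += yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
    return -2.0 * loglik


def kl_r2_binary(y, p):
    """Deviance-based (K-L) R-squared for binary outcomes.

    Assumes y contains both 0s and 1s, so the null (intercept-only)
    probability mean(y) lies strictly between 0 and 1.
    """
    y_bar = sum(y) / len(y)
    d_model = bernoulli_deviance(y, p)
    d_null = bernoulli_deviance(y, [y_bar] * len(y))
    return 1.0 - d_model / d_null
```

For example, `kl_r2_binary([0, 0, 1, 1], [0.1, 0.4, 0.6, 0.9])` compares the deviance of those illustrative fitted probabilities with the deviance of predicting 0.5 for every observation.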
Hope this helps.
Best,
Dabao
------------------------------
Dabao Zhang
Associate Professor
Purdue University
Original Message:
Sent: 12-23-2016 13:13
From: Linda Landon
Subject: Pseudo R squared
Happy Holidays Everyone.
For a client, I'm reading and commenting on a previously published, peer-reviewed paper to determine if the methods and results in the paper are applicable to their work. The first and fatal problem with this paper is that there is NO description or even a passing mention of the statistical protocol applied to analyze the data. How does this stuff get published? That's a rhetorical question.
My questions for the group concern the pseudo R squared statistic. It appears that the authors may have used logistic regression to analyze two data sets. (Although, who knows what they did because they never actually said what they did.) The first data set is a semi-fabricated simulation data set; the second data set is empirical data. The authors used Efron's pseudo R squared to compare the simulation results to the empirical results.
First, I have not previously encountered the use of any R squared statistic (derived from either OLS or logistic regression) to compare the fit of one model to two different data sets. Is that a valid application?
Second, my only previous exposure to pseudo R squared statistics was a professor who said to ignore the statistic because the interpretation of pseudo R squared is nebulous. Because I was unfamiliar with the various pseudo R squared statistics, I attempted to educate myself about them. My research was not particularly instructive. Opinions stated in statistical blogs and the peer-reviewed literature appear to fall into two completely opposed categories. The first, similar to my professor's opinion, is that pseudo R squared statistics are meaningless, can't be interpreted, and shouldn't be used. The opposing camp thinks that pseudo R squared statistics are valuable and valid.
Would anyone care to comment on this? I'd appreciate it because, for the same client, I'm applying logistic regression to analyze a data set. If I'm missing something regarding the use of pseudo R squared to interpret logistic regression models and results, I need to remedy that deficiency.
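For anyone unfamiliar with it, Efron's pseudo R squared is usually defined as one minus the sum of squared differences between the observed 0/1 outcomes and the fitted probabilities, divided by the sum of squared differences between the outcomes and their mean. A minimal Python sketch of that definition follows. It assumes the fitted probabilities are already in hand; the function name and the example numbers are illustrative and are not taken from the paper in question.

```python
def efron_r2(y, p):
    """Efron's pseudo R squared: 1 - sum((y - p)^2) / sum((y - mean(y))^2).

    y : observed outcomes coded 0/1
    p : fitted probabilities from the model, same length as y
    """
    if len(y) != len(p):
        raise ValueError("y and p must have the same length")
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, p))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    if ss_tot == 0:
        raise ValueError("y must contain both 0s and 1s")
    return 1.0 - ss_res / ss_tot


# Illustrative values only: four observations with made-up fitted probabilities.
example = efron_r2([0, 0, 1, 1], [0.1, 0.4, 0.6, 0.9])
```

Note that the denominator depends on the outcome proportion in the particular data set, which is one reason the same model can yield different values on two data sets.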
Thanks in advance.
Linda
------------------------------
Linda A. Landon, PhD, ELS
Research Consultant
PhD, Molecular Pharmacology
Graduate Certificate, Applied Statistics
Board-Certified Editor in the Life Sciences
Research Communiqué
Clear, Concise Statistics & Words
LandonPhD@ResearchCommunique.com
573-797-4517
------------------------------