ASA Connect


Pseudo R squared

  • 1.  Pseudo R squared

    Posted 12-23-2016 13:13

    Happy Holidays Everyone.

    For a client, I'm reading and commenting on a previously-published, peer-reviewed paper to determine if the methods and results in the paper are applicable to their work.  The first and fatal problem with this paper is that there is NO description or even a passing mention of the statistical protocol applied to analyze the data.  How does this stuff get published?  That's a rhetorical question.

    My questions for the group concern the pseudo R squared statistic.  It appears that the authors may have used logistic regression to analyze two data sets (although who knows what they did, because they never actually said).  The first data set is a semi-fabricated simulation data set; the second is empirical data.  The authors used Efron's pseudo R squared to compare the simulation results with the empirical results.
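
    For readers who have not met it before: Efron's pseudo R squared is usually written as one minus the ratio of the squared residuals (observed 0/1 outcome minus predicted probability) to the total sum of squares around the mean outcome. The paper does not describe its calculation, so the minimal Python sketch below shows only this textbook formula; the function name `efron_r2` and the five outcome/probability pairs are hypothetical, chosen for illustration.

```python
from statistics import mean


def efron_r2(y, p_hat):
    """Efron's pseudo R squared: 1 - sum((y - p)^2) / sum((y - ybar)^2).

    y     -- observed binary outcomes (0 or 1)
    p_hat -- predicted probabilities from a model, same length as y
    """
    if len(y) != len(p_hat):
        raise ValueError("y and p_hat must have the same length")
    y_bar = mean(y)
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, p_hat))  # residual sum of squares
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
    return 1.0 - sse / sst


# Hypothetical outcomes and predicted probabilities (not from any fitted model)
y = [0, 0, 1, 1, 1]
p_hat = [0.2, 0.4, 0.6, 0.7, 0.9]
print(round(efron_r2(y, p_hat), 4))
```

    The formula has the same shape as the OLS R squared, which is why it is sometimes described as the most "R squared-like" of the pseudo measures.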

    First, I haven't previously encountered any R squared statistic (derived from either OLS or logistic regression) being used to compare the fit of one model to two different data sets. Is that a valid application?

    Second, my only previous exposure to pseudo R squared statistics was a professor who said to ignore them because their interpretation is nebulous. Because I was unfamiliar with the various pseudo R squared statistics, I attempted to educate myself about them, but my research was not particularly instructive.  Opinions stated in statistical blogs and the peer-reviewed literature appear to fall into two completely opposed camps.  The first, similar to my professor's view, holds that pseudo R squared statistics are meaningless, can't be interpreted, and shouldn't be used.  The opposing camp thinks that pseudo R squared statistics are valuable and valid.
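
    Because there are several pseudo R squared statistics, it may help to see how far apart they can be for the same fitted model. The sketch below (Python, standard library only) computes McFadden's, Cox and Snell's, Nagelkerke's, and Efron's versions for a hypothetical data set with one binary predictor; the counts are invented for illustration. With an intercept and a single binary predictor, the maximum-likelihood logistic fit reproduces each group's observed event rate, which lets the sketch skip iterative model fitting.

```python
import math

# Hypothetical grouped data: one binary predictor x, binary outcome y.
# counts[x] = (number of subjects, number of events)
counts = {0: (40, 10), 1: (60, 36)}


def bernoulli_loglik(n, k, p):
    """Log-likelihood of k events in n Bernoulli trials with event probability p."""
    return k * math.log(p) + (n - k) * math.log(1 - p)


n_total = sum(n for n, _ in counts.values())
k_total = sum(k for _, k in counts.values())
p_null = k_total / n_total  # intercept-only model predicts the overall event rate

# Intercept + one binary predictor: the ML fit equals each group's observed rate.
p_model = {x: k / n for x, (n, k) in counts.items()}

ll_null = sum(bernoulli_loglik(n, k, p_null) for n, k in counts.values())
ll_model = sum(bernoulli_loglik(n, k, p_model[x]) for x, (n, k) in counts.items())

mcfadden = 1 - ll_model / ll_null
cox_snell = 1 - math.exp(2 * (ll_null - ll_model) / n_total)
nagelkerke = cox_snell / (1 - math.exp(2 * ll_null / n_total))  # rescaled to max 1

# Efron: 1 - SSE / SST, computed from the grouped counts
sse = sum(k * (1 - p_model[x]) ** 2 + (n - k) * p_model[x] ** 2
          for x, (n, k) in counts.items())
sst = n_total * p_null * (1 - p_null)  # sum of (y - ybar)^2 for 0/1 outcomes
efron = 1 - sse / sst

for name, value in [("McFadden", mcfadden), ("Cox-Snell", cox_snell),
                    ("Nagelkerke", nagelkerke), ("Efron", efron)]:
    print(f"{name:>10}: {value:.3f}")
```

    For these counts the four statistics range from about 0.09 (McFadden) to about 0.15 (Nagelkerke) for one and the same model, which is one reason a single pseudo R squared value is hard to interpret on its own.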

    Would anyone care to comment on this?  I'd appreciate it because, for the same client, I'm applying logistic regression to analyze a data set.  If I'm missing something about using pseudo R squared to interpret logistic regression models and results, I need to remedy that deficiency.

    Thanks in advance.

    Linda

    ------------------------------
    Linda A. Landon, PhD, ELS

    Research Consultant

    PhD, Molecular Pharmacology
    Graduate Certificate, Applied Statistics
    Board-Certified Editor in the Life Sciences

    Research Communiqué
    Clear, Concise Statistics & Words
    LandonPhD@ResearchCommunique.com
    573-797-4517
    ------------------------------


  • 2.  RE: Pseudo R squared

    Posted 12-26-2016 07:42
    Linda:
    You might carefully count all the questions that are at issue in the original paper. Sometimes researchers ask a lot of questions, test each one at 0.05, and then make a claim. They are doing what I call an exploratory analysis and then writing the paper in a confirmatory manner.
    Stan Young
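
    Stan's concern can be made concrete with a short calculation, assuming the tests are independent and every null hypothesis is true (both assumptions are mine, for illustration): the chance of at least one result "significant" at 0.05 grows quickly with the number of questions tested. The function name below is hypothetical.

```python
def familywise_error_rate(alpha, k):
    """Chance of at least one false positive among k independent tests,
    each run at level alpha, when every null hypothesis is true."""
    return 1 - (1 - alpha) ** k


for k in (1, 5, 20):
    print(k, round(familywise_error_rate(0.05, k), 3))
```

    With 20 independent tests at 0.05, the chance of at least one spurious "significant" finding is about 64%.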




  • 3.  RE: Pseudo R squared

    Posted 12-26-2016 10:11

    Stan, 

    Thanks so much for your reply to my query.  I really appreciate your response.

    The authors don't appear to be engaging in exploratory analysis. Compared with the vague to non-existent description of the statistical analysis protocol, the authors are quite specific about the research question.  My main concern was whether, assuming the authors applied logistic regression, they appropriately applied and interpreted the pseudo R squared statistic. With further reading and study (and a very helpful email from another listserv member, Michael Chernick), I'm coming to the conclusion that the authors are not applying and interpreting the pseudo R squared statistic appropriately.  It's too bad that the description of this research is flawed, because an adequate answer to this research question would have important implications for my client's work.

    Linda

    ------------------------------
    Linda A. Landon, PhD, ELS




  • 4.  RE: Pseudo R squared

    Posted 12-26-2016 12:11
    I found this online: http://www.ats.ucla.edu/stat/mult_pkg/faq/general/Psuedo_RSquareds.htm
     
    Basically, the conclusion is that: "While pseudo R-squareds cannot be interpreted independently or compared across datasets, they are valid and useful in evaluating multiple models predicting the same outcome on the same dataset.  In other words, a pseudo R-squared statistic without context has little meaning.  A pseudo R-squared only has meaning when compared to another pseudo R-squared of the same type, on the same data, predicting the same outcome."
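
    The "cannot be compared across datasets" point can be illustrated with a small simulation. All settings here are hypothetical: the identical, correctly specified logistic model, logit P(y = 1) = x, is scored with Efron's pseudo R squared on two simulated data sets that differ only in the range of x. To keep the sketch short it uses the true probabilities rather than refitting the model (Python, standard library only).

```python
import math
import random
from statistics import mean


def efron_r2(y, p_hat):
    """Efron's pseudo R squared: 1 - SSE / SST."""
    y_bar = mean(y)
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, p_hat))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - sse / sst


def simulate(x_low, x_high, n, rng):
    """Draw x uniformly on [x_low, x_high] and y from the SAME true model,
    logit P(y = 1) = x. Returns the outcomes and the true probabilities."""
    xs = [rng.uniform(x_low, x_high) for _ in range(n)]
    probs = [1 / (1 + math.exp(-x)) for x in xs]
    ys = [1 if rng.random() < p else 0 for p in probs]
    return ys, probs


rng = random.Random(2016)  # fixed seed so the sketch is reproducible
y_wide, p_wide = simulate(-4.0, 4.0, 20_000, rng)
y_narrow, p_narrow = simulate(-0.5, 0.5, 20_000, rng)

# The model is exactly correct in both data sets, yet the statistic differs,
# because it depends on how spread out the predicted probabilities are.
r2_wide = efron_r2(y_wide, p_wide)
r2_narrow = efron_r2(y_narrow, p_narrow)
print(f"wide x:   {r2_wide:.3f}")
print(f"narrow x: {r2_narrow:.3f}")
```

    With these settings the wide-range data set gives an Efron value of roughly 0.5 and the narrow-range one roughly 0.02, even though the model is exactly right in both, so a difference in pseudo R squared across data sets need not reflect any difference in how well the model fits.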

    I hope this helps. If you can send me the article you reviewed, I would appreciate it.

    Best wishes for the New Year.

    Michael






  • 5.  RE: Pseudo R squared

    Posted 12-27-2016 08:03

    Michael,

    Thank you for your informative response.

    I did find the UCLA website posting early in my research. On my first read-through, the information presented added to my confusion about pseudo R squared statistics.  However, with the benefit of the knowledgeable and varied responses from our colleagues and with additional study and reading, pseudo R squared and the information on the UCLA site are beginning to make sense.

    I have attached to this posting the paper that I am reviewing and its accompanying supplementary information.  I'd be most interested in receiving anyone's comments about this paper, either on the list or in private communications.

    In particular, I'd be interested in opinions as to whether the researchers are using the word "likelihood" in a colloquial sense when they actually mean a probability or a summary statistic such as a proportion, or whether they are in fact calculating a likelihood, as in a numerical measure of the strength of statistical evidence.
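
    The distinction can be shown with a binomial example (Python, standard library only; the data of 7 events in 10 trials and the function name are hypothetical). A probability fixes the parameter and varies the outcome, so its values sum to 1 over all possible outcomes; a likelihood fixes the observed data and varies the parameter, so its values are not probabilities of the parameter and need not sum to 1.

```python
from math import comb

n, k = 10, 7  # hypothetical data: 7 events in 10 trials


def binom_prob(k, n, p):
    """Probability of exactly k events in n trials when the event probability is p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Probability: p is fixed, the outcome varies. These values sum to 1.
p_fixed = 0.5
total_prob = sum(binom_prob(j, n, p_fixed) for j in range(n + 1))

# Likelihood: the data (k = 7) are fixed, the parameter p varies on a grid.
grid = [i / 100 for i in range(101)]
likelihood = [binom_prob(k, n, p) for p in grid]
best_p = grid[likelihood.index(max(likelihood))]  # maximum-likelihood estimate, k / n

print(total_prob)                 # sums to 1 over outcomes
print(best_p)                     # likelihood peaks at the observed proportion
print(round(sum(likelihood), 2))  # no reason to sum to 1 over parameter values
```

    A reported "likelihood" that is simply a percentage of units in a category behaves like the first calculation, not the second.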

    In case others are interested in reading about pseudo R squared statistics, here is a list of URLs for some information that I've found.  This list includes both peer-reviewed and non-peer-reviewed sources. I can't say that all of these are authoritative, but reading different authors' takes on pseudo R squared is informative, at least.  The list is not exhaustive, and I strongly suspect that it doesn't include important and authoritative sources that I haven't found yet.  Please feel free to add to this list.

    1. http://www3.stat.sinica.edu.tw/statistica/oldpdf/a16n39.pdf (I haven't had the opportunity to read this paper.)
    2. http://stats.stackexchange.com/questions/3559/which-pseudo-r2-measure-is-the-one-to-report-for-logistic-regression-cox-s
    3. http://andrewgelman.com/2009/11/03/can_pseudo-r-sq/
    4. http://www.ats.ucla.edu/stat/stata/output/stata_logistic.htm (See point f in the annotated Stata results.)
    5. http://statisticalhorizons.com/r2logistic
    6. http://teaching.sociology.ul.ie/bhalpin/wordpress/?p=365
    7. https://www3.nd.edu/~rwilliam/stats3/L05.pdf

    Linda

    ------------------------------
    Linda A. Landon, PhD, ELS




  • 6.  RE: Pseudo R squared

    Posted 12-27-2016 12:41

    Linda and other colleagues: I went through the attached article and supplemental materials.  I hope that you are not violating any imposed or implied confidentiality agreement in your role as referee.  I will try to be brief.  It appears that the authors are applying several fairly sophisticated statistical methods without a clear rationale.  It is also not clear whether a statistician is among the coauthors; more likely, in my view, a statistician was consulted but was not an active participant.  The simulations and the comparison with real data raise more questions for me than they answer. For example, I don't understand why the hospitals have rates simulated from uniform random numbers on [0, 1].  Also, I don't see any references to the statistical papers that the methods are based on.

    Regarding the term likelihood, from their discussion it looks like they are using the term to mean probability.  There is nothing to indicate the use of a model and a likelihood function for the model's parameters, although in Efron's work, models and likelihood functions do seem to be included in his examples.  In the supplemental materials for the paper you are reviewing, many of the tables list the percentages of hospitals in disjoint bed-size categories as "likelihoods," but these percentages sum to well over 100%.
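
    The observation about the supplemental tables suggests a simple sanity check: if percentages describe how hospitals are distributed over disjoint and exhaustive bed-size categories, they should total 100% apart from rounding. The sketch below uses made-up categories and numbers (not the paper's), a hypothetical function name, and an assumed rounding tolerance of 0.5 percentage points.

```python
def check_shares(shares, tol=0.5):
    """Shares (in %) over disjoint, exhaustive categories must total 100%.

    Returns the total and whether it lies within `tol` percentage points of 100.
    """
    total = sum(shares.values())
    return total, abs(total - 100.0) <= tol


# Hypothetical bed-size categories and percentages, for illustration only
valid = {"<100 beds": 35.0, "100-299 beds": 40.0, "300+ beds": 25.0}
suspect = {"<100 beds": 60.0, "100-299 beds": 55.0, "300+ beds": 45.0}

print(check_shares(valid))    # (100.0, True)
print(check_shares(suspect))  # (160.0, False)
```

    A total such as 160% means the figures cannot be a single distribution over the categories; they may instead be something like a rate computed within each category, which would need to be labeled as such.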

    I was also trying to see where formal hypothesis tests might come in.  I could not really tell whether the study is exploratory or inferential, but conclusions are being drawn that could affect US policy on health care.  However, I could not see any p-values cited.  In any event, I agree that there are more important aspects of the paper to be concerned about than p-value adjustment.

    I have not looked into this very carefully and don't think that I should.  So take my comments as something for you (Linda) to look into if you haven't already.

    ------------------------------
    Michael Chernick



  • 7.  RE: Pseudo R squared

    Posted 12-27-2016 16:04

    Michael,

    Thanks so much for your feedback.  

    First, to assuage your fears about ethics: this is a published article, so I am not acting as a peer-review referee prior to acceptance and publication.  My client discovered it, was interested in the possible applications to their work, and wanted the opinion of a statistician as to the validity of the research. Your point about the ethical conduct of peer review is very important, though, and it serves all of us well to be reminded of our ethical obligations.

    Reading your review, I breathed a sigh of relief because your assessment is similar to mine. Because I am a relatively newly minted statistician, I always keep the limits of my knowledge in mind, so your analysis is quite valuable to me.  Like you, I was particularly concerned that what the authors presented as "likelihood" was in fact probability or summary statistics.

    Another concern was the use of a single pseudo R squared to compare the fit of one model to two different data sets. Another colleague responded in an email that R squared could be used to compare the fit of one model to two different data sets, and agreed with Michael Maranda's comments in his post.

    Thanks again for your response.

    Linda

    ------------------------------
    Linda A. Landon, PhD, ELS




  • 8.  RE: Pseudo R squared

    Posted 12-27-2016 15:43

    Lara Harmon of AmStat is going to remove the attachments to this posting due to copyright issues.  Here is the URL for the article.  It is open access, so you will be able to download it yourself.

    http://journals.sagepub.com/doi/full/10.1177/1062860616681840?utm_source=PolicyCrush&utm_campaign=710a683657-EMAIL_CAMPAIGN_2016_12_20&utm_medium=email&utm_term=0_fe688512b8-710a683657-115413909&

    I apologize for the confusion. 

    ------------------------------
    Linda A. Landon, PhD, ELS




  • 9.  RE: Pseudo R squared

    Posted 12-26-2016 21:43

    Dear Linda,

    I think it really depends on what you want to use R-squared to measure when fitting a generalized linear model. Classical R-squared measures the explained variance in fitting homoscedastic linear models, where it coincides with explained information.

    I have recently written a paper on defining a variance-function-based R-squared for generalized linear models (including logistic regression and even quasi models); see http://www.tandfonline.com/doi/full/10.1080/00031305.2016.1256839. In my simulation studies, some previously defined measures, with the exception of the K-L-distance-based R-squared, can seriously overstate the fit. So I would recommend the K-L-distance-based R-squared for logistic regression. In general, the variance-function-based R-squared performs similarly to the K-L-distance-based R-squared, but the latter does not work for quasi models (where only the mean and variance functions are known). So, if you want to calculate R-squared for quasi models, you can download my package, available at https://cran.r-project.org/web/packages/rsq/index.html.
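
    A common way to write a K-L-distance-based R-squared for a generalized linear model is as a deviance ratio, 1 - D(y, mu_hat) / D(y, ybar). I am assuming that definition here; it may differ in detail from the measures in the paper and package above, and this is not code from them. For ungrouped 0/1 outcomes the saturated model's log-likelihood is zero, so the deviance is -2 log L and the ratio coincides with McFadden's pseudo R squared. The data and fitted probabilities below are hypothetical (10 of 40 and 36 of 60 events, with the group rates a logistic fit on one binary predictor would return).

```python
import math


def bernoulli_deviance(y, mu):
    """Deviance of 0/1 outcomes y against fitted probabilities mu.
    For ungrouped binary data the saturated log-likelihood is 0, so D = -2 log L."""
    return -2 * sum(math.log(m) if yi == 1 else math.log(1 - m)
                    for yi, m in zip(y, mu))


def kl_r2(y, mu):
    """Deviance-ratio (K-L-distance-based) R squared: 1 - D(y, mu) / D(y, ybar)."""
    y_bar = sum(y) / len(y)
    return 1 - bernoulli_deviance(y, mu) / bernoulli_deviance(y, [y_bar] * len(y))


def mcfadden_r2(y, mu):
    """McFadden's pseudo R squared: 1 - log L(model) / log L(intercept only)."""
    def loglik(probs):
        return sum(math.log(p) if yi == 1 else math.log(1 - p)
                   for yi, p in zip(y, probs))

    y_bar = sum(y) / len(y)
    return 1 - loglik(mu) / loglik([y_bar] * len(y))


# Hypothetical outcomes and fitted probabilities for two groups
y = [1] * 10 + [0] * 30 + [1] * 36 + [0] * 24
mu = [0.25] * 40 + [0.6] * 60
print(round(kl_r2(y, mu), 4), round(mcfadden_r2(y, mu), 4))
```

    Both functions print the same value for binary data; the two measures differ only for grouped or non-Bernoulli outcomes, where the saturated log-likelihood is no longer zero.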

    Hope this helps.

    Best,

    Dabao

    ------------------------------
    Dabao Zhang
    Associate Professor
    Purdue University



  • 10.  RE: Pseudo R squared

    Posted 12-27-2016 01:21

    I chose to contact Linda privately for fear that others would be irritated about getting email that they consider junk because the topic doesn't interest them.  But since I want to make additional comments and so many of you submitted very thoughtful answers to the group, I am doing so now.  The point I want to make is that Linda should not dismiss Stan Young's issue about multiple testing since Linda said that the study is not exploratory.  That is exactly when p-value adjustment needs to be considered, and it is too commonly ignored.

    Stan is an expert on this.  I think he is too modest to mention the wonderful book he coauthored with Peter Westfall for Wiley in 1993 on resampling-based p-value adjustment.  The bootstrap and permutation approaches to this problem are very helpful, especially when methods like Bonferroni are overly conservative.  I have advocated it, used it in my work, and referred to it in my books on the bootstrap.
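
    The resampling idea can be sketched as a single-step maxT permutation adjustment in the spirit of the Westfall and Young approach, compared with Bonferroni. This is my simplified illustration, not code from their book: the data, the choice of Welch t statistics, and all function names are assumptions. Because each permutation relabels whole subjects, the correlation among outcomes is preserved, which is why this kind of adjustment can be less conservative than Bonferroni when outcomes are correlated.

```python
import random
from statistics import mean, variance


def welch_t(values, labels):
    """Welch two-sample t statistic for group 1 versus group 0."""
    g1 = [v for v, g in zip(values, labels) if g == 1]
    g0 = [v for v, g in zip(values, labels) if g == 0]
    se = (variance(g1) / len(g1) + variance(g0) / len(g0)) ** 0.5
    return (mean(g1) - mean(g0)) / se


def permutation_pvalues(outcomes, labels, n_perm=2000, seed=1):
    """Return (raw, maxT) permutation p-values for several outcomes.

    raw:  per-outcome permutation p-values (no multiplicity adjustment)
    maxT: single-step adjusted p-values, comparing each observed |t| with the
          largest |t| across all outcomes in each permutation
    """
    rng = random.Random(seed)
    observed = [abs(welch_t(y, labels)) for y in outcomes]
    raw_hits = [0] * len(outcomes)
    max_hits = [0] * len(outcomes)
    perm = list(labels)
    for _ in range(n_perm):
        rng.shuffle(perm)  # relabel subjects; outcomes stay together
        perm_t = [abs(welch_t(y, perm)) for y in outcomes]
        biggest = max(perm_t)
        for j, t_obs in enumerate(observed):
            raw_hits[j] += perm_t[j] >= t_obs
            max_hits[j] += biggest >= t_obs
    # Add-one correction so no p-value is exactly zero
    raw = [(h + 1) / (n_perm + 1) for h in raw_hits]
    adj = [(h + 1) / (n_perm + 1) for h in max_hits]
    return raw, adj


def bonferroni(pvalues):
    """Bonferroni adjustment: multiply by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, m * p) for p in pvalues]


# Hypothetical data: 12 subjects per group, three correlated outcomes,
# with a real shift only in the first outcome.
rng = random.Random(0)
labels = [0] * 12 + [1] * 12
base = [rng.gauss(0, 1) for _ in labels]
outcomes = [
    [b + rng.gauss(0, 0.5) + 1.0 * g for b, g in zip(base, labels)],
    [b + rng.gauss(0, 0.5) for b in base],
    [b + rng.gauss(0, 0.5) for b in base],
]
raw, adj = permutation_pvalues(outcomes, labels)
print("raw       ", [round(p, 3) for p in raw])
print("maxT      ", [round(p, 3) for p in adj])
print("Bonferroni", [round(p, 3) for p in bonferroni(raw)])
```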

    ------------------------------
    Michael Chernick



  • 11.  RE: Pseudo R squared

    Posted 12-27-2016 08:19

    Michael, Stan, and everyone.

    Michael's response highlights the value of ASA Connect to all of us: the ability to present and discuss varied understandings of and approaches to statistics. Understandably, some might consider specific information to be "junk," but some of us haven't encountered that information before.  Especially for statisticians like me who are relatively inexperienced and who work isolated from other statisticians, the collegiality and willingness to share information and insight demonstrated by the more experienced statisticians who post to ASA Connect is invaluable.

    I would like to make clear that I didn't mean to imply that I had dismissed Stan's comments on the dangers of multiple testing or ignored his obvious breadth and depth of knowledge.  I meant to communicate that the authors were not conducting an exploratory analysis and had not conducted multiple tests.

    Stan and Michael, thanks again for your responses.  

    Linda

    ------------------------------
    Linda A. Landon, PhD, ELS




  • 12.  RE: Pseudo R squared

    Posted 12-27-2016 08:22

    Dabao,

    Thank you for these suggestions.  I certainly will study them and I'm sure the information will add to my understanding of pseudo R square in logistic regression.

    Linda

    ------------------------------
    Linda A. Landon, PhD, ELS
