ASA Connect


Propensity Score Matching

  • 1.  Propensity Score Matching

    Posted 02-24-2015 10:44

    Hello everyone,

    I need your help with my study.

    Treatment selection bias arises in observational studies because treatment allocation is not random. We are trying to use propensity score matching to balance the observed baseline characteristics. I just want to know if there is any other statistical methodology I can use besides this one. Thanks for any suggestions.

    I also wonder if anyone has heard of Mahalanobis metric matching before. I would really appreciate it if you could provide any reference sources you have. Thanks!



    -------------------------------------------
    Yi Lu
    Health Services Researcher
    -------------------------------------------


  • 2.  RE: Propensity Score Matching

    Posted 02-25-2015 15:35
    Possibilities are:  

    Li YP, Propert KJ, Rosenbaum PR. Balanced Risk Set Matching. Journal of the
    American Statistical Association, Vol. 96, No. 455 (Sep., 2001), pp. 870-882.

    Gu XS, Rosenbaum PR. Comparison of Multivariate Matching Methods:
    Structures, Distances, and Algorithms. Journal of Computational and Graphical Statistics,
    Vol. 2, No. 4 (Dec., 1993), pp. 405-420.

    -------------------------------------------
    Robert Elston
    Case Western Reserve University
    -------------------------------------------




  • 3.  RE: Propensity Score Matching

    Posted 02-25-2015 17:27

    Propensity for treatment choice is frequently difficult to predict from patient pretreatment characteristics.  For example, models using linear functionals make strong and possibly unrealistic assumptions about how true propensity varies.  Alternatives like patient matching and/or clustering can be better (finer and "fairer") in both theory and practice simply because they are more flexible.  Here are some recent references:

    1. Obenchain RL. "The Local Control Approach using JMP." Analysis of Observational Health-Care Data Using SAS, Faries DE, Leon AC, Maria Haro J, Obenchain RL eds. Cary, NC: SAS Press, 2010.
    2. Iacus SM, King G, and Porro G. Causal Inference without Balance Checking: Coarsened Exact Matching. Political Analysis, 20, 1-24, 2012.
    3. Obenchain RL, Young SS. Advancing Statistical Thinking in Observational Health Care Research, Journal of Statistical Theory and Practice, 7, 456-469, 2013.
    4. Lopiano KK, Obenchain RL, Young SS. Fair Treatment Comparisons in Observational Research, Statistical Analysis and Data Mining, 7, 376-384, 2014. (Special Issue on Observational Health Care Data.)
    -------------------------------------------
    Robert Obenchain
    Principal Consultant
    Risk Benefit Statistics LLC
    -------------------------------------------




  • 4.  RE: Propensity Score Matching

    Posted 02-26-2015 05:47
    Hi,
    This paper of mine may also be useful; it gives an overview of matching methods, including propensity scores and Mahalanobis distance.

    Stuart, E.A. (2010). Matching Methods for Causal Inference: A review and a look forward. 
    Statistical Science 25(1): 1-21. 
    http://projecteuclid.org/DPubS?service=UI&version=1.0&verb=Display&handle=euclid.ss/1280841730

    Thanks,
    Liz

    -------------------------------------------
    Elizabeth Stuart
    Associate Professor
    Johns Hopkins Bloomberg School of Public Health
    -------------------------------------------




  • 5.  RE: Propensity Score Matching

    Posted 02-26-2015 07:31
    I have a concern about any method that discards already-collected observations.  There are problems with reproducibility and arbitrariness.  I am especially concerned that most matching methods discard observations that are in the overlap region between two treatments.  Also, most matching methods are not invariant to how the dataset was ordered.  I tend to prefer regression adjustment by a regression spline in the logit propensity after understanding the overlap region and possibly restricting the analysis.  I know of at least one paper that shows there are some problems with adjustment for propensity via regression but I still feel it is an excellent approach overall.   If anyone has a simple example of where my strategy goes wrong I'd love to see it and possibly simulate it.

    At the heart of the argument, in the simple case where the number of covariates is very small, are concerns about extrapolation of parametric models.  One could say that a fundamental assumption of regression is lack of interaction between treatment and a covariate.  When there is little overlap in the covariate distributions, the interaction test has low power, but confidence intervals for the treatment effect as a function of the covariate are automatically very wide in the non-overlap region.  The problem with matching is that if there is interaction between treatment and a covariate, a matched analysis typically just covers it up.
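    As a concrete illustration of the "regression spline in the logit propensity" strategy discussed above, here is a minimal pure-Python sketch of a linear-spline basis construction; the knot locations and scores are toy choices for illustration, not from the original post:

```python
# Illustrative sketch: build a linear regression-spline basis in the
# logit of the propensity score, for use as adjustment covariates.
# Knots and scores below are arbitrary toy values.
from math import log

def logit(p):
    return log(p / (1 - p))

def linear_spline_basis(ps, knots):
    """Columns: logit(ps), plus one hinge term max(0, logit(ps) - k) per knot."""
    rows = []
    for p in ps:
        x = logit(p)
        rows.append([x] + [max(0.0, x - k) for k in knots])
    return rows

basis = linear_spline_basis([0.2, 0.5, 0.8], knots=[-1.0, 0.0, 1.0])
# The middle unit has logit(0.5) = 0, so only the knot at -1 is active:
print(basis[1])  # [0.0, 1.0, 0.0, 0.0]
```

    In practice one would fit the outcome model on these columns plus the treatment indicator; restricted cubic splines are a common refinement of the linear hinges used here.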

    -------------------------------------------
    Frank Harrell
    Vanderbilt University School of Medicine
    -------------------------------------------




  • 6.  RE: Propensity Score Matching

    Posted 02-26-2015 08:44
    Many methods are available through the R MatchIt package, including the Mahalanobis metric.  The documentation is clear and easy to read.
    http://gking.harvard.edu/matchit

    Also, Liz Stuart maintains a webpage listing where to find matching methods in other statistical packages, including the Mahalanobis metric:
    http://www.biostat.jhsph.edu/~estuart/propensityscoresoftware.html

    -------------------------------------------
    Janet Rosenbaum
    -------------------------------------------




  • 7.  RE: Propensity Score Matching

    Posted 02-26-2015 10:33
    Instrumental variables are an alternative that may be preferable to adjustment or matching, as they address unmeasured confounders of treatment.  Can anyone propose an authoritative reference for this approach?  The ones I am familiar with are rather superficial.

    Classical multivariable analysis (MV) is another.  I've been looking for empirical evidence showing that propensity score (PS) matching is in any way, shape, or form superior to MV risk adjustment and have seen none so far.  Some papers show that MV is superior to the PS method, but I am under the impression that the issue is far from settled in favor of either MV or PS. I am under constant pressure to produce propensity matching, but it seems to me that it's more of a fad than a superior methodology.  Does anyone disagree with that?

    -------------------------------------------
    Haris Subacius
    Northwestern University
    -------------------------------------------




  • 8.  RE: Propensity Score Matching

    Posted 02-27-2015 10:49

    Thanks for everyone's input; it is very useful. The issue is that when I used Mahalanobis metric 1:1 matching for our study, the total sample size was reduced dramatically. I had fewer control than treated individuals; the treated group was about 3.5 times larger than the control group. What would be the appropriate matching method to use here? I am thinking about using PS weighting or PS matching with replacement. Any suggestions? Thanks!
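    For what it's worth, matching with replacement does sidestep a small control pool, since controls can be reused. A minimal pure-Python sketch with toy scores (not study data):

```python
# Illustrative sketch: 1:1 nearest-neighbor matching *with replacement*
# on the propensity score. Toy data; in practice the scores would come
# from a fitted model (e.g., logistic regression).

def match_with_replacement(treated_ps, control_ps):
    """For each treated unit, pick the control with the closest score.
    Controls may be reused, so a small control pool is not a hard limit."""
    matches = []
    for i, ps_t in enumerate(treated_ps):
        j = min(range(len(control_ps)), key=lambda k: abs(control_ps[k] - ps_t))
        matches.append((i, j))
    return matches

treated_ps = [0.62, 0.71, 0.55, 0.80, 0.58, 0.66, 0.74]  # 7 treated
control_ps = [0.50, 0.65]                                 # only 2 controls
pairs = match_with_replacement(treated_ps, control_ps)
print(pairs)  # every treated unit gets a match; controls are reused
```

    Reused controls must be accounted for in the variance estimate, which is one reason many analysts prefer weighting in this situation.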


    -------------------------------------------
    Yi Lu
    Health Services Researcher
    -------------------------------------------




  • 9.  RE: Propensity Score Matching

    Posted 03-02-2015 09:58
    That's not unexpected, but also not inevitable.  You have to try lots of methods and see what addresses all of your needs:  e.g., gives the best balance on the relevant variables and the largest sample sizes.  It's a judgment call which match is best.  It may require many attempts and many combinations of covariates.  There is no way to know ahead of time what works best; it's trial and error.  The MatchIt package has everything you need for diagnostics.  As an earlier poster suggested, look at both means and distributions.

    -------------------------------------------
    Janet Rosenbaum
    -------------------------------------------




  • 10.  RE: Propensity Score Matching

    Posted 03-03-2015 07:22
    I would appreciate some comments from the group on my regression-adjustment-for-propensity strategy.  To me, matching has too many options, and I don't use methods that discard observations.

    -------------------------------------------
    Frank Harrell
    Vanderbilt University School of Medicine
    -------------------------------------------




  • 11.  RE: Propensity Score Matching

    Posted 03-04-2015 03:00
    Dear Frank,

    I share your concerns and also tend to favour regression adjustment for the propensity score for the following additional reasons:
    - it adjusts for residual confounding due to imperfect matching;
    - it makes more efficient use of the information in the data;
    - adjustment for both the propensity score and covariates can result in estimators with a double robustness property: they are valid if either the propensity score or the covariates in the outcome model are correctly modelled;
    - matching is often preferred for its simplicity, but I tend to believe that regression adjustment for the propensity score is much simpler, because one loses the simplicity of matching whenever one wishes to adjust for residual confounding due to imperfect matching or to obtain valid standard errors.

    Note that, like matching methods, regression adjustment for the propensity score also prevents the dangers of model extrapolation in settings where the treated and untreated are very different in their observed covariate data, in the sense that
    - subjects with covariate values at which there are nearly only treated or non-treated individuals are down-weighted, as they contribute little or no information about treatment effect;
    - valid estimators of treatment effect can be obtained when the propensity score model is correct, even when the association between outcome and propensity score is misspecified. This follows from the aforementioned double robustness property.

    For details, please see Vansteelandt, S. and Daniel, R.M. (2014). On regression adjustment for the propensity score. Statistics in Medicine, 33, 4053-4072.

    Best wishes,

    Stijn Vansteelandt.
    -------------------------------------------
    Stijn Vansteelandt
    Ghent University
    -------------------------------------------




  • 12.  RE: Propensity Score Matching

    Posted 03-04-2015 08:34
    Thanks very much for your note Stijn.  I have recommended your paper to others and like it very much.
    Frank

    -------------------------------------------
    Frank Harrell
    Vanderbilt University School of Medicine
    -------------------------------------------




  • 13.  RE: Propensity Score Matching

    Posted 03-04-2015 12:21
    Frank,

    I have nothing definitive to say, but some thoughts that I think might be helpful to the discussion.

    If you feel you must include the propensity score as a regression adjustment, Bang and Robins (2005) show the functional form that gives a consistent estimate. (Be sure to also look at the Correction to "Doubly Robust Estimation in Missing Data and Causal Inference Models," by H. Bang and J. M. Robins, Biometrics 61, 962-972, December 2005; the correction appeared in Biometrics 64, 2008.)

    If you include a covariate X_i = t_i/p_i - (1-t_i)/(1-p_i), then they note

    "The estimator ... solves a long-standing open problem in the estimation of treatment effects: what function (or functions) of the PS needs to be added to a model Ψ{s(Δ,V; β)} for E(Y | Δ, V) in order to ensure consistent estimation of the average treatment effect when the PS is modeled correctly but the OR model Ψ{s(Δ, V; β)} is incorrect."

    There's one other step you need to do to get the treatment effect estimate after fitting this model (basically compute the average difference in the model's predictions when t=1 and t=0).

    That said, this choice of functional form is algebraically equivalent to a form of propensity score weighting. This, plus results from Lee, Lessler, and Stuart and from Lunceford and Davidian, has caused me to use propensity score weighting exclusively.

    So I always use propensity score weighting; just make sure that you have good, well-calibrated propensity score estimates, and that usually means not using standard logistic regression. Weighting will also satisfy your aim not to discard observations, although it might just assign weights that are small, particularly when estimating treatment effects on the treated.
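    A minimal sketch of the plain (ratio-normalized, i.e. Hajek) IPTW estimator referred to above, in pure Python with toy data; fastDR's actual machinery is considerably richer than this:

```python
# Illustrative sketch of normalized inverse-probability-of-treatment
# weighting for the ATE. Toy scores; real scores should come from a
# well-calibrated model, as noted above.

def iptw_ate(y, t, ps):
    """Weighted-mean difference: treated weighted by 1/ps, controls by 1/(1-ps)."""
    w1 = [ti / p for ti, p in zip(t, ps)]
    w0 = [(1 - ti) / (1 - p) for ti, p in zip(t, ps)]
    mu1 = sum(wi * yi for wi, yi in zip(w1, y)) / sum(w1)
    mu0 = sum(wi * yi for wi, yi in zip(w0, y)) / sum(w0)
    return mu1 - mu0

y  = [3.0, 2.5, 4.1, 1.9, 3.6, 2.2]  # outcomes
t  = [1,   0,   1,   0,   1,   0]    # treatment indicators
ps = [0.7, 0.4, 0.6, 0.3, 0.8, 0.5]  # toy propensity scores
print(round(iptw_ate(y, t, ps), 3))
```

    Normalizing by the sum of weights (rather than by n) keeps the estimator stable when some scores are near 0 or 1, which connects to the trimming-to-overlap points made elsewhere in this thread.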

    I'd always be interested in putting different methods in a head-to-head contest. Hade and Lu (2014) did a comparison that included using B-splines on the propensity score and it seems to do reasonably well, but there's plenty of room to quibble about the breadth of scenarios examined and whether the competing methods were as competitive as they could be.

    I've coded up my standard practice in R and posted it as fastDR on GitHub, if you would like to experiment with a method that has limited options (the defaults are usually okay) and doesn't discard cases.

    Greg

    -------------------------------------------
    Greg Ridgeway
    Associate Professor
    University of Pennsylvania
    -------------------------------------------




  • 14.  RE: Propensity Score Matching

    Posted 03-04-2015 13:33
    Frank, I hope this won't preclude our hashing this out over lunch, but I'll answer your call for feedback here. I'll start with clarifying how I think of matching.

    1) "Matching" does not mean "unadjusted analysis". Matching in an observational cohort study should not be used as an excuse to avoid good modeling. Matching will often make the unadjusted (marginal) effect of interest consistent with the covariate-adjusted (conditional) effect, but not always. When it does, that makes for a very persuasive paper. But if there is sufficient data to detect an important interaction with the exposure, proper modeling of the matched cohort will find it.

    Liz is coauthor on one of my favorite papers: Ho, Imai, King, Stuart (2007) Matching as nonparametric preprocessing for reducing model dependence in parametric causal inference; Political Analysis. Here they make the case for matching as a cohort selection tool to be used with the modeling procedures you would have chosen without doing matching. Admittedly, if you would have perfectly specified your model, matching before using that model will cost some efficiency. Nothing can beat a perfectly specified model on the whole cohort in terms of efficiency. However, if you are human and are not certain you've perfectly specified your model, matching offers some protection, i.e. the matching provides robustness to modeling choices. I'll revisit this below.

    2) "Matching" does not necessarily mean "1:1 matching". The benefits of matching need to be played against the losses. If a particular matching method is resulting in a substantial loss in efficiency (CI precision, Type II error), a different approach should be used, e.g. 1 to fixed k matching, 1 to variable k, many to many, full matching, etc. If a study has no power to spare, the benefits of matching may not be worth the cost. If I have 30 subjects, it's unlikely I'll be convinced to create a matched cohort with 20 subjects. But if I have 30K subjects, I may find a matched cohort of 20K makes a much more compelling study design. Given the logistics of creating EHR study cohorts, the incremental data collection cost of those 10K subjects that I discard is essentially zero.

    3) Matching is not unique in throwing away some subjects. We all use methods that throw away subjects. We don't download every patient in an EHR database for an analysis comparing two specific anti-diabetic drugs. We always have inclusion/exclusion criteria. Sometimes these are fully pre-specified and sometimes they are data driven. Matching is a data driven cohort selection method.

    4) Done well, matching is fully reproducible. Yes there are several moving parts in matching methods and various decisions must be made, but done correctly it is no less reproducible than, say, modeling using multiple imputation. Modeling with MI has plenty of moving parts and requires decisions of which method to use. Both matching and MI can be made exactly reproducible with explicit code and both could vary dramatically between different research groups who have made different methodological choices. In this sense, matching is no different from modeling in terms of reproducibility.

    5) To sum up, I'll put it this way. Combined with good modeling, matching is doubly robust estimation for the conditional treatment effect among those likely to be treated, i.e. doubly robust estimation for ATT|X. If we like the idea of doubly robust estimation in the case of using covariate adjustment and IPTW for estimating the ATE|X, we shouldn't be that averse to matching. Matching is essentially a weighting scheme. It's a strange weighting in that it may handle a cluster of over-represented subjects by giving some of them weight 0. That property will always bother you, and perhaps rightly so. It bothers me. Nevertheless, matching may be seen as a weighting scheme. In general, weighting can provide robustness, control for bias, and persuasiveness at the cost of some efficiency when compared to covariate adjustment alone. Depending on how much power I have to play with, I may be happy to pay that price.


  • 15.  RE: Propensity Score Matching

    Posted 03-04-2015 13:44
    Frank,

    I agree with you about dropping cases via matching. However, I think
    that after estimating propensity scores and before using them in an
    analysis, the distributions of the propensity scores by "treatment"
    group need to be examined. If there are notable areas of non-overlap,
    there is a problem and just including everyone in the analysis with the
    propensity score as a covariate will not solve the problem. For example,
    I have had situations where the propensity scores for the control group
    were predominantly in the range of 0 to 0.5 (with a few higher
    stragglers up to about 0.8), while the propensity scores for the
    treatment group went from about 0.3 to almost 1 (with a very few lower
    stragglers down to about .1). I think that including all the propensity
    scores can give a misleading result due to extrapolation to areas where
    there is no "common support".
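    One simple way to operationalize the common-support check described above (a toy sketch; real analyses often use density plots or formal trimming rules rather than raw ranges):

```python
# Illustrative common-support check: keep only units whose propensity
# score lies inside the overlap of the two groups' score ranges.
# Toy numbers loosely echoing the ranges described above.

def common_support(ps, group):
    """Return the overlap interval and the indices of units inside it."""
    ps_c = [p for p, g in zip(ps, group) if g == 0]
    ps_t = [p for p, g in zip(ps, group) if g == 1]
    lo, hi = max(min(ps_c), min(ps_t)), min(max(ps_c), max(ps_t))
    keep = [i for i, p in enumerate(ps) if lo <= p <= hi]
    return lo, hi, keep

ps    = [0.05, 0.20, 0.45, 0.80, 0.30, 0.60, 0.95, 0.10]
group = [0,    0,    0,    0,    1,    1,    1,    1]
lo, hi, keep = common_support(ps, group)
print(lo, hi, keep)  # 0.1 0.8 [1, 2, 3, 4, 5, 7]
```

    Units outside [lo, hi] are exactly the ones for which any covariate-adjusted estimate would rest on extrapolation rather than data.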

    Rich Goldstein
    consulting statistician




  • 16.  RE: Propensity Score Matching

    Posted 03-05-2015 15:15
    I think Richard's point is very important and well explained. Going further, I would suggest that any overall average effect may be difficult to interpret, because it requires assuming causal-effect homogeneity. This is shown in my book Bias and Causation, Chapter 7. Thus, examining the estimated difference across different strata of the propensity score would be more informative.
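    A minimal sketch of the stratified comparison suggested here, in pure Python with toy data and a single illustrative cutpoint (real analyses commonly use quintiles of the estimated score):

```python
# Illustrative sketch: instead of one overall average, estimate the
# treated-control outcome difference within propensity-score strata.

def stratified_differences(y, t, ps, cutpoints):
    """Mean outcome difference (treated minus control) per PS stratum."""
    diffs = []
    bounds = [0.0] + cutpoints + [1.0]
    for lo, hi in zip(bounds, bounds[1:]):
        yt = [yi for yi, ti, p in zip(y, t, ps) if ti == 1 and lo <= p < hi]
        yc = [yi for yi, ti, p in zip(y, t, ps) if ti == 0 and lo <= p < hi]
        if yt and yc:  # a stratum missing either group is uninformative
            diffs.append(sum(yt) / len(yt) - sum(yc) / len(yc))
        else:
            diffs.append(None)
    return diffs

y  = [1.0, 2.0, 3.0, 4.0, 2.0, 5.0]
t  = [1,   0,   1,   0,   1,   0]
ps = [0.3, 0.2, 0.7, 0.6, 0.4, 0.8]
print(stratified_differences(y, t, ps, cutpoints=[0.5]))  # [-0.5, -1.5]
```

    If the per-stratum differences disagree in sign or magnitude, as they do in this toy example, reporting a single pooled effect would hide exactly the heterogeneity being discussed.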

    -------------------------------------------
    Herbert Weisberg
    President
    Causalytics, LLC
    -------------------------------------------




  • 17.  RE: Propensity Score Matching

    Posted 03-04-2015 16:20
    I'm trying to play devil's advocate against myself and come up with a case for PS covariate adjustment (summarizing the covariates with a propensity score and using the PS as a covariate).

    Setting 1 - a collapsible model) Here PS covariate adjustment is unbiased, but I'm having trouble making a case for it. If the effective sample size is sufficient for using all of our covariates in the PS model, shouldn't it be big enough to do direct covariate adjustment in this setting? Why would a summary measure for our covariates be beneficial here?

    Setting 2 - a noncollapsible model) Here we could have a large effective sample size for estimating the PS but a small effective sample size for the outcome, e.g. lots of exposed and unexposed subjects but very few events. Matching is not enticing here because there is so little power. Likewise, we may be worried about low power using IPTW, even though it will give an unbiased estimate of the marginal effect. So an important question is whether PS covariate adjustment will yield a biased result. This brings us back to Austin (2013, Stat Med, http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3068290/) as well as his 2007, 2007, 2008, and 2010 papers. The zeitgeist of these is that PS covariate adjustment will be biased, at least biased for the quantities we may be wanting it to approximate. I suspect we could show via simulation that it does a great job estimating the conditional hazard ratio when the true data generation model uses just the exposure and a true PS, but that seems like begging the question. Moreover, the bias may become trivial as the effective sample size becomes large, but that setting also opens up IPTW, matching, and direct covariate adjustment to us.

    -----------------------------------------------------------------
    Robert Alan Greevy, Jr, PhD
    Director, Health Services Research Biostatistics
    Associate Professor of Biostatistics
    Vanderbilt University School of Medicine
    -----------------------------------------------------------------



  • 18.  RE: Propensity Score Matching

    Posted 03-05-2015 18:23
    I am sorry. I was planning to respond to the thread, but for some reason created my own post:

    http://community.amstat.org/communities/alldiscussions/viewthread/?GroupId=2653&MID=23574

    On the specific point: we did find that, with one unbalanced covariate, regression adjustment with splines does not have great operating characteristics when a treatment effect exists. This is true for both binary and continuous outcomes.

    -------------------------------------------
    Roee Gutman
    Assistant Professor of Biostatistics
    Brown University
    -------------------------------------------




  • 19.  RE: Propensity Score Matching

    Posted 02-27-2015 11:39
    A thorough introduction to the topic of matching, yet non-technical and quite readable, is perhaps the textbook by Paul Rosenbaum, "Design of Observational Studies" (2010, Springer). One of the chapters provides guidelines on how to implement propensity score and Mahalanobis distance matching with R (the optmatch package) and SAS. There have been newer developments in the mechanics since; for instance, the use of classification trees to obtain propensity scores as an alternative to traditional logistic models, which hopefully addresses the issues of non-linearity and interactions among covariates in the propensity score model, and the use of genetic algorithms to address the issue of non-invariance to the ordering of the observations. Some of these developments are readily available in the MatchIt package (that I know of), and perhaps in some other routines.

    A few years ago I gave it a go for a project, following Rosenbaum's book, and found that both propensity score matching (I used a logistic model at that time) and Mahalanobis distance matching appeared to result in good-enough balance of the mean structures, i.e., the means of the covariates appeared similar enough between groups, and tests of significance and effect sizes showed no relevant differences. However, when I checked the covariance structure (i.e., I compared correlation matrices of the covariates for each group), the matrices were quite different for the propensity score method (so using these groups would not have been an apples-to-apples comparison as intended), while the Mahalanobis distance method appeared to result in groups that were matched not only in the mean structure but also in the covariance structure. I have not had time to conduct further research on the issue; at the time, checking for balance on the covariance structure seemed like an obvious step for assessing comparability between groups, but what I have seen in applied papers is that the assessment of balance often stops at the mean structure. Since then, on the few occasions I have needed to conduct matching, I have used the Mahalanobis distance method.

    If my memory is correct, I think Rosenbaum's text recommends a robust Mahalanobis distance (using ranks, I think), and propensity score calipers.
     
    ------------------------------------------------------
    Andres Azuero
    University of Alabama at Birmingham
    ------------------------------------------------------