ASA Connect


Regression coefficients vs causal coefficients

  • 1.  Regression coefficients vs causal coefficients

    Posted 07-19-2017 14:25
    I came across the following problem. Let's assume I have one dependent variable Y and several independent variables X supposedly affecting it. All these X are exogenous in the sense that I don't know of anything which affects them. Now I want to estimate the relationship between Y and X (assuming everything is linear, etc.) using just regular linear regression techniques. On the other hand, one can use SEM software or other approaches (like Pearl's DAG approach) and obtain so-called "causal coefficients".

    My question - would these coefficients be equal to regression ones in this situation or not? 
    Many thanks for any comments. 

    Igor Mandel, VP, Telmar


  • 2.  RE: Regression coefficients vs causal coefficients

    Posted 07-19-2017 15:00
    This manuscript by Bollen and Pearl may aid your understanding; myth #2 on page 13 specifically addresses the conceptual differences. Short answer would be that the two parametrisations can be constructed to provide conditionally exchangeable distributions, but whether the restrictions imposed under simultaneous regularity conditions are what is truly desired about the system is another matter entirely.

    @incollection{bollen2013eight,
      title={Eight myths about causality and structural equation models},
      author={Bollen, Kenneth A. and Pearl, Judea},
      booktitle={Handbook of Causal Analysis for Social Research},
      pages={301--328},
      year={2013},
      publisher={Springer},
      url={http://ftp.cs.ucla.edu/pub/stat_ser/r393.pdf}
    }

    ------------------------------
    Landon Hurley
    ------------------------------



  • 3.  RE: Regression coefficients vs causal coefficients

    Posted 07-20-2017 09:12
    Many thanks, Landon. However, the conceptual difference is one thing (I could say a lot about it, especially about J. Pearl's version of causality, which has too many ill-formulated problems: http://ssrn.com/abstract=2984045), but the values of the "causal" and regression coefficients in this simple situation (which is what I'm interested in) are quite another. Are they different or not?

    As for the paper you refer to - yes, I am familiar with it and even published (with a colleague) a critical review of this Handbook in Technometrics (Book Reviews, Technometrics, 57:2, 292-300, 2015, DOI: 10.1080/00401706.2015.1052714).

    The only reason I put this question up for discussion is that when I tried to post it, among several others, on the "Causality blog" http://causality.cs.ucla.edu/blog/ (where its originator, J. Pearl, regularly invites questions), it was rejected exactly for the reason that I don't understand the difference between causal and regression coefficients. Well, I plainly admit that they may be different in a complex DAG (and have never denied it), but I still struggle to understand what the difference is when the DAG has its most primitive shape: Y as a function of several X, with no strings attached to the X.

    You mentioned two parametrizations: which ones are they? Are they even possible in this situation, when all X are exogenous?

    Many thanks again - Igor







  • 4.  RE: Regression coefficients vs causal coefficients

    Posted 07-20-2017 10:08
    I sympathize with Igor. I have an R package that implements causality of the type Igor wants. However, nonlinearity seems to be needed for determining that a variable is exogenous.

    Causal Paths from data and new exogeneity tests in {generalCorr} Package for Air Pollution and Monetary Policy

    by H. D. Vinod, Ph.D., Fordham University, NY, June 7, 2017.

    Since causal paths from data are important for all sciences, my R package `generalCorr' provides sophisticated functions. The idea is simply that if X causes Y (path: X->Y) then non-deterministic variation in X is more "original or independent" than similar variation in Y. We compare two flipped kernel regressions: X=f(Y, Z) and Y=g(X,Z), where Z are control variables. Our first two criteria compare absolute gradients (Cr1) and absolute residuals (Cr2), both quantified by stochastic dominance of four orders (SD1 to SD4). Our third criterion (Cr3) expects X to be better able to predict Y than vice versa using generalized partial correlation coefficients r*(X,Y|Z). These methods allow us to create a replacement for the Hausman-Wu medieval-style diagnosis of endogeneity relying on showing that a dubious cure (instrumental variables) works. The ultimate causal path: X->Y depends on a weighted sum (strength) of all three criteria. Bootstrap inference is also available.
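    A toy numerical sketch of this flipped-regression idea (in Python, with a hand-rolled Nadaraya-Watson smoother as a hypothetical stand-in; this is an illustrative simplification, not the package's actual Cr1-Cr3 criteria):

```python
import numpy as np

def nw_fit(x, y, h=0.3):
    # Nadaraya-Watson kernel-regression fitted values of y given x
    d = (x[:, None] - x[None, :]) / h
    w = np.exp(-0.5 * d ** 2)
    return (w @ y) / w.sum(axis=1)

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
y = np.sin(x) + 0.1 * rng.normal(size=n)  # true path: x -> y, nonlinear

# Flip the two kernel regressions and compare absolute residuals,
# a toy analogue of comparing Y = g(X, Z) against X = f(Y, Z)
res_y_on_x = np.abs(y - nw_fit(x, y))
res_x_on_y = np.abs(x - nw_fit(y, x))

# If x drives y, predicting y from x leaves the smaller residuals
print(res_y_on_x.mean(), res_x_on_y.mean())
```

    Because sin is not monotone over the range of x, the flipped fit x = f(y) cannot recover x and leaves visibly larger residuals; with a purely linear relation the two directions would be symmetric, which illustrates why nonlinearity matters for this kind of diagnosis.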

    A new version of the package makes it possible to get evidence on causal paths with a few lines of code. Details and examples are in: Vinod, H. D., "Causal Paths and Exogeneity Tests in {generalCorr} Package for Air Pollution and Monetary Policy" (June 6, 2017). Available at SSRN: 

    https://ssrn.com/abstract=2982128

     

    [code lang="r"]
    install.packages("generalCorr")
    require(generalCorr)
    options(np.messages = FALSE)
    causeSummary(airquality)
    [/code]

             cause     response strength  corr.     p-value
        [1,] "Solar.R" "Ozone"  "100"     "0.3483"  "0.00018"
        [2,] "Wind"    "Ozone"  "31.496"  "-0.6015" "0"
        [3,] "Temp"    "Ozone"  "100"     "0.6984"  "0"
        [4,] "Month"   "Ozone"  "31.496"  "0.1645"  "0.0776"
        [5,] "Day"     "Ozone"  "31.496"  "-0.0132" "0.88794"

     



    ------------------------------
    Hrishikesh Vinod
    ------------------------------



  • 5.  RE: Regression coefficients vs causal coefficients

    Posted 07-21-2017 11:34
    Many thanks, Hrishikesh, I looked at your article - very interesting, I'll think more about it. 

    But just for clarification: it seems that your problem (and your proposed solution to it) is how to tell whether the X variables are exogenous, i.e., not under the control of some other variables. And that is very important to know, of course.

    But my problem was, in fact, simpler:

    if I know that the X variables are exogenous (or ignore the possibility that they are not), and
    if I know that the relation between X and Y is linear, and
    if I don't care too much about how the residuals are distributed (normal or not), and
    if there is only one Y and several X -

    would there be any difference between the usual linear regression coefficients and any form of "causal" or "path" coefficients?

    It looks like Brandy replied positively to that question - many thanks, Brandy!

    Thanks to all again for consideration - Igor







  • 6.  RE: Regression coefficients vs causal coefficients

    Posted 07-20-2017 11:29
    Hi Igor,

    Do you have a single Y outcome variable or several Y's (Y1, Y2, ..., Ym)?

    If I had a single Y outcome variable, I would use linear regression.
    While the betas would be the same in path analysis and in linear regression,
    the variance estimates would be unbiased with linear regression.

    In path analysis, the variance estimates are asymptotically unbiased, but biased in smaller samples.
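    The difference can be sketched numerically (a minimal Python sketch, assuming only the standard n - p versus n denominators for the residual variance; not output from any particular SEM package):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 25, 3
X = rng.normal(size=(n, p))
beta = np.array([3.0, 2.0, -1.0])
y = X @ beta + rng.normal(size=n)

# The point estimates are identical under either convention
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat

# OLS residual variance divides by n - p: unbiased in finite samples
s2_ols = resid @ resid / (n - p)
# ML-style variance (as in full-information fitting) divides by n:
# biased downward in small samples, asymptotically unbiased
s2_ml = resid @ resid / n

print(beta_hat, s2_ols, s2_ml)
```

    The two variance estimates differ only by the factor (n - p)/n, which vanishes as n grows, matching the "asymptotically unbiased" remark above.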

    ------------------------------
    Brandy Sinco, BS, MA, MS
    Research Associate
    ------------------------------



  • 7.  RE: Regression coefficients vs causal coefficients

    Posted 07-24-2017 10:53
    I thank all participants of this discussion for their (alas, futile) efforts to relieve my confusion in this area.
    I understand Rubin's approach to measuring the size of causal effects (see Holland, JASA 1986), but various regression methods to do the same thing seem to shift from casual inference to causal inference (a simple vowel movement) with no justification other than hope.
    I will continue to try to understand, but it has been years -- I fear I am hopeless.

    ------------------------------
    Howard Wainer
    Extinguished Research Scientist
    ------------------------------



  • 8.  RE: Regression coefficients vs causal coefficients

    Posted 07-25-2017 08:42

    Howard, I envy you - it seems you accept Rubin's (counterfactual) approach as "casual" while considering regression (not counterfactual) as "causal". If only any of it were causal and casual, without the quotation marks! Alas... I have collected many arguments against the counterfactual theory in any version (Rubin's, Pearl's, or others') here: http://ssrn.com/abstract=2984045; maybe it supports the point that a real causal theory in statistics does not yet exist and, most likely, will never be created.



    ------------------------------
    Igor Mandel
    Telmar, Inc.
    ------------------------------



  • 9.  RE: Regression coefficients vs causal coefficients

    Posted 07-26-2017 18:13

    I tried to click on the link to Igor Mandel's collection of arguments against counterfactual theory but got a "page cannot be found" message. Is there any alternative way to get to the collection?

     

    thanks

     

    Kevin Little, Ph.D.

    Informing Ecological Design, LLC

    2213 West Lawn Avenue

    Madison, WI  53711

    tel 608.251.4355  fax 888.247.7543

    email klittle@iecodesign.com

    http://www.iecodesign.com

     






  • 10.  RE: Regression coefficients vs causal coefficients

    Posted 07-26-2017 19:20

    Kevin

     

    A little experimentation shows that the cited address is wrongly written. It should read:

    http://ssrn.com/abstract_id=2984045.

    I found this by putting the number 2984045 into the search box on the SSRN site.

     

    Hope this helps

     

    Peter Kenny

     

     






  • 11.  RE: Regression coefficients vs causal coefficients

    Posted 07-27-2017 08:48
    Many thanks, Kevin! I don't know the reason - on my machine, the page opens at this address (I just checked again!), but I really appreciate your efforts and the experimental evidence that the address is http://ssrn.com/abstract_id=2984045
    Igor 


    ------------------------------
    Igor Mandel
    Telmar, Inc.
    ------------------------------



  • 12.  RE: Regression coefficients vs causal coefficients

    Posted 07-28-2017 09:15
    Peter, I just realized that YOU found the way to make a correct link in answering Kevin - many thanks for that, and sorry for my confusion in the previous posts.
    Igor

    ------------------------------
    Igor Mandel
    Telmar, Inc.
    ------------------------------



  • 13.  RE: Regression coefficients vs causal coefficients

    Posted 07-25-2017 21:00
    Hi Igor,

    I'll take a stab at this. Just to be clear, Pearl's approach to causality isn't about *how* to estimate causal effects. It's about the *ability* to estimate causal effects -- identifiability -- from observational data. In other words, Pearl's theorems will tell you whether it's possible to estimate the causal effects---as defined by the DAG---from observational data, but they don't tell you how to estimate them.

    Now to your question... To restate the setup of your problem in light of the above: you have a response variable Y and several X variables X1, ..., Xp, each of which has a direct causal effect on Y and none of which has a causal effect on any other. A DAG representing this scenario would have an arrow from each X variable pointing into Y and no other arrows. Additionally, you are comfortable assuming that these causal effects are linear.

    According to Pearl's theorems (actually, the linear case goes back to Wright, I think), the causal effects in the DAG are identifiable and can be estimated from the observational data.

    Now that Pearl's theorems have established the identifiability of the causal effects, the question becomes: how should we estimate the parameters of Pr(Y | X1, ..., Xp)? We are OK with linearity, so OLS estimation will do nicely. However, we may also use estimation methods from the SEM literature---full-information methods or simultaneous methods (see, for example, Chap. 11 of Kline's "Principles and Practice of Structural Equation Modeling", 4th ed.). These methods were developed primarily to estimate parameters from several equations simultaneously, but there is no reason why we can't use them to estimate parameters from a single equation. In fact, the estimates are equivalent to the OLS estimates in this case. (There might be some small differences in degrees of freedom between full-information methods and OLS, but the estimates themselves are mathematically equivalent.)

    Below is some R code that demonstrates this (with only slight differences due to numerical estimation):

    > library(sem)
    > n <- 100000
    > X1 <- rnorm(n)
    > X2 <- rnorm(n)
    > X3 <- rnorm(n)
    > Y <- 3*X1 + 2*X2 - 1*X3 + rnorm(n)
    > lmout <- lm(Y ~ X1 + X2 + X3 + 0)
    >
    > mod <- specifyEquations(
    + text="Y = beta1*X1 + beta2*X2 + beta3*X3"
    + , covs=c("Y", "X1", "X2", "X3"))
    Read 1 item
    > semout <- sem(mod, data=data.frame(Y=Y, X1=X1, X2=X2, X3=X3))
    >
    > coef(lmout)
    X1 X2 X3
    2.9974998 2.0015475 -0.9978271
    > coef(semout)[1:3]
    beta1 beta2 beta3
    2.9975046 2.0015477 -0.9978332

    Hopefully, that was helpful.

    Best
    McKay

    ------------------------------
    Steven Curtis
    Principal, Decision Science
    The Walt Disney Company
    ------------------------------



  • 14.  RE: Regression coefficients vs causal coefficients

    Posted 07-26-2017 11:31
    Hi Steven,

    Many thanks for your detailed answer and the program; I really appreciate it. Yes, your conclusion coincides with mine - they should be equal in this situation. I repeat myself, but let me state it again: the reason I posted this question was that on Pearl's causality blog I got the answer that they are not equal, which surprised me and led to this thread. However, along the way many new interesting things were touched on, and your answer is a very good illustration of that.
    Especially your thesis that Pearl just tests identifiability but does not give the estimates. I agree that many of the theorems are about that, but in a specific sense: does the path allow one to say something about "causes" or not. In that respect, I have no problem with DAG theory at all. To answer that, you have to have a real DAG first. But when it comes to the simplest case like the one I described, the theory says "you can", while regression says "you cannot" - in the sense that regression coefficients themselves have nothing to do with causality (they may or may not be of a causal nature). Don't you feel the troublesome problem here?
    On the other hand, J. Pearl does provide estimates of causal effects, at least in a certain special form. All his do-operators intend to do just that - he draws different (numerical) conclusions and so on, even when regression is not involved. A different question is how "causal" all that is. I consider it in detail in "Troublesome Dependency Modeling: Causality, Inference, Statistical Learning" (SSRN) and show that it is a special form of "imaginary indicators", similar to the one used in index numbers for more than a hundred years. It is not "causal" in the scientific, "based on facts" meaning.

    Once again - thank you very much for your thorough answer.

    Igor

    ------------------------------
    Igor Mandel
    Telmar, Inc.
    ------------------------------