No matter how you label it or deal with it, a statistical model that is fundamentally flawed needs to be rethought.
Using log(Y1/Y2) = log(Y1) - log(Y2), the model in question is
[1] log(Y1) - log(Y2) = b.0 + b.1*Y1 + b.2*Y2 + b.X*X (+ noise).
Suppose Y1 is logNormal(meanlog=3.5, sdlog=0.4), which gives 0.025, 0.50, and 0.975 quantiles near 15, 33, and 73. Likewise, let Y2 be logNormal(meanlog=3.2, sdlog=0.4), giving corresponding quantiles near 11, 24, and 54. These reflect the kind of distributions we often see in practice. Here, even when Y1 and Y2 are independent, a simulation with N = 1 million observations (R code below) reveals that the correlation between log(Y1) - log(Y2) and Y1 - Y2 is about 0.95. Thus, Model [1] is what I call a tautological model: its dependent variable is being predicted by a functional near-twin of itself.
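To see the tautology directly, here is a minimal sketch (an illustrative simulation, not the original analysis): regress log(Y1) - log(Y2) on Y1, Y2, and a predictor X that is pure noise by construction, and note how much variance Y1 and Y2 alone soak up.

```r
# Simulate independent logNormal Y1 and Y2 plus an X that is unrelated to both
set.seed(170322)
n  <- 100000
y1 <- rlnorm(n, meanlog = 3.5, sdlog = 0.4)
y2 <- rlnorm(n, meanlog = 3.2, sdlog = 0.4)
x  <- rnorm(n)                    # independent of Y1 and Y2 by construction

# Fit the tautological model [1]
fit <- lm(log(y1) - log(y2) ~ y1 + y2 + x)
summary(fit)$r.squared            # large, even though x explains nothing
```

The R-squared here exceeds 0.90 even though X carries no information at all, which is the point: almost all of the "explained" variance is the outcome predicting itself through Y1 and Y2.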
Accordingly, since b.1 and b.2 are wholly uninteresting in [1], I presume that b.X is the focal parameter, the one that tightly quantifies the essential research question.
Let us consider a common, similar problem.
Suppose we need to compare two independent groups (G=0 vs. 1) with respect to how much Y changes from baseline (Y1) to some specific follow-up time (Y2). If Y is logNormal-like, we could use a model somewhat like [1],
[2] log(Y2/Y1) = log(Y2) - log(Y1) ~ b.0 + b.1*log(Y1) + b.G*G
However, using base 2 logging eases interpretation: a one unit change in log2(x) is a doubling or halving of x. Thus
[2*] log2(Y2/Y1) = log2(Y2) - log2(Y1) ~ b.0 + b.1*log2(Y1) + b.G*G
is functionally identical to [2] but easier (at least for me) to work with. Either one addresses how much the two groups changed over time given that we "adjusted for baseline." b.1 is rather uninteresting. b.G is the focal parameter.
But Models [2] and [2*] have the same problem as Model [1]. One loses nothing and gains simplicity by using the model
[3] log(Y2) ~ b.0 + b.1*log(Y1) + b.G*G
or
[3*] log2(Y2) ~ b.0 + b.1*log2(Y1) + b.G*G
Exponentiating Models [3] and [3*] makes them readily interpretable.
[3i] exp(log(Y2)) = Y2 ~ exp(b.0 + b.1*log(Y1) + b.G*G)
Y2 ~ exp(b.0) * exp(b.1)^log(Y1) * exp(b.G)^G
[3*i] 2^(log2(Y2)) = Y2 ~ 2^(b.0 + b.1*log2(Y1) + b.G*G)
Y2 ~ (2^b.0) * (2^b.1)^log2(Y1) * (2^b.G)^G
At this point, exp(b.G) in [3i] equals 2^b.G in [3*i]. If 2^b.G = 1.65 in [3*i], then comparing two hypothetical subjects who have the same Y1 value but are in different groups, the Y2 for the G=1 case tends to be 65% greater (times 1.65) than the Y2 for the G=0 case. If 2^b.1 = 1.80 in [3*i], then comparing two subjects in the same group, with Subject A having a Y1 that is twice Subject B's Y1, A's Y2 tends to be 80% greater (times 1.80) than B's Y2.
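A short sketch of Model [3*] on simulated data may make the interpretation concrete. All parameter values below (b.0 = 1, b.1 = 0.85, 2^b.G = 1.65, the noise SD) are illustrative assumptions, not estimates from any real study.

```r
# Simulate data obeying Model [3*] with a known group effect of 2^b.G = 1.65
set.seed(1)
n  <- 50000
g  <- rbinom(n, 1, 0.5)                      # group indicator, G = 0 or 1
y1 <- rlnorm(n, meanlog = 3.5, sdlog = 0.4)  # baseline

# True model on the log2 scale: b.0 = 1, b.1 = 0.85, b.G = log2(1.65)
log2.y2 <- 1 + 0.85*log2(y1) + log2(1.65)*g + rnorm(n, 0, 0.3)
y2 <- 2^log2.y2

# Fit Model [3*] and back-transform the coefficients
fit <- lm(log2(y2) ~ log2(y1) + g)
2^coef(fit)        # multiplicative effects; 2^b.G lands near 1.65
```

The fitted 2^b.G recovers the built-in 1.65, i.e., "G = 1 subjects tend to have Y2 values 65% greater than G = 0 subjects with the same baseline."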
Notes. When there is virtually no relationship between the baseline (Y1) and the follow-up score (Y2), the model should only be
[4] log(Y2) ~ b.0 + b.G*G
or
[4*] log2(Y2) ~ b.0 + b.G*G
and NOT the model log(Y1/Y2) ~ b.0 + b.G*G. Also, if the relationship between Y1 and Y2 differs between groups, then you are faced with building a sound interaction model. This is trickier than many people seem to realize. I've witnessed good professional statisticians code the model in a correct way but incorrectly interpret its coefficients.
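The interaction trap mentioned above can be demonstrated with a small simulation (all parameter values are illustrative assumptions): once a log2(Y1)-by-G interaction is in the model, the coefficient on G is the group contrast at log2(Y1) = 0, i.e., at Y1 = 1, which is usually far outside the data and rarely meaningful. Centering the baseline moves that contrast to the average log2(Y1).

```r
# Simulate group-specific baseline slopes: 0.9 for G = 0, 1.1 for G = 1
set.seed(2)
n  <- 50000
g  <- rbinom(n, 1, 0.5)
y1 <- rlnorm(n, meanlog = 3.5, sdlog = 0.4)
log2.y2 <- 1 + 0.9*log2(y1) + log2(1.5)*g + 0.2*log2(y1)*g + rnorm(n, 0, 0.3)
y2 <- 2^log2.y2

# Uncentered: coef on g is the group contrast AT log2(Y1) = 0 (Y1 = 1)
fit <- lm(log2(y2) ~ log2(y1)*g)
coef(fit)["g"]            # an extrapolation far below the observed baselines

# Centered: coef on g is the group contrast at the mean log2(Y1)
c.l2y1 <- log2(y1) - mean(log2(y1))
fit.c  <- lm(log2(y2) ~ c.l2y1*g)
coef(fit.c)["g"]          # noticeably different, and actually interpretable
```

The two "g" coefficients differ substantially, yet both models fit the data identically; misreading the uncentered one as "the group effect" is exactly the interpretation error described above.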
For much more on this problem, study Senn, S. (2006). Change from baseline and analysis of covariance revisited. Stat Med, 25(24):4334–44.
So how do you apply the principles just covered to handle the "mathematically coupled" model, [1]? To opine on that requires knowing far more about the study's design, its research questions, and its variables than what has been described.
But I might start by applying the modeling corollary to Occam's Razor and George Box's "All models are wrong; some are useful." Ask: Is log2(Y1/Y2) ~ b.0 + b.X*X or (often better, when X is logNormal-like) log2(Y1/Y2) ~ b.0 + b.X*log2(X) useful enough? If log(Y1/Y2) needs to be adjusted for some general magnitude of Y1 and Y2, then one could add meanY = (Y1+Y2)/2 or log2(gmeanY) = log2(sqrt(Y1*Y2)) as the "adjustment" predictor. In the simulation mentioned above, the correlation between log(Y1/Y2) and log(sqrt(Y1*Y2)) was nearly 0.00, but that data generation had no built-in relationship between Y1/Y2 and sqrt(Y1*Y2).
So, could it be that
[5] log(Y1/Y2) = b.0 + b.X*X + b.1*log(sqrt(Y1*Y2)) (+ noise)
will work in this application?
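A quick sanity check of a model like [5] on simulated data (an illustrative sketch with assumed parameter values, constructed so that X truly shifts the log ratio while the geometric mean carries no information about it):

```r
# Build Y1, Y2 from a log ratio that depends on X and an independent log geometric mean
set.seed(3)
n  <- 50000
x  <- rnorm(n)
lr <- 0.3 + 0.25*x + rnorm(n, 0, 0.3)   # log(Y1/Y2), with true X effect 0.25
lg <- 3.35 + rnorm(n, 0, 0.3)           # log(sqrt(Y1*Y2)), independent of lr
y1 <- exp(lg + lr/2)
y2 <- exp(lg - lr/2)

# Fit Model [5]: the ratio outcome with the geometric mean as the adjuster
fit <- lm(log(y1/y2) ~ x + log(sqrt(y1*y2)))
coef(fit)["x"]        # recovers the built-in X effect, near 0.25
```

Unlike Model [1], the adjuster here is not a near-twin of the outcome, so the X coefficient is estimated cleanly. Whether this holds in Isabella's application depends on how Y1/Y2 and sqrt(Y1*Y2) are actually related in her data.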
Finally, per Voltaire, keep in mind that perfection is the enemy of done.
R code:
# logNormal parameters for Y1 and Y2
mean.logy1 <- 3.5
mean.logy2 <- 3.2
sd.logy <- 0.4
set.seed(170322)

# Simulate 1 million independent observations of Y1 and check its quantiles
y1 <- rlnorm(1000000, mean.logy1, sd.logy)
qlnorm(c(0.025, 0.50, 0.975), mean.logy1, sd.logy)
# [1] 15.11994 33.11545 72.52894

# Same for Y2
y2 <- rlnorm(1000000, mean.logy2, sd.logy)
qlnorm(c(0.025, 0.50, 0.975), mean.logy2, sd.logy)
# [1] 11.20113 24.53253 53.73076

# The log ratio is a functional near-twin of the raw difference...
cor(y1 - y2, log(y1/y2))
# [1] 0.9498388

# ...but (here) nearly uncorrelated with the log geometric mean
cor(log(y1/y2), log(sqrt(y1*y2)))
# [1] 0.0009271855
------------------------------
Ralph O'Brien
Professor of Biostatistics (officially retired; still keenly active)
Case Western Reserve University
http://rfuncs.weebly.com/about-ralph-obrien.html
------------------------------
Original Message:
Sent: 04-17-2018 16:52
From: Kieran McCaul
Subject: Mathematical Coupling - Is it possible to overcome it?
There is another paper by Tu and Gilthorpe that you might want to have a look at:
Tu YK and Gilthorpe MS (2007). Revisiting the relation between change and initial value: a review and evaluation. Stat Med 26(2): 443-457.
Mathematical coupling arises when people are looking at change in a variable from baseline to some time t, while accounting for the baseline level of the variable.
For example, in an RCT of an antihypertensive medication you would have blood pressure at baseline, BP0, and at 6 months, BP1, and a treatment indicator, TREAT.
Mathematical coupling arises when you regress the change in BP on the baseline value:
BP1-BP0 = BP0 + TREAT.
Apparently people used to do stuff like this (and probably still do).
It also arises in studies of agreement between two different measures. Problems in studies like these led to the paper by Bland and Altman, which was itself strongly influenced by the more general work of Oldham.
Bland JM and Altman DG (1986). Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1(8476): 307-310.
Oldham PD (1962). A note on the analysis of repeated measurements of the same subjects. J Chronic Dis 15: 969-977.
------------------------------
Kieran McCaul
Research Associate Professor
University of Western Australia
Original Message:
Sent: 04-16-2018 14:00
From: Isabella Ghement
Subject: Mathematical Coupling - Is it possible to overcome it?
Hi everyone,
While working on a project, I ran into the issue of mathematical coupling and am wondering if there is any way to overcome it.
For this project, I am fitting a linear regression model like the one below:
log(Abundance_1/Abundance_2) = beta0 + beta1*Abundance_1 + beta2*Abundance_2 + beta3*Environmental + error
where Abundance_1 and Abundance_2 are yearly measures of fish abundance in regions 1 and 2, respectively, and Environmental is an environmental variable measured yearly across both regions.
For this model, mathematical coupling arises because the outcome variable log(Abundance_1/Abundance_2) is mathematically related with the predictors Abundance_1 and Abundance_2.
After reading the article Misuses of correlation and regression analyses in orthodontic research: The problem of mathematical coupling by Yu-Kang Tu and co-authors, I understand that mathematical coupling between log(CPUE_BC/CPUE_US) and the predictors CPUE_BC and CPUE_US (the catch-per-unit-effort abundance measures playing the roles of Abundance_1 and Abundance_2 above) obscures the relationship between log(CPUE_BC/CPUE_US) and the environmental variable. This is because the variance in the values of the outcome variable log(CPUE_BC/CPUE_US) is almost completely explained by CPUE_BC and CPUE_US, and there is very little or no variance remaining to be explained by the environmental variable, whose relationship with log(CPUE_BC/CPUE_US) is of specific interest.
What I don't understand yet is if there is a principled way to estimate the relationship between log(CPUE_BC/CPUE_US) and the environmental variable while guarding against the ill-effects of mathematical coupling.
Any ideas or references you can share that would help me estimate this relationship are greatly appreciated.
Thanks very much,
Isabella
------------------------------
Isabella R. Ghement, Ph.D.
Ghement Statistical Consulting Company Ltd.
E-mail: isabella@ghement.ca
Tel: 604-767-1250
------------------------------