Hi everyone,
When fitting linear regression models, we can visualize the effects produced by the model using effect plots. These plots essentially show how the predicted values of the response variable Y change with the values of a predictor variable Xj, while holding the values of all other predictor variables (i.e., the non-focal predictor variables) in the model fixed at some conveniently chosen values.
In practice, if the model has many predictor variables, can we use a variation of these effect plots? The variation I had in mind fixes the focal predictor Xj at a specific value but allows each of the non-focal predictors to take all of the values observed in the data. Once the predicted values of Y are computed for that specific value of Xj, we can average them across all values considered for the non-focal predictors and report the average predicted value of Y together with a 95% uncertainty interval obtained via bootstrapping.
Would something like this make any sense? The idea is that we would not have to choose "typical" values for the non-focal predictors, but rather consider all of the values of these predictors observed in the data to give a more complete picture of what is going on.
Have other people already tried something like this? If it makes sense, could we describe this type of plot as an "average effect plot" or something similar?
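The variation described above could be sketched in R roughly as follows. This is only an illustration under my own assumptions: avgPredAt() is a hypothetical helper (not a function from any package), and the mtcars data with the model mpg ~ wt + cyl is used purely as an example. Each level of the focal predictor is assigned to every row of the data in turn, the non-focal predictor keeps its observed values, and the resulting predictions are averaged.

```r
# Sketch only: avgPredAt() is a hypothetical helper, not part of any package.
data(mtcars)
mtcars$cyl <- factor(mtcars$cyl)
m <- lm(mpg ~ wt + cyl, data = mtcars)

# Average prediction when every row is assigned the same value of the
# focal predictor, while the other predictors keep their observed values.
avgPredAt <- function(model, data, focal, value) {
  newdata <- data
  newdata[[focal]] <- factor(rep(value, nrow(data)),
                             levels = levels(data[[focal]]))
  mean(predict(model, newdata = newdata))
}

# One average prediction per level of cyl
avgPred <- sapply(levels(mtcars$cyl),
                  function(v) avgPredAt(m, mtcars, "cyl", v))
avgPred
```

Note that this assigns each cyl level to all 32 cars, so wt ranges over all of its observed values. Averaging the fitted values within each observed cyl group is a different calculation, because it uses only the wt values of the cars in that group.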
Here is a quick R example (minus the bootstrapping) of what I have in mind. In this example, the linear model (lm) relates miles per gallon (mpg) to weight (wt) and number of cylinders (cyl) for a sample of 32 car models. To construct the "average effect plot" for the cyl predictor, I first get the fitted values from the model. Then, I separate those into fitted values corresponding to cyl = 4 and compute their average. The fitted values corresponding to cyl = 6 are also averaged, etc. The resulting average fitted values are:
cyl = 4: 26.7
cyl = 6: 19.7
cyl = 8: 15.1
These three values would be plotted against the cyl values to obtain the "average effect plot". Uncertainty bands can also be added to the plot for each average predicted value displayed.
data(mtcars)
mtcars$cyl <- factor(mtcars$cyl)   # treat number of cylinders as categorical
m <- lm(mpg ~ wt + cyl, data = mtcars)

# Average the fitted values within each observed level of cyl.
cylAvgPred <- function(m) {
  pred <- predict(m)           # fitted values for the cars used in the fit
  cyl <- model.frame(m)$cyl    # cyl values taken from the model itself
  list(mean.pred.cyl.4 = mean(pred[cyl == "4"]),
       mean.pred.cyl.6 = mean(pred[cyl == "6"]),
       mean.pred.cyl.8 = mean(pred[cyl == "8"]))
}

cylAvgPred(m)
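For the bootstrapping step that the example above leaves out, one possible sketch is a case-resampling percentile bootstrap. Everything here is an assumption on my part rather than a settled method: avgFittedByCyl() is a hypothetical helper, and the number of resamples B, the seed, and the choice of percentile intervals are arbitrary.

```r
# Sketch only: case-resampling percentile bootstrap; B and the seed are arbitrary.
data(mtcars)
mtcars$cyl <- factor(mtcars$cyl)

# Refit the model on a data set and average the fitted values
# within each observed cyl group (NA if a group is absent).
avgFittedByCyl <- function(d) {
  fit <- lm(mpg ~ wt + cyl, data = d)
  tapply(fitted(fit), d$cyl, mean)
}

set.seed(1)
B <- 500
# Each column holds the three group averages from one resampled data set
boot <- replicate(B, {
  idx <- sample(nrow(mtcars), replace = TRUE)
  avgFittedByCyl(mtcars[idx, ])
})

est <- avgFittedByCyl(mtcars)
# 95% percentile intervals, ignoring resamples where a cyl group was absent
ci <- apply(boot, 1, quantile, probs = c(0.025, 0.975), na.rm = TRUE)
rbind(estimate = est, ci)
```

The rows of the result give, for each cyl level, the average fitted value and the lower and upper interval limits, which could then be drawn as error bars on the plot.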
Many thanks,
Isabella
------------------------------
Isabella R. Ghement, Ph.D.
Ghement Statistical Consulting Company Ltd.
------------------------------