Hi Georgette,

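Fitting a regression by absorbing a categorical covariate is a computational trick (a "projection") that produces the same fit as the model with dummy variables for all the categories. It is not regularization such as LASSO: nothing is penalized or shrunk. It is faster, but it does not return coefficients for the dummy variables, which are typically not of interest. In the example below, the estimate and standard error for xx agree between the two fits.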

Chris

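library(estimatr)

# Simulate 10,000 observations with a 300-level categorical covariate
set.seed(20220704)
nn <- 1e4
kk <- 300
cc <- factor(sample(kk, nn, replace = TRUE))
xx <- rnorm(nn)
yy <- 1 + 2 * xx + rnorm(nn)

# Full model: one dummy variable per category of cc
system.time(mod <- lm(yy ~ xx + cc))
head(coef(mod))
head(coef(summary(mod)))

# Absorbed model: cc is projected out, so only the xx coefficient is returned
system.time(amod <- estimatr::lm_robust(yy ~ xx, fixed_effects = ~ cc, se_type = "classical"))
coef(amod)
summary(amod)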

####

> head(coef(summary(mod)))

              Estimate Std. Error    t value     Pr(>|t|)
(Intercept)  1.2406509 0.17261745   7.187286 7.096729e-13
xx           2.0226823 0.01012584 199.754575 0.000000e+00
cc2         -0.4540576 0.23477621  -1.934002 5.314198e-02
cc3         -0.4131068 0.25946896  -1.592124 1.113894e-01
cc4         -0.3522683 0.24073263  -1.463317 1.434129e-01
cc5         -0.2662022 0.23477514  -1.133860 2.568812e-01

####

Call:
estimatr::lm_robust(formula = yy ~ xx, fixed_effects = ~cc, se_type = "classical")

Standard error type: classical

Coefficients:
   Estimate Std. Error t value Pr(>|t|) CI Lower CI Upper   DF
xx    2.023    0.01013   199.8        0    2.003    2.043 9699

Multiple R-squared: 0.8108 , Adjusted R-squared: 0.805
Multiple R-squared (proj. model): 0.8045 , Adjusted R-squared (proj. model): 0.7984
F-statistic (proj. model): 3.99e+04 on 1 and 9699 DF, p-value: < 2.2e-16
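
For anyone curious what the projection amounts to, here is a minimal base-R sketch under the assumption that a single factor is absorbed: subtract each category's mean from the outcome and from the covariate, then regress one on the other. The names yy_within, xx_within, slope_within and slope_dummy are made up for illustration, and this is not necessarily how estimatr computes it internally.

```r
# Sketch: absorb a factor by within-category demeaning (illustrative only)
set.seed(20220704)
nn <- 1e4
kk <- 300
cc <- factor(sample(kk, nn, replace = TRUE))
xx <- rnorm(nn)
yy <- 1 + 2 * xx + rnorm(nn)

# ave() returns, for each observation, the mean of its own category
yy_within <- yy - ave(yy, cc)
xx_within <- xx - ave(xx, cc)

# Slope from the demeaned data; no intercept is needed after demeaning
slope_within <- unname(coef(lm(yy_within ~ 0 + xx_within)))

# Reference slope from the full dummy-variable model
slope_dummy <- unname(coef(lm(yy ~ xx + cc))["xx"])

# The two slopes agree up to numerical tolerance
isTRUE(all.equal(slope_within, slope_dummy))
```

The slopes match, but the standard error reported by lm() on the demeaned data would be slightly too small, because its residual degrees of freedom do not account for the absorbed category means. Dedicated routines such as lm_robust make that adjustment.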

------------------------------

Chris Andrews

Statistician Expert

University of Michigan

------------------------------

Original Message:

Sent: 07-01-2022 09:35

From: Georgette Asherman

Subject: Regression with a 50,000 plus parameters

Economists often use different words than statisticians for the same thing. Is absorption the same as regularization, e.g. LASSO?

Georgette Asherman

------------------------------

Original Message:

Sent: 06-30-2022 07:38

From: Christopher Andrews

Subject: Regression with a 50,000 plus parameters

Econometricians often use absorption to control for a variable with very many (e.g., 50K) categories. Examples include xtreg and areg in Stata, the estimatr package in R, and the ABSORB statement of PROC GLM in SAS. I am not sure whether that is your use case.

------------------------------

Original Message:

Sent: 06-29-2022 12:07

From: Terry Meyer

Subject: Regression with a 50,000 plus parameters

Anyone have any ideas on how to do a regression with so many parameters?

------------------------------

Terry Meyer

------------------------------