In linear regression, I'm familiar with categorical variables and the dummy-variable coding in which one level is left out as the baseline so as to avoid collinearity. I'm also familiar with how to interpret parameter estimates from such models: each estimate is the predicted change in the outcome for the corresponding level of the categorical predictor, relative to the baseline category.

What I'm unsure about is how to interpret a **set of independent variables that are proportions that sum to one**. We again have collinearity if we fit all of the proportions in the model along with an intercept, so presumably we would have to leave one category out as baseline. However, how do we interpret the parameter estimates for the categories included in the model relative to the one treated as baseline? It seems that considering the odds and odds ratio (OR) could work (given that the predictors are proportions, much like probabilities), but the exact interpretation isn't clear to me.
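To make the collinearity concrete, here is a minimal sketch using simulated proportions. The data, the Dirichlet draw, and the variable names are made up for illustration and are not from my data set.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical compositions: each row holds three proportions that sum to 1.
props = rng.dirichlet(alpha=[1.0, 1.0, 1.0], size=50)

intercept = np.ones((props.shape[0], 1))

# Intercept plus all three proportions: the proportion columns add up to the
# intercept column, so the design matrix is rank deficient.
X_full = np.hstack([intercept, props])

# Intercept plus two proportions (third left out as baseline): full column rank.
X_reduced = np.hstack([intercept, props[:, :2]])

rank_full = np.linalg.matrix_rank(X_full)
rank_reduced = np.linalg.matrix_rank(X_reduced)

print(rank_full, X_full.shape[1])        # rank is smaller than the column count
print(rank_reduced, X_reduced.shape[1])  # rank equals the column count
```

Dropping any one of the three proportions (or, alternatively, the intercept) removes the exact linear dependence; the sketch drops the third proportion only as an example.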

**An example**: At the zip code level, the independent variables are the proportions of metamorphic, igneous, and sedimentary rock. As you may know, these are the three major rock types, and all rocks are classified as one of these. As such, the proportions across all three sum to 1. The outcome is the average radon level in the respective zip code. For the sake of this example, assume the regression assumptions are met. (In the real data they are, once the outcome variable has been log-transformed; log-transforming the predictors was not necessary to meet the assumptions.)

Say I fit *metamorphic* and *igneous* proportions as predictors in the model, leaving the *sedimentary* proportion as baseline. If the parameter estimate for *metamorphic* were, say, 0.43, and that for *igneous* were 0.92, how can I use these to express how the average radon level (outcome) changes for *metamorphic* vs. *sedimentary* (baseline) and for *igneous* vs. *sedimentary*?
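The setup I have in mind can be sketched as follows. This is a minimal illustration on simulated data, not my real data: the intercept of 1.0, the noise level, the Dirichlet parameters, and the `predict` helper are all assumptions, and the values 0.43 and 0.92 are simply reused as the simulated "true" coefficients.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Simulated (not real) rock-type proportions per zip code; rows sum to 1.
# Column order: metamorphic, igneous, sedimentary.
props = rng.dirichlet(alpha=[2.0, 2.0, 2.0], size=n)
metamorphic, igneous = props[:, 0], props[:, 1]

# Assumed data-generating values, borrowed from the hypothetical estimates.
b0, b_meta, b_ign = 1.0, 0.43, 0.92
log_radon = b0 + b_meta * metamorphic + b_ign * igneous + rng.normal(0, 0.05, n)

# Ordinary least squares with sedimentary left out as baseline.
X = np.column_stack([np.ones(n), metamorphic, igneous])
coef, *_ = np.linalg.lstsq(X, log_radon, rcond=None)


def predict(meta, ign):
    """Predicted log radon; the sedimentary share is implied as 1 - meta - ign."""
    return coef[0] + coef[1] * meta + coef[2] * ign


# Shift 10 percentage points from sedimentary to metamorphic, igneous fixed.
shift = predict(0.40, 0.30) - predict(0.30, 0.30)
print(coef, shift)
```

In this sketch, moving 10 percentage points from sedimentary to metamorphic (with igneous held fixed) changes the predicted value by 0.1 times the metamorphic coefficient, which is just what the fitted equation implies arithmetically. Whether that is the most sensible way to report the effect is the part I'm unsure about.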

My brain keeps wanting to go into the multinomial regression framework, where the *outcome* is a non-ordered grouping variable, and thus use odds and the OR, which compares each level of the outcome back to the baseline level. However, it's not clear to me how to translate this idea to one where the *predictor* is a grouping variable (expressed as a proportion) and the outcome is *continuous*.

What I think would be helpful is if anyone knows what the estimate of 0.43 for *metamorphic* (hypothetical value given above) means with respect to the average radon level (outcome). Can it be interpreted on its own? Or do I somehow compare 0.43 to 0.92 (estimate for *igneous*)? Would it make sense to take the ratio of these two (as an OR)? If so, still, what does this mean *with respect to the outcome* and given the fact that *sedimentary* was fit as baseline?
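For reference, the algebra of the fitted equation can be written out as below. The symbols are my own shorthand (an assumption, not standard notation): $m$, $i$, and $s$ are the metamorphic, igneous, and sedimentary proportions, $y$ is the outcome (log-transformed in my real data), and $\beta_0$, $\beta_1$, $\beta_2$ are the intercept and the two fitted coefficients. This is only a rearrangement of the model, not a claim about the right way to report it.

$$
\begin{aligned}
E[y] &= \beta_0 + \beta_1 m + \beta_2 i, \qquad s = 1 - m - i \\
     &= \beta_0 (s + m + i) + \beta_1 m + \beta_2 i \\
     &= \beta_0\, s + (\beta_0 + \beta_1)\, m + (\beta_0 + \beta_2)\, i
\end{aligned}
$$

Read this way, $\beta_1 = 0.43$ would be the difference in expected outcome between an all-metamorphic and an all-sedimentary zip code, and $\beta_2 - \beta_1 = 0.92 - 0.43 = 0.49$ would be the difference between all-igneous and all-metamorphic. That seems to point toward differences of coefficients rather than their ratio, but I'd appreciate confirmation.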

Thank you for any insight!

Megan J. Olson Hunt, PhD