Dropping Highly Collinear Variables from a Model: Why it Typically is Not a Good Idea*
Published online on April 14, 2016
Abstract
Objective
To change the common practice of eliminating independent variables from models because they produce multicollinearity in an independent variable of special interest.
Methods
I supplement my argument, which is grounded in the purposes of regression analysis, with Venn diagrams, simple formulas, and two small simulations.
Results
Independent variables that, when removed from a model, substantially change the statistics associated with the independent variable(s) of most interest should typically be kept in the model. Multicollinearity is not a sufficient reason to drop variables from a model.
Conclusion
I argue against routinely dropping from regression models those variables that cause multicollinearity in an independent variable of interest. A more important criterion to consider when contemplating dropping a variable from a model is “model influence.”
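The point about model influence can be illustrated with a minimal sketch. It is not one of the two simulations reported in the article. It assumes a made-up data-generating process in which the variable of interest (`x1`) and a highly collinear control (`x2`, correlation of about 0.9) each have a true coefficient of 1.0. The helper `fit_ols` and all variable names are hypothetical.

```python
import numpy as np


def fit_ols(y, *columns):
    """Return OLS coefficients (intercept first) from a least-squares fit."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Assumed (illustrative) data-generating process, not taken from the article.
rng = np.random.default_rng(0)
n = 5000
x1 = rng.standard_normal(n)                                   # variable of interest
x2 = 0.9 * x1 + np.sqrt(1 - 0.9**2) * rng.standard_normal(n)  # collinear control
y = 1.0 * x1 + 1.0 * x2 + rng.standard_normal(n)              # both truly matter

b_full = fit_ols(y, x1, x2)[1]   # coefficient on x1 with x2 kept in the model
b_reduced = fit_ols(y, x1)[1]    # coefficient on x1 after dropping x2

print(f"x1 coefficient, x2 kept:    {b_full:.3f}")
print(f"x1 coefficient, x2 dropped: {b_reduced:.3f}")
```

Under these assumptions, the full model should recover a coefficient near 1.0 for `x1`, while the reduced model should absorb the effect of `x2` and report a value near 1.9. Dropping the collinear variable removes the multicollinearity, but it also substantially changes the estimate of most interest, which is the situation in which the abstract recommends keeping the variable.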