Workshop
Regression Modeling with Many Correlated Predictors
Jay Magidson, Statistical Innovations
Tony Babinec, AB Analytics
Friday, April 8, 2011
8:30 AM-4:30 PM
Rush University Medical Center
1653 W. Congress Parkway, Chicago, IL 60612
Sponsored by the Chicago Chapter of the American Statistical Association
Abstract:
Recent advances in the analysis of high-dimensional data now allow reliable regression models to be developed even when the number of predictors exceeds the number of cases! In this course we begin by reviewing the problems and limitations of traditional linear and logistic regression. Our applications-oriented presentation provides insight into how the new approaches work through examples and an overview of the relevant theory, supplemented by supporting equations. We use real and simulated data sets to illustrate the different approaches.
COURSE OUTLINE
Linear Regression Model
Bias-Variance Tradeoff
Logistic Regression and Discriminant Analysis
(BREAK)
Classification Tables and ROC Curves
Suppressor Variables
(LUNCH)
Penalized Regression Approaches
- Ridge Regression
- Lasso
- Elastic Net
Component Approaches
- Principal Components Regression
- Partial Least Squares Regression
- Correlated Components Regression – NEW Approach
(BREAK)
Ultra-high Dimensional Data
- Variable Reduction
- Extensions
Who Should Attend
Marketing, biomedical and other researchers who want to improve their understanding of regression model development in the presence of many correlated predictors.
Prerequisites
Familiarity with linear and logistic regression analysis at an applied level.
What you will learn
- How to develop reliable models, even in the presence of extreme multicollinearity and when the number of predictors greatly exceeds the number of sample observations
- Why many popular variable selection techniques are suboptimal
- About a new powerful step-down variable reduction technique in CORExpress™
- About free and commercially available software for handling high-dimensional data
We will discuss software in general and explain why stepwise regression performs poorly, both in general and especially in the presence of multicollinearity or when the number of variables exceeds the number of cases. More specifically, Statistical Innovations is developing software currently named CORExpress. This software implements a version of Naïve Bayes regression as well as a new approach called correlated components regression, along with various other approaches. We will present the results of simulation studies that include approaches implemented as contributed R packages, especially GLMNET, which implements penalized approaches such as the Lasso and Elastic Net. The latter methods are associated with Hastie, Tibshirani, Friedman, and their protégés. Workshop attendees will receive a demo version of CORExpress.
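For attendees who want a preview of the free software, the following is a minimal sketch (not part of the official workshop materials) of how the glmnet R package fits ridge, Lasso, and Elastic Net models when the number of predictors exceeds the number of cases; the simulated data and variable names are illustrative only.

library(glmnet)

set.seed(1)
n <- 50; p <- 200                      # more predictors than cases
x <- matrix(rnorm(n * p), n, p)        # simulated design matrix (illustrative)
y <- x[, 1] - 0.5 * x[, 2] + rnorm(n)  # response driven by a few predictors

# alpha = 0 gives ridge, alpha = 1 gives the Lasso,
# and intermediate values give the Elastic Net
fit_ridge <- cv.glmnet(x, y, alpha = 0)
fit_lasso <- cv.glmnet(x, y, alpha = 1)
fit_enet  <- cv.glmnet(x, y, alpha = 0.5)

coef(fit_lasso, s = "lambda.min")      # coefficients at the cross-validated lambda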
Registration Fees
Member $200
Student $50
The Chicago Chapter accepts payment by Visa, Mastercard and American Express.
Register at: http://www.123signup.com/calendar?Org=chicagoasa
Speaker Biographies
Jay Magidson is founder and president of Statistical Innovations Inc., a Boston-based consulting, training and software development firm specializing in innovative applications of statistical modeling. His clients have included A.C. Nielsen Co., The Kellogg Company, and Pfizer. He taught statistics at Tufts and Boston University and is widely published on the theory and applications of multivariate statistical methods. Dr. Magidson designed SPSS CHAID, GOLDMineR® and CORExpress™, and is co-developer with Jeroen Vermunt (Tilburg University) of the Latent GOLD® and Latent GOLD® Choice programs.
Tony Babinec teaches statistics and data mining classes for IBM clients. He also presents classes for Statistics.com. He has given talks and workshops at the Joint Statistical Meetings, the Sawtooth Software Conference, the AMA's Advanced Research Techniques Forum, and Statistical Modeling Week. Tony is a past President of the Chicago Chapter of the American Statistical Association and currently serves as its Workshops VP.