Before constructing any model for prediction purposes, it is vital that you do an exploratory data analysis.
This involves understanding the distributions of all your variables and checking for missing values, outliers and particularly influential points. You should also understand the correlations between your variables, and I would include an exploratory principal components analysis. This may well lead to the conclusion that your variables fall into a number of distinct groups. If so, you should ask whether you need all of them in your model.
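As a rough illustration of those checks, here is a minimal sketch using NumPy only. The data are hypothetical: 100 observations of 20 standard-normal predictors, with columns 0 and 1 deliberately made near-duplicates and five missing values injected into column 2. The |z| > 3 outlier rule is an assumed convention, not the only reasonable one.

```python
import numpy as np

# Hypothetical data: 100 observations of 20 predictors (illustrative only).
rng = np.random.default_rng(0)
n, p = 100, 20
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=n)           # columns 0 and 1 strongly correlated
X[rng.choice(n, size=5, replace=False), 2] = np.nan     # five missing values in column 2

# 1. Missing values per variable.
missing = np.isnan(X).sum(axis=0)

# 2. Crude outlier count per variable: |z-score| > 3 (NaNs compare as False).
mu = np.nanmean(X, axis=0)
sd = np.nanstd(X, axis=0)
outliers = (np.abs((X - mu) / sd) > 3).sum(axis=0)

# 3. Correlation matrix on complete cases only.
complete = X[~np.isnan(X).any(axis=1)]
corr = np.corrcoef(complete, rowvar=False)

# 4. Exploratory PCA: SVD of the standardised complete cases.
Z = (complete - complete.mean(axis=0)) / complete.std(axis=0)
_, s, _ = np.linalg.svd(Z, full_matrices=False)
explained = s**2 / np.sum(s**2)   # proportion of variance per component

print("missing per variable:", missing)
print("|z| > 3 counts:", outliers)
print("corr(x0, x1): %.2f" % corr[0, 1])
print("variance explained by first 5 PCs:", np.round(explained[:5], 3))
```

Large off-diagonal entries in `corr`, or a few components carrying most of `explained`, are the kind of grouping among variables that the paragraph above describes.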
In the first instance, you should try to use subject matter expertise to reduce the number of variables. If you have no subject matter expertise, there are a few empirical things you can try. You might need to choose a random sample of the variables to get a model that can be estimated with only 100 observations.
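Drawing a random sample of variables is a one-liner in NumPy. In this sketch the sizes (60 candidate predictors, keep 10) and the helper name `random_variable_subset` are hypothetical choices for illustration; how many variables 100 observations can support depends on the model.

```python
import numpy as np

def random_variable_subset(p, k, seed=0):
    """Hypothetical helper: return k distinct column indices drawn uniformly from range(p)."""
    rng = np.random.default_rng(seed)
    return np.sort(rng.choice(p, size=k, replace=False))

# Hypothetical data: 100 observations of 60 candidate predictors.
X = np.random.default_rng(1).normal(size=(100, 60))

cols = random_variable_subset(X.shape[1], k=10)
X_small = X[:, cols]   # reduced design matrix to fit the model on
print(cols, X_small.shape)
```

Repeating this with several seeds and comparing the fitted models is one way to see whether conclusions depend on which subset happened to be drawn.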
I would fit a random forests model, as that produces a variable importance score telling you which variables matter most from the point of view of predictive accuracy.
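A minimal sketch of random forests variable importance, assuming scikit-learn is available and using hypothetical data in which only `x0` and `x1` drive the response. It uses the impurity-based `feature_importances_`; `sklearn.inspection.permutation_importance` is an alternative that is less biased towards variables with many distinct values.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical data: 100 observations, 20 predictors; only x0 and x1 affect y.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = 3 * X[:, 0] + 2 * X[:, 1] + 0.5 * rng.normal(size=100)

forest = RandomForestRegressor(n_estimators=500, random_state=0)
forest.fit(X, y)

# Impurity-based importances (normalised to sum to 1), ranked from most to least important.
importance = forest.feature_importances_
ranking = np.argsort(importance)[::-1]
for j in ranking[:5]:
    print(f"x{j}: {importance[j]:.3f}")
```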
A CART model would also be worth looking at: it will throw away most of the variables and use just a few of them.
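To see which variables a CART model keeps, one option is scikit-learn's `DecisionTreeRegressor` (assumed available). In this sketch the data are hypothetical and `max_depth=3` is an arbitrary setting; a depth-3 tree has at most 7 internal splits, so it can use at most 7 variables.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# Hypothetical data: 100 observations, 20 predictors; only x0 and x1 affect y.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = 3 * X[:, 0] + 2 * X[:, 1] + 0.5 * rng.normal(size=100)

tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)

# tree_.feature gives the split variable for each node; leaves carry a negative sentinel.
used = sorted(set(int(f) for f in tree.tree_.feature if f >= 0))
print("variables used:", used)
print(export_text(tree, feature_names=[f"x{j}" for j in range(X.shape[1])]))
```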
The LASSO method also picks out a subset of variables and rejects the rest.
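A minimal LASSO sketch, assuming scikit-learn is available and using the same kind of hypothetical data. The predictors are standardised so the L1 penalty treats them on a common scale, and the penalty strength is chosen by 5-fold cross-validation (an assumed choice); variables whose coefficients are shrunk exactly to zero are the ones rejected.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 100 observations, 20 predictors; only x0 and x1 affect y.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = 3 * X[:, 0] + 2 * X[:, 1] + 0.5 * rng.normal(size=100)

# Put every predictor on the same scale before penalising.
Xs = StandardScaler().fit_transform(X)

# Pick the penalty strength alpha by 5-fold cross-validation.
lasso = LassoCV(cv=5).fit(Xs, y)

selected = np.flatnonzero(lasso.coef_)   # indices of non-zero coefficients
print("alpha:", round(lasso.alpha_, 4))
print("selected variables:", selected.tolist())
```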
One way or another you need to get the number of variables down to something that can be estimated with 100 observations.
------------------------------
Blaise Egan
Lead Data Scientist
British Telecommunications PLC
Original Message:
Sent: 01-03-2017 14:47
From: Uday Jha
Subject: Predicted R-Squared
I am interested in finding which variables are significant for the response and in predicting their influence. Hence principal component analysis is not going to help in this case.
Thank you
------------------------------
Uday Jha
Rochester Institute of Technology
Original Message:
Sent: 01-03-2017 13:18
From: David Wilson
Subject: Predicted R-Squared
Without knowing what you are trying to accomplish, I suggest investigating the use of principal components.