ASA Connect


Principal Components Analysis

  • 1.  Principal Components Analysis

    Posted 10-30-2015 09:34

    Everyone,

     

    I'm about to make my first foray into principal component analysis since graduate school.  I'm working on a multivariate regression problem that has multiple independent variables and multiple dependent variables.    I'm wondering if it would be possible to perform multivariate regression by using BOTH the principal components representing dimensions in the set of independent variables as independent variables and the principal components representing dimensions  in the set of dependent variables as dependent variables in the same regression model.  However, I'm unsure if this application is statistically valid. 

     

    The common examples found in texts and in the peer-reviewed literature when PCA is applied prior to multivariate regression are

     

    (1)    to multiple independent variables, to find the principal components in the independent variables, after which one or more components are used as independent variables in a regression against a single dependent variable, or

    (2)    to multiple dependent variables, to find the principal components in the dependent variables, after which a single component is used as the dependent variable in a regression against one or more independent variables.

     

    However, I would like to know if it is possible to perform one multivariate regression analysis on two sets of principal components, one set of components representing the dimensions in the independent variables and the other set of components representing the dimensions in the dependent variables.   In my rather limited literature search, I haven't found an example of this application in the peer-reviewed literature, which could indicate that it is not statistically valid.  Further, other sources of information don't explicitly address this application. 

     

    Thanks in advance for your insight.

     

    Linda

     

    Linda A. Landon, PhD, ELS

    President

    Research Communiqué

    Jefferson City, MO

    Email:  LandonPhD@ResearchCommunique.com

    Phone: 573-797-4517

    Central Standard Time ( CST ) = GMT-6 (November – February)

    Central Daylight Time ( CDT )  = GMT-5 (March – October)

     



  • 2.  RE: Principal Components Analysis

    Posted 11-02-2015 09:38

    I think canonical correlation is the method you want.

    ------------------------------
    Kenneth Burnham
    Colorado State University



  • 3.  RE: Principal Components Analysis

    Posted 11-03-2015 09:15

    I agree with Ken Burnham: canonical correlation analysis will allow you to answer your research questions. Canonical correlation is very flexible yet underutilized.

    Larry Price

    Texas State University

    ------------------------------
    Larry Price
    Director/Professor - Interdisciplinary Initiative for Research
    Texas State University



  • 4.  RE: Principal Components Analysis

    Posted 11-04-2015 09:09

    Thanks.  My reading based on your suggestion indicates that this is the technique that I should use.

    ------------------------------
    Linda Landon
    President
    Research Communiqué



  • 5.  RE: Principal Components Analysis

    Posted 11-04-2015 09:09

    After educating myself a little about canonical correlation analysis, I agree with you.  Thanks.

    ------------------------------
    Linda Landon
    President
    Research Communiqué



  • 6.  RE: Principal Components Analysis

    Posted 11-02-2015 09:38

    Dear Ms. Landon,

    What you would like to do is called redundancy analysis; it is a special case of canonical correlation analysis. In canonical correlation analysis you look for linear combinations of each set of variables such that the correlation between the linear combinations from the two sets is as high as possible; in redundancy analysis the criterion is that the linear combination of the response set can be best predicted by the linear combination of the predictor set. Van den Wollenberg in Psychometrika deals with redundancy analysis. Canonical correlation analysis seems to be due to Hotelling. A net search will give all the relevant references.
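
    A minimal NumPy sketch of the canonical correlation criterion described above, run on synthetic data. The function name, the QR-based computation, and the simulated variables are illustrative assumptions, not taken from any package or data set mentioned in this thread:

```python
import numpy as np

def canonical_correlations(X, Y):
    """Return canonical correlations and weight matrices for blocks X and Y.

    Assumes both centered blocks have full column rank.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the column spaces of each block.
    Qx, Rx = np.linalg.qr(Xc)
    Qy, Ry = np.linalg.qr(Yc)
    # Singular values of Qx'Qy are the canonical correlations.
    U, rho, Vt = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    A = np.linalg.solve(Rx, U)      # X-side weights
    B = np.linalg.solve(Ry, Vt.T)   # Y-side weights
    return np.clip(rho, 0.0, 1.0), A, B

# Synthetic illustration: 4 predictors, 3 responses (made-up data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
Y = X[:, :2] @ rng.normal(size=(2, 3)) + rng.normal(size=(200, 3))
rho, A, B = canonical_correlations(X, Y)
print("canonical correlations:", np.round(rho, 3))
```

    The columns of `X @ A` and `Y @ B` are the paired linear combinations; the correlation within each pair is the corresponding entry of `rho`.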

    ------------------------------
    Pieter Kroonenberg
    Leiden University



  • 7.  RE: Principal Components Analysis

    Posted 11-03-2015 09:15

    I would like to second Pieter Kroonenberg's suggestion. Redundancy analysis is the multivariate version of regression, where you want to regress a set of response variables onto a set of predictor variables. This seems to be what you want? An alternative would be canonical correlation analysis, if you are just interested in the correlations, rather than the regression coefficients.
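
    For the regression-oriented view, one common formulation of redundancy analysis is ordinary multivariate least squares followed by a principal components analysis of the fitted values. The sketch below assumes that formulation and uses NumPy only; the function name and the unstandardized responses are choices made for illustration:

```python
import numpy as np

def redundancy_analysis(X, Y):
    """Multivariate OLS of Y on X, then PCA of the fitted values.

    Returns the OLS coefficients, the constrained axes in response space,
    and the share of total Y variation carried by each axis.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    B_ols, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
    Y_hat = Xc @ B_ols
    # Principal axes of the part of Y that X can reproduce linearly.
    _, s, Vt = np.linalg.svd(Y_hat, full_matrices=False)
    explained = s**2 / np.sum(Yc**2)
    return B_ols, Vt.T, explained

# Synthetic illustration (made-up data).
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 4))
Y = X[:, :2] @ rng.normal(size=(2, 3)) + rng.normal(size=(200, 3))
B_ols, axes, explained = redundancy_analysis(X, Y)
print("share of Y variation per constrained axis:", np.round(explained, 3))
```

    The entries of `explained` sum to the overall proportion of response variation accounted for by the predictors, which is the quantity a regression-focused analysis would report; canonical correlations, by contrast, ignore how much response variance each pair of combinations carries.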

    Cheers,

    Simon.

    ------------------------------
    Simon Blomberg
    Lecturer and Consultant Statistician
    University of Queensland School of Biological Sciences



  • 8.  RE: Principal Components Analysis

    Posted 11-04-2015 09:10

    Thanks.  You've described a really important distinction between analysis to obtain regression coefficients and analysis to obtain correlation coefficients.  As the statistical analysis plan develops, I will keep your discussion in mind.

    ------------------------------
    Linda Landon
    President
    Research Communiqué



  • 9.  RE: Principal Components Analysis

    Posted 11-03-2015 10:09

    This is a multivariate multiple regression problem.  An approach to this problem that might be useful and insightful is to perform a reduced rank regression.  The book by Reinsel and Velu might be helpful for this approach.
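
    A rough sketch of the idea, assuming the simplest (unweighted, least-squares) version of reduced rank regression: fit ordinary least squares, then project the coefficient matrix onto the leading right singular vectors of the fitted values. The function name and simulated data are illustrative; weighted variants are not shown:

```python
import numpy as np

def reduced_rank_regression(X, Y, rank):
    """Least-squares coefficient matrix constrained to the given rank."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    B_ols, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
    # Leading directions of the fitted values in response space.
    _, _, Vt = np.linalg.svd(Xc @ B_ols, full_matrices=False)
    V_r = Vt[:rank].T
    return B_ols @ V_r @ V_r.T

# Synthetic illustration: 5 predictors, 4 responses, true rank 2 (made-up data).
rng = np.random.default_rng(4)
X = rng.normal(size=(300, 5))
B_true = rng.normal(size=(5, 2)) @ rng.normal(size=(2, 4))
Y = X @ B_true + rng.normal(size=(300, 4))
B_rr = reduced_rank_regression(X, Y, rank=2)
print("rank of estimate:", np.linalg.matrix_rank(B_rr))
```

    Setting `rank` equal to the number of responses reproduces the ordinary least-squares fit, so the rank acts as the dimension-reduction knob.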

    Thad Tarpey

    ------------------------------
    Thaddeus Tarpey
    Wright State University



  • 10.  RE: Principal Components Analysis

    Posted 11-04-2015 09:10

    Thanks for your suggestion.  I wasn't aware of this form of regression previously.   After a quick reading, it appears to be a form of CCA regression that is applied to econometric time series data.  Is this correct, or am I misinterpreting my reading (which was not extensive)?

    ------------------------------
    Linda Landon
    President
    Research Communiqué



  • 11.  RE: Principal Components Analysis

    Posted 11-06-2015 09:20
    The growth curve model of Potthoff and Roy is a reduced-rank multivariate linear model and is applicable generally, not just for repeated measures.  So there's reason to think the book Thad Tarpey mentioned by Reinsel and Velu contains material applicable beyond time series and repeated measures regardless of the applications discussed.  But I haven't done much with multivariate growth curve analysis since my PhD and am not familiar with the book.

    Also, if I recall correctly there's a method called partial least-squares regression that's quite different from the homonymous multivariate method mentioned e.g. by Matthew Zack and that may be the one you recall as applying to a single outcome variable.

    Tom
    ____________________
    Thomas M. Davis
    open to opportunities in the Philadelphia area






  • 12.  RE: Principal Components Analysis

    Posted 11-04-2015 09:10

    Thanks for this information on a specific form of CCA.  I will investigate this for my statistical analysis plan.  

    ------------------------------
    Linda Landon
    President
    Research Communiqué



  • 13.  RE: Principal Components Analysis

    Posted 11-02-2015 09:39

    Someone correct me if I'm wrong but I think either partial least squares or canonical correlation is the type of analysis you are looking for.

    ------------------------------
    Jonathan Stallings
    North Carolina State University



  • 14.  RE: Principal Components Analysis

    Posted 11-04-2015 09:19

    Thanks, CCA does appear to be an appropriate form of analysis if my statistical analysis plan focuses on correlation. 

    ------------------------------
    Linda Landon
    President
    Research Communiqué



  • 15.  RE: Principal Components Analysis

    Posted 11-02-2015 09:39

    Although not exactly the same as what you are suggesting, you might look up a similar kind of analysis called partial least squares regression:

     

      https://en.wikipedia.org/wiki/Partial_least_squares_regression

     

     

     

     






  • 16.  RE: Principal Components Analysis

    Posted 11-04-2015 10:06

    Adam and Matthew,   Thanks for your suggestion of partial least squares regression.   I have a vague memory of reading that partial least squares regression assumes multiple predictor variables and a single outcome variable.  This isn't consistent with my situation.  However, thanks for putting PLS regression on my radar. 

    ------------------------------
    Linda Landon
    President
    Research Communiqué



  • 17.  RE: Principal Components Analysis

    Posted 11-02-2015 09:39

    Far, far from an expert on the topic, is Partial Least Squares what you are looking for? I have not worked much on this method, but do vaguely remember that it may have some of the features you are looking for.

    ------------------------------
    Adam James
    Cost Analyst
    Technomics, Inc.



  • 18.  RE: Principal Components Analysis

    Posted 11-02-2015 09:39

    You can use canonical regression analysis (CRA) rather than two sets of PCs. In CRA, components are created from each set of vars and these components are optimal in the sense of maximally correlating.  

    ------------------------------
    Chauncey Dayton



  • 19.  RE: Principal Components Analysis

    Posted 11-04-2015 09:19

    Thanks.  Your response, combined with others' opinions, was really helpful to me.

    ------------------------------
    Linda Landon
    President
    Research Communiqué



  • 20.  RE: Principal Components Analysis

    Posted 11-02-2015 09:42

    While you can do the computations that you suggest, it is not clear what they mean.  PCA requires an appropriate centering of the data.  You need to know the estimated regression model in order to center the dependent variables appropriately.  My suggestion would be to think hard about which dependent variables are most important to include and set aside the others at least as a starting strategy.  This will also help to avoid the nasty problem of interpretation of the PCA of the dependent variables.
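
    One possible reading of the centering point, sketched with NumPy on synthetic data: the leading principal component of the responses can change completely depending on whether they are centered at their overall mean or at the fitted regression mean (that is, whether the PCA is run on raw deviations or on residuals). The simulated structure below is an assumption chosen to make the contrast visible:

```python
import numpy as np

def first_pc(M):
    """Leading principal axis of an already-centered data matrix."""
    _, _, Vt = np.linalg.svd(M, full_matrices=False)
    return Vt[0]

rng = np.random.default_rng(2)
n = 400
x = rng.normal(size=n)
# Mean of Y moves along (1, 1) with x; errors vary mostly along (1, -1).
signal = np.outer(x, [3.0, 3.0])
noise = np.outer(rng.normal(size=n), [1.0, -1.0]) + 0.1 * rng.normal(size=(n, 2))
Y = signal + noise

# Centering at the grand mean.
Y_grand = Y - Y.mean(axis=0)

# Centering at the fitted regression mean (OLS with intercept).
Xd = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(Xd, Y, rcond=None)
Y_resid = Y - Xd @ coef

pc_grand = first_pc(Y_grand)
pc_resid = first_pc(Y_resid)
print("first PC, grand-mean centering:", np.round(pc_grand, 2))
print("first PC, regression centering:", np.round(pc_resid, 2))
```

    In this setup the first component under grand-mean centering tracks the regression signal, while under regression centering it tracks the error structure, so the two analyses answer different questions.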

    ------------------------------
    Jon Kettenring
    Director, Research Institute for Scientists Emeriti (RISE)
    Drew University



  • 21.  RE: Principal Components Analysis

    Posted 11-04-2015 11:12

    Thanks for your feedback.  In my reading on PCA, the recurrent cautions from many authors about how to interpret the results were among the things that bothered me about the method.  Fortunately, our colleagues' generous responses to my query have suggested viable alternatives for either correlation analysis or regression analysis.  

    ------------------------------
    Linda Landon
    President
    Research Communiqué



  • 22.  RE: Principal Components Analysis

    Posted 11-02-2015 10:33

    In addition to the suggestions for canonical analysis, you might consider looking at some of the structural equation modeling work that combines a multiple-indicator measurement model with a causal (term used loosely in this context) model. I'm thinking of Jöreskog's LISREL models, but others have developed quite similar alternative approaches.

    ------------------------------
    David Mangen



  • 23.  RE: Principal Components Analysis

    Posted 11-04-2015 11:12

    Thanks for the suggestion.  Structural equation modeling is way out of my statistical experience but it's never too late to learn about it, right?  I'll be doing some reading about it.  Thanks for the literature suggestion to get me started.

    ------------------------------
    Linda Landon
    President
    Research Communiqué



  • 24.  RE: Principal Components Analysis

    Posted 11-02-2015 11:11

    Dear Dr. Landon,

           I would recommend reading up on CANONICAL CORRELATION analysis.  This is essentially a technique that simultaneously explores & tests the relations of a SET of variables (e.g., y1, y2, y3, etc.) versus a SET of variables (e.g., x1, x2, x3, etc.).  For example, check the internet for SAS stats documentation re the CANCORR procedure ("Proc Cancorr"). 

    Joseph J. Locascio, Ph.D. 

    ------------------------------
    Joseph J. Locascio, Ph.D.,
    Assistant Professor of Neurology,
    Harvard Medical School,
    and Statistician,
    Memory and Movement Disorders Units,
    Massachusetts Alzheimer's Disease Research Center,
    Neurology Dept.,
    Massachusetts General Hospital (MGH),
    Boston, Massachusetts 02114
    Phone: (617) 724-7192
    Email: JLocascio@partners.org



  • 25.  RE: Principal Components Analysis

    Posted 11-03-2015 13:37
    There is also an old paper in SIAM Review on Linear Dependency
    Analysis and a FORTRAN code

    Donald E Myers




  • 26.  RE: Principal Components Analysis

    Posted 11-03-2015 09:15

    I generally stay away from principal components in such situations because the method can miss relevant relationships.  Partial least squares is not uniquely defined when regressing multiple responses on multiple predictors, as there are different algorithms that produce different answers.  Canonical correlation analysis could be useful.  An appropriate choice would seem to depend on the reason(s) for needing to use dimension reduction in the first place.  For instance, additional issues arise if it's needed to compensate for a small sample size.
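
    A small synthetic illustration of how principal components can miss a relevant relationship, assuming an unstandardized (covariance-based) PCA and made-up variables: the predictor with the most variance is unrelated to the response, so the first component carries almost none of the signal.

```python
import numpy as np

def r_squared(a, b):
    """Squared sample correlation between two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1] ** 2

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(scale=10.0, size=n)   # high variance, unrelated to y
x2 = rng.normal(scale=1.0, size=n)    # low variance, drives y
y = 2.0 * x2 + rng.normal(scale=0.5, size=n)

X = np.column_stack([x1, x2])
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Xc @ Vt[0]   # scores on the first principal component
pc2 = Xc @ Vt[1]   # scores on the second principal component

print("R^2 of y on PC1:", round(r_squared(pc1, y), 3))
print("R^2 of y on PC2:", round(r_squared(pc2, y), 3))
```

    Keeping only the first component here would discard the one direction that predicts the response, because PCA selects directions by predictor variance alone.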

    ------------------------------
    R. Cook
    University of Minnesota



  • 27.  RE: Principal Components Analysis

    Posted 11-04-2015 09:19

    Thanks for your response.  Your statement "An appropriate choice would seem to depend on the reason(s) for needing to use dimension reduction in the first place." is particularly helpful.  This is a conversation that I need to have with my collaborators to clarify the requirements for the statistical analysis plan.  Good point.

    ------------------------------
    Linda Landon
    President
    Research Communiqué



  • 28.  RE: Principal Components Analysis

    Posted 11-03-2015 17:25

    I don't understand how a method of analysis can be recommended without understanding precisely what the variables are and what is already known about each individually and what their relationships with each other are.

    There are some things that I don't recommend, e.g., principal components without rotation to simple structure because of a general lack of practical interpretation of the unrotated principal components.  Moreover, by default the rotation should be oblique rather than orthogonal, i.e., orthogonal rotation should be used only when oblique rotation reveals nearly orthogonal factors.  It follows that canonical correlation is not recommended because it yields orthogonal, uninterpretable factors.

    Dimension reduction may be in order so I accept orthogonally rotated principal components (or maximum likelihood factors if you want to go elegant) followed by oblique rotation for each of the two sets of variables as a possibility, but only after such procedures pass common sense acceptance on the basis of the first paragraph above.

    ------------------------------
    James Frane
    Self-Employed



  • 29.  RE: Principal Components Analysis

    Posted 11-04-2015 09:19

    Thanks for your response to my query.  Your response increased my understanding of the requirements for applying PCA. 

    ------------------------------
    Linda Landon
    President
    Research Communiqué



  • 30.  RE: Principal Components Analysis

    Posted 11-04-2015 11:12

    James Frane raises some good points regarding rotated versus unrotated solutions to Dr. Landon's problem.  I believe there are methods (within SAS & elsewhere) to ROTATE the coefficients from an initial CANONICAL correlation solution.  Might be worth exploring?       

    ------------------------------
    Joseph J. Locascio, Ph.D.,
    Assistant Professor of Neurology,
    Harvard Medical School,
    and Statistician,
    Memory and Movement Disorders Units,
    Massachusetts Alzheimer's Disease Research Center,
    Neurology Dept.,
    Massachusetts General Hospital (MGH),
    Boston, Massachusetts 02114
    Phone: (617) 724-7192
    Email: JLocascio@partners.org



  • 31.  RE: Principal Components Analysis

    Posted 11-04-2015 09:09

    Everyone,

    Thanks to every one of you for taking the time to respond to my query.  Your answers have helped me to clarify my thinking about the current problem and the statistical analysis plan. Canonical correlation analysis had not occurred to me as a data analysis option, so thanks, specifically, for that suggestion.  

    More importantly for my general statistical knowledge, because of the varied recommendations and descriptions that all of you provided, I learned important information and gained additional insight about multivariate regression analyses in general.  Thanks.

    I really appreciate all of your inputs.

    Linda

    ------------------------------
    Linda Landon
    President
    Research Communiqué



  • 32.  RE: Principal Components Analysis

    Posted 11-04-2015 09:37

    In my opinion, multivariate partial least squares regression (PLS2) is the method that fits best to your description. It simultaneously performs a dimension reduction in the predictor and predictand blocks in such a way that the covariance between the X and Y components is maximized.

    Remark - This holds exactly for multivariate SIMPLS regression. As Mr. Cook mentions, there are different algorithms, such as NIPALS, which in the multivariate case yield slightly different estimates. Yet the multivariate SIMPLS estimates are the exact solution to the optimization criterion and are thereby very clearly defined.
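
    The covariance criterion can be sketched for the first pair of components only, assuming NumPy and synthetic data: the unit-length weight vectors that maximize the covariance between an X component and a Y component are the leading singular vectors of the cross-product matrix of the centered blocks. Later components require a deflation step, which is where the algorithms part ways and which is not shown. The function name is illustrative:

```python
import numpy as np

def first_pls_component(X, Y):
    """First pair of PLS2 weight vectors and their component scores."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Leading singular vectors of X'Y maximize cov(Xc w, Yc c) for unit w, c.
    U, s, Vt = np.linalg.svd(Xc.T @ Yc, full_matrices=False)
    w, c = U[:, 0], Vt[0]
    t, u = Xc @ w, Yc @ c    # X and Y component scores
    return w, c, t, u

# Synthetic illustration (made-up data).
rng = np.random.default_rng(6)
X = rng.normal(size=(150, 5))
Y = X[:, :2] @ rng.normal(size=(2, 3)) + rng.normal(size=(150, 3))
w, c, t, u = first_pls_component(X, Y)
print("covariance of first component pair:", round(t @ u / (len(t) - 1), 3))
```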

    ------------------------------
    Sven Serneels
    Expert Modeling and Statistics Research
    BASF Corp.



  • 33.  RE: Principal Components Analysis

    Posted 11-04-2015 11:38

    Yes it is possible, even easy, to perform the calculations you suggest. However, note that in the proposed analysis the principle components are linear combinations of either the input variables or the response variables. Furthermore, linear combinations of linear combinations are linear combinations. In effect, performing multivariate multiple regression using principle components will estimate linear combinations of the response variables as constrained linear combinations of the input variables. Interpretation may be an issue.
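
    The "linear combinations of linear combinations" point can be checked numerically. The sketch below, which assumes NumPy, synthetic data, and illustrative function names, regresses the leading response components on the leading predictor components and then rewrites the fit as a coefficient matrix on the original predictors:

```python
import numpy as np

def pca_axes(M, k):
    """Top-k principal axes (as columns) of a centered data matrix M."""
    _, _, Vt = np.linalg.svd(M, full_matrices=False)
    return Vt[:k].T

def pc_on_pc_regression(X, Y, k, m):
    """Regress the first m response components on the first k predictor components."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Vx, Vy = pca_axes(Xc, k), pca_axes(Yc, m)
    T, S = Xc @ Vx, Yc @ Vy                     # component scores
    C, *_ = np.linalg.lstsq(T, S, rcond=None)   # component-on-component fit
    B_implied = Vx @ C                          # same fit on the original X scale
    return Vx, Vy, C, B_implied

# Synthetic illustration (made-up data).
rng = np.random.default_rng(8)
X = rng.normal(size=(200, 6))
Y = X[:, :3] @ rng.normal(size=(3, 4)) + rng.normal(size=(200, 4))
Vx, Vy, C, B_implied = pc_on_pc_regression(X, Y, k=2, m=2)
print("implied coefficients on original predictors:")
print(np.round(B_implied, 3))
```

    Every column of `B_implied` lies in the span of the retained predictor axes `Vx`, which is the constraint referred to above: the fitted response combinations are linear in the original inputs, but only through the `k` retained directions.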

    ------------------------------
    Gary Fowler



  • 34.  RE: Principal Components Analysis

    Posted 11-05-2015 11:34

    One of the commentators has referred to "principle components."  The proper spelling is "principal components" as in the original question.  While "principal" is sometimes an adjective and sometimes a noun, "principle" is a noun.

    ------------------------------
    James Frane
    Self-Employed



  • 35.  RE: Principal Components Analysis

    Posted 11-06-2015 09:20

    Your point is well-taken and also is pertinent to ASA's focus on improving the ability of statisticians to communicate in writing.  Unless a writer (such as me, in this case!) proofreads obsessively and compulsively, it is overly easy to make this mistake (and other similar mistakes such as capitol versus capital). Words such as these that have minor spelling differences but vastly different uses and meanings are the banes of the English language. I've always suspected that someone with a twisted sense of humor deliberately devised these words.

    Linda

    ------------------------------
    Linda Landon
    President
    Research Communiqué