RE:Correlation Between X and Y, where each Y has multiple X values

  • 1.  RE:Correlation Between X and Y, where each Y has multiple X values

    Posted 04-22-2011 17:07
    I haven't heard the term in quite a while.  I also found these definitions of "Type III error" by using Google, and it came up with a web site that provided these descriptions:

    Common Expressions: type III error
    Expressions   Definition

    Type III error   In statistical hypothesis testing, a Type III error consists of correctly rejecting the null hypothesis but incorrectly attributing the cause. In other words, the researcher correctly identifies an effect, but incorrectly attributes the cause of the effect.
    Source: compiled by the editor from various references; see credits.


    Specialty Expressions: type III error
    Expressions   Domain   Definition

    Type III error   Statistics   In 1947 F. N. David, perhaps not entirely seriously, suggested that there was a third kind of error which might be committed in testing statistical hypotheses: that of selecting the test falsely to suit the significance of the particular sample data available. A somewhat different type of error of the third kind was suggested by Mosteller (1948) in proposing a non-parametric test for deciding whether one population, out of k populations characterized by a location parameter, has shifted too far to the right of the others. He defines it as 'the error of correctly rejecting the null hypothesis for the wrong reason'.

    I also recall that there are "Type IV errors," the incorrect interpretation of a correctly rejected null hypothesis, as in mis-interpreting interaction effects in factorial analysis of variance results.

    -------------------------------------------
    Milton Goldsamt
    Survey Statistician
    -------------------------------------------


  • 2.  RE:Correlation Between X and Y, where each Y has multiple X values

    Posted 04-22-2011 17:18

    I wasn't familiar with a lot of the history of type III and type IV errors.  Maybe a type III error ought to be applying type III sums of squares in ANOVA when type I is appropriate.  I certainly did not create the term.  I think I first heard it in a consulting context somewhat like the one I mentioned here, where you provide a really nice solution to the wrong problem.  The definitions you cite are interesting but actually seem too serious.  I always associate this term with something humorous, although there is nothing funny about making such a mistake with a client.
    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 3.  RE:Correlation Between X and Y, where each Y has multiple X values

    Posted 04-22-2011 17:32
    You all may wish to hear the definition of a triple-blind trial. In a single-blind trial, of course, the patient does not know what he is getting. In a double-blind trial, the patient does not know what he is getting and the doctor does not know what he is giving. In a triple-blind trial, the patient does not know what he is getting, the doctor does not know what he is giving, and the statistician does not know what he is doing.

    Best wishes,

    Nayak



    -------------------------------------------
    Nayak Polissar
    Consultant
    The Mountain Whisper Light
    -------------------------------------------








  • 4.  RE:Correlation Between X and Y, where each Y has multiple X values

    Posted 04-25-2011 13:14
    I was told (tongue-in-cheek) that a double blind study is two orthopedists trying to interpret an EKG.  BTW, an orthopedist told me this.

    -------------------------------------------
    Richard Browne
    Texas Scottish Rite Hospital for Children
    -------------------------------------------








  • 5.  RE:Correlation Between X and Y, where each Y has multiple X values

    Posted 04-25-2011 12:47
    The usually accepted definition I have heard over the years is that from Edwards in JASA 1955, if I remember correctly (I recall it was when I was Asst Prof Math at Va Tech--it's been a while), titled something like "The Third Type of Error".
    Incidentally, on the topic, the important first step in consulting is to make the client express explicitly and unequivocally exactly what he wants to find out; goals lead to design, goals plus design lead to data, and goals plus design plus data lead to analysis.
    -------------------------------------------
    Bob Riffenburgh
    Naval Medical Center
    -------------------------------------------








  • 6.  RE:Correlation Between X and Y, where each Y has multiple X values

    Posted 04-25-2011 12:50

    I appreciate Dr. Chernick's response, and he raises some good points.

     

    One of the harder tasks of a consultant, I have found, is explaining to clients why their research question is not nearly as specific as they believe it to be. For this particular situation, the client contacted me after the study had been done, and after the first analysis found no significant results. The original research question:

     

    The purpose of this study was to examine whether there is a relationship between a nurse's satisfaction with the work environment and a patient's perception of caring. Registered nurses were recruited from the hospital (n = 20). Each completed the Health Environment Survey (HES). Within two months of the nurse completing the HES, 10 patients who had received primary care from the nurse were asked to complete the Caring Factors Survey (CFS). Nurse and patient data were paired to create 200 unique dyads.

     

    For the analysis, the client replicated each of the nurse's scores 10 times so that there were 200 pairs of observations (nurse/patient dyad). There was no significant correlation. Had the client come to me before the data were collected, or even before she went to the hospital's IRB, I would have suggested a different design. That's not an option now.

     

    The question the client posed to me, after not finding a significant result, was whether a hierarchical analysis should be considered because of the nested property of the patients within the nurses.

     

    Although Bev was the consultant who wrote up this question for the group, the client is mine and the information she gave was more or less my words. Please do not fault her for not giving a well-posed problem. I believe the question we would appreciate your opinions on is whether a hierarchical analysis is warranted here, and if it is, how we might best go about it.
    DeAnne French
    -------------------------------------------
    Beverly Grunden
    Statistical Consultant
    Wright State University
    -------------------------------------------








  • 7.  RE:Correlation Between X and Y, where each Y has multiple X values

    Posted 04-25-2011 13:29
    I want to say that I was not criticizing Beverly at all for the way she posed the question.  She specifically encouraged us to ask clarifying questions.  My only critique was of the way a number of us jumped in to provide answers without asking any questions when questions were warranted.  I think this description gives better detail and background.  I was more interested in the way we approached the problem as consultants than in how you should best address the client.  I always get worried when a client says, "We analyzed it this way and didn't get a significant result.  Is there a better way to analyze it?"  My worry is not that the appropriate answer to the question might be yes, but rather this: had the result been significant, would the client have pursued a different form of analysis?  Might this be a case where the client wants a significant result and is willing to look at the data upside down and backwards until one is found?  I find that to be another one of the hazards of consulting.  The researcher is so interested in a significant result that they test the data several times and in several different ways until they get one.  Unfortunately, as we know, without adjustment for multiplicity a significant raw p-value is likely to pop up.  It is part of the job of the consulting statistician to insist on multiplicity adjustment, or at least to provide fair warning of the danger of an erroneous conclusion.  Most of all, in spite of the temptation to publish, I wouldn't put my name on a paper where important consulting advice of mine was ignored.
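
    To put a rough number on the multiplicity concern, here is a minimal sketch. It assumes the k analyses are independent tests at a common level alpha, which is a simplification (re-analyses of the same data are usually correlated), and the function names are only illustrative.

```python
def familywise_error_rate(alpha: float, k: int) -> float:
    """Chance of at least one false positive among k independent tests,
    each run at level alpha, when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** k


def bonferroni_threshold(alpha: float, k: int) -> float:
    """Per-test level that keeps the familywise rate at or below alpha."""
    return alpha / k


for k in (1, 5, 10, 20):
    fwer = familywise_error_rate(0.05, k)
    per_test = bonferroni_threshold(0.05, k)
    print(f"k={k:2d}  familywise rate={fwer:.3f}  Bonferroni per-test level={per_test:.4f}")
```

    Under these assumptions, trying ten different analyses at the 0.05 level gives roughly a 40% chance of at least one spurious "significant" result. Bonferroni is only one of several possible adjustments and is used here because it is the simplest to state.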

    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 8.  RE:Correlation Between X and Y, where each Y has multiple X values

    Posted 04-25-2011 13:44
    What's worse is that statisticians and others who should know better are only interested in producing statistically significant results, no matter what the data say. Part of this is due to publication bias and part is due to incompetence. Perhaps we need a Journal of Insignificant Results for important findings of insignificance.

    -------------------------------------------
    Chuck Coleman
    -------------------------------------------








  • 9.  RE:Correlation Between X and Y, where each Y has multiple X values

    Posted 04-25-2011 13:50
    I hope that not too many statisticians fall into the trap you mention.  I think most of us know better but some people do get intimidated in the work environment.  I think ASA should protect them better and perhaps statistical ethics as defined and hopefully emphasized by the ASA could help against such intimidation.

    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 10.  RE:Correlation Between X and Y, where each Y has multiple X values

    Posted 04-25-2011 14:05
    I was speaking from second-hand experience of a statistical consultant on a thesis who was of the view that only significant results counted and should be extracted from the data. My sister, a psychologist by training, took over the job and told the client that insignificance is perfectly acceptable. In fact, her dissertation was insignificant. She is unusually good at statistics for a psychologist. It helped that she had some unusually good teachers in college.

    As for "tortur[ing] the data until Nature confesses" (Ronald Coase), it seems to occur mostly in social sciences by researchers trying to increase their publication counts. I've also seen replication studies that showed that the results were incorrect or that some data never existed (e.g., published before their first year of publication.) This is the reason that many journals now require raw data. These researchers tend not to be members of the ASA. An initiative to reach out to them and their journals seems in order.

    To make things even more fun, I worked on a legal case as a statistical number cruncher. My employer was aghast at the statistics he had to review: the logs were full of error messages and the like.

    -------------------------------------------
    Chuck Coleman
    -------------------------------------------








  • 11.  RE:Correlation Between X and Y, where each Y has multiple X values

    Posted 04-25-2011 14:21
    A bug-a-boo with meta-analyses is publication bias (the fact that only significant results get reported).  The FDA requires that the results of all trials under its regulation be reported.  Many people are very aware of publication bias and are trying to eliminate it.  Sometimes an insignificant result is interesting and should be reported.  Other times it is not interesting but should be reported just so the work doesn't get duplicated because others didn't know someone had already shown it to be insignificant.  Also, as I think may have been mentioned briefly, in clinical trials equivalence or non-inferiority is often the standard the drug must stand up to.  In such cases insignificant differences are good and large differences may be bad.  However, in the Neyman-Pearson framework of hypothesis testing this means switching the null and alternative hypotheses, which is something your client may not realize.
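
    The switch of hypotheses can be written out as follows. This is only a sketch: the symbols are assumptions introduced for illustration, with \Delta the true difference between the new and reference treatments (larger taken to be better) and \delta > 0 a margin chosen in advance.

```latex
% Conventional test for a difference:
H_0:\ \Delta = 0 \qquad \text{vs.} \qquad H_1:\ \Delta \neq 0

% Equivalence test with margin \delta:
H_0:\ |\Delta| \geq \delta \qquad \text{vs.} \qquad H_1:\ |\Delta| < \delta

% Non-inferiority test with margin \delta:
H_0:\ \Delta \leq -\delta \qquad \text{vs.} \qquad H_1:\ \Delta > -\delta
```

    In the last two cases the burden of proof sits on showing that the difference is small, so a non-significant result from the conventional test is not by itself evidence of equivalence.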

    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 12.  RE:Correlation Between X and Y, where each Y has multiple X values

    Posted 04-25-2011 13:55
    And, it often depends on the framework.  In device trials and generic drug trials we are often specifically looking for non-significance.  I'm with you, the data is the data.  My approach has always been to have an up-front discussion with the client in which I specifically state: "If you are not willing to be wrong in your assumptions about the outcomes [and every client has these assumptions] then I can't help."  As a consultant it is important to know when to walk away.

    -------------------------------------------
    William Grant
    Research Associate Professor
    SUNY Upstate Medical University
    -------------------------------------------








  • 13.  RE:Correlation Between X and Y, where each Y has multiple X values

    Posted 04-25-2011 14:13

    I find this discussion group plays a role in reminding many of us that it is important to listen to the client and not assume anything about the problem at hand, and keep asking questions until we reach agreement about their objectives and their statistical needs, and the suitability of our proposed solution.

    Consuelo
    -------------------------------------------
    Consuelo Arellano
    North Carolina State University
    -------------------------------------------








  • 14.  RE:Correlation Between X and Y, where each Y has multiple X values

    Posted 04-25-2011 14:33
    I think the discussion has taken on a new spark of interest as we move away from the specific consulting problem and into the question of what constitutes good and bad consulting, an issue that should be of paramount importance to this section.  But for a moment I would like to go back to Beverly's problem.  In today's discussion we learned that the client actually replicated each nurse's score a sufficient number of times so that she could create paired observations for every patient.  While I do not know for sure what the best way to analyze these data is, one thing I would tell the client immediately as a consultant is that what they did is not a valid way to analyze the data.  This is because the validity of the estimate depends on the pairs being independent (or at least uncorrelated within the components).  But we have violated that by repeating a single value so many times.  We implicitly treat the ten bivariate observations for a given nurse as independent when in fact there is no variability in the nurse's component.  So we have created a potentially highly biased estimate of correlation.

    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 15.  RE:Correlation Between X and Y, where each Y has multiple X values

    Posted 04-25-2011 14:24


    Too late: <http://nulljournal.wordpress.com/>.

    I agree that, at least in economics, significance and publication go hand in hand.  It may be tied to the same disciplinary culture that makes our conference sessions so brutal:  we tend to overestimate the opportunity cost of reading the paper or sitting through the session without a significant payoff. But I also wonder if the tendency to "tweak" the data until one gets a result one likes is due to the political (i.e., polarizing) nature of many of the questions addressed by economists and other social scientists.

    -------------------------------------------
    Joseph Nowakowski
    Professor of Economics
    Muskingum University
    -------------------------------------------







  • 16.  RE:Correlation Between X and Y, where each Y has multiple X values

    Posted 04-25-2011 14:36
    Don't think for one moment that these problems don't arise in medicine also!

    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 17.  RE:Correlation Between X and Y, where each Y has multiple X values

    Posted 04-25-2011 14:45
    After all this interesting discussion about well-posed problems, specifying research questions, etc., I am still curious about how one would find a measure of the strength of the relationship (correlation??) between variables X and Y when there are multiple Xs for each Y.

    -------------------------------------------
    Beverly Grunden
    Statistical Consultant
    Wright State University
    -------------------------------------------








  • 18.  RE:Correlation Between X and Y, where each Y has multiple X values

    Posted 04-25-2011 15:09

    Would a regression Beta work for your purposes? Or does it have to be a correlation coefficient?
    -------------------------------------------
    Eric Siegel
    Biostatistician
    Univ of Arkansas for Medical Sciences
    -------------------------------------------








  • 19.  RE:Correlation Between X and Y, where each Y has multiple X values

    Posted 04-25-2011 16:55
    Normally when I try to link employee satisfaction and customer satisfaction there are multiple employee scores and customer scores. Scores are either produced over time or at once but with multiple people.

    I am not sure correlation would work because your y value remains the same for several values.

    A regression analysis will look at the y score with each x score, but each score is then treated as a separate variable unless you repeat the y values. And if all the nurses are rating in the top two boxes, there won't be much variation in y, so you won't see any relationship with the x values.

    I wouldn't recommend repeating the y scores. You will need more data.
    -------------------------------------------
    Kenita Hall
    -------------------------------------------








  • 20.  RE:Correlation Between X and Y, where each Y has multiple X values

    Posted 04-25-2011 17:06
    Well put.  I was saying the same sort of thing but perhaps in too abstract a manner for some to understand.  Repeating the y values 10 times and doing the regression analysis is like saying that the repeated values are independent, which is absurd.  It doesn't change the estimate of the mean and variance, but it leads you to believe these estimates are more accurate because instead of having a sample of N nurses you now fool yourself into thinking that you have 10N nurses.  So it looks like you have improved your estimate of the mean by a factor of the square root of 10 when in fact you have added no new information and have not improved it at all!
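
    The point about the square root of 10 can be checked with a few lines of arithmetic. The sketch below uses made-up scores for N = 20 nurses (the values are purely hypothetical) and compares the usual standard error of the mean before and after each score is copied 10 times.

```python
import math
import statistics

# Hypothetical satisfaction scores for N = 20 nurses (made-up values).
nurse_scores = [3.1, 4.2, 2.8, 3.9, 4.5, 3.3, 2.5, 4.0, 3.7, 3.0,
                4.8, 2.9, 3.6, 4.1, 3.4, 2.7, 4.4, 3.8, 3.2, 4.6]


def standard_error(values):
    """Usual standard error of the mean: s / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


# Copy each nurse's score 10 times, as in the 200-row data set.
replicated = [score for score in nurse_scores for _ in range(10)]

mean_original = statistics.mean(nurse_scores)
mean_replicated = statistics.mean(replicated)
se_original = standard_error(nurse_scores)
se_replicated = standard_error(replicated)
shrink_factor = se_original / se_replicated

print(f"mean: {mean_original:.4f} (20 rows) vs {mean_replicated:.4f} (200 rows)")
print(f"standard error: {se_original:.4f} vs {se_replicated:.4f}")
print(f"apparent improvement factor: {shrink_factor:.3f}")
```

    The mean is unchanged, while the reported standard error shrinks by a factor close to sqrt(10), about 3.16. The factor comes out slightly larger than that only because the sample variance divides by n - 1, and no new information has been added.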

    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 21.  RE:Correlation Between X and Y, where each Y has multiple X values

    Posted 04-25-2011 17:36
    On the other hand, it would be perfectly acceptable to repeat the Y scores if we subsequently modelled the data as 20 clusters of size=10 instead of as 200 independent data points.

    -------------------------------------------
    Eric Siegel
    Biostatistician
    Univ of Arkansas for Medical Sciences
    -------------------------------------------








  • 22.  RE:Correlation Between X and Y, where each Y has multiple X values

    Posted 04-25-2011 18:28

    That sounds a lot better.
    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 23.  RE:Correlation Between X and Y, where each Y has multiple X values

    Posted 04-25-2011 17:05
    Beverly:

      From your short description, I suspect that your experimental trial set the value for Y and then measured several X responses. If so, then you have just reversed the variable definition within a typical regression problem (i.e., Y is the independent variable and X the dependent one).  I also would say that you don't really have multiple X values for each Y. What you have are many paired X-Y values; it's just that many of the Y's take on the same pre-set value.

    Now the strength of the relationship (correlation) between X and Y depends upon the model that you try to fit to the data; higher correlations being associated with a model that is closer to the true model.  For example, if the underlying (unknown) relationship were to be purely quadratic (Y=X^2) and you tried to fit a linear one:

     Y = beta_0 + beta_1 X

    you would probably find very little correlation - the two variables are not linearly related.  Without knowing more details of the problem, I think it probably would be useful to go through the traditional model building exercise (probably with Y the dependent variable) to see what, if any, relationship exists.  This exercise could be either a "constructional" (sequentially adding terms to a simple model) or "destructional" one (sequentially removing least significant terms from a very complex model).
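
    A small numerical illustration of the quadratic example, as a sketch only. It assumes the X values are spread symmetrically around zero, which is what makes the linear correlation vanish; with X values that are all positive, Y = X^2 would still show a strong linear correlation.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = list(range(-5, 6))          # X symmetric about zero
ys = [x ** 2 for x in xs]        # purely quadratic relationship

r_linear = pearson_r(xs, ys)                          # Y against X
r_quadratic = pearson_r([x ** 2 for x in xs], ys)     # Y against X^2

print(f"correlation of Y with X:   {r_linear:.3f}")
print(f"correlation of Y with X^2: {r_quadratic:.3f}")
```

    Y is completely determined by X here, yet the linear correlation is zero, while the correlation with the correctly specified term X^2 is one.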

    Hope this helps.
     
    -------------------------------------------
    Jeffrey Proehl
    -------------------------------------------








  • 24.  RE:Correlation Between X and Y, where each Y has multiple X values

    Posted 04-26-2011 11:44
    I agree with Michael Chernick that the problem is in modeling the errors.  First, can we assume that the patients were chosen randomly for each nurse? If so, then can the two hundred observations be considered as independent observations given that only twenty nurses were involved?  Since we are looking for a relationship between the self evaluations of the nurses and the evaluations of the patients, we do not want the patients to be independent of the nurses, that is, we want the responses of the patients to be correlated with the nurses.  If the responses are iid for each nurse, and we assume the same variance for each nurse, I think the responses could be assumed iid.  The test I gave early on in this discussion, from Searle, tests just that assumption, I believe.  If the Searle test is not significant, then the simple regression (or correlation), with the nurses' values replicated, would be appropriate.  If the Searle test is significant (probably too small) then the error within the 10 observations is probably not iid.  If the errors are not iid but one can assume that the error structures for all of the nurses are the same, then using the means to test for a relationship would be appropriate, since the variances for all of the means are then all the same.  Otherwise, a regression on the means, weighting with the inverses of the square roots of the estimated variances might be appropriate (that is, a generalized linear regression).

    Checking the variation for each nurse with a plot (such as a boxplot) and doing a scatter plot of the means to see if it looks like there is a relationship between the nurse and patient evaluations, and what type of relationship the relationship is if there is one, are both good things to do.  We were always taught to look at the data first. 
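
    The "use the means" idea can be sketched as follows. All data here are simulated and hypothetical (the seed, score scales, and slope are arbitrary choices, not values from the study). The code compares the correlation computed on 200 replicated rows with the correlation computed on 20 nurse-level rows.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(2011)  # fixed seed so the made-up data are reproducible
n_nurses, n_patients = 20, 10

# Hypothetical data: one HES score per nurse, ten CFS scores per nurse.
hes = [rng.gauss(50, 10) for _ in range(n_nurses)]
cfs = [[rng.gauss(70 + 0.2 * (h - 50), 8) for _ in range(n_patients)]
       for h in hes]

# Row-level analysis: repeat each nurse's score once per patient (200 rows).
x_rows = [h for h, patients in zip(hes, cfs) for _ in patients]
y_rows = [score for patients in cfs for score in patients]
r_rows = pearson_r(x_rows, y_rows)

# Means analysis: one row per nurse, using the mean of that nurse's patients.
cfs_means = [sum(patients) / len(patients) for patients in cfs]
r_means = pearson_r(hes, cfs_means)

print(f"row-level r (n = {len(x_rows)}): {r_rows:.3f}")
print(f"nurse-level r (n = {len(cfs_means)}): {r_means:.3f}")
```

    With the same number of patients per nurse, the nurse-level correlation is never smaller in magnitude than the row-level one, because averaging removes the within-nurse variation from the patient scores. Any significance test on the nurse-level correlation should use n = 20.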


    Margot Tollefson
    Owner
    Vanward Statistical Consulting
    -------------------------------------------








  • 25.  RE:Correlation Between X and Y, where each Y has multiple X values

    Posted 04-26-2011 12:07
    I don't completely understand what test you are referring to in Searle.  I have the book.  So if you cite the page number(s) I can check this for myself.  I don't have a problem with the point estimate of correlation between nurses and patients but if we do the standard analysis and consider the standard error for the correlation coefficient or a 95% confidence interval these estimates will be too small and the confidence interval too narrow.

    Say you have twenty nurses and ten patients per nurse.  Then you have generated 200 pairs to estimate the correlation, but the effective sample size is really still only 20.  However, if you use a canned package, the computer is not going to know that each nurse's score is repeated 10 times, and it will assume it is using 200 independent pairs to estimate the correlation.
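
    To see how much the assumed sample size matters, the sketch below holds a hypothetical sample correlation fixed at r = 0.30 (an arbitrary value chosen for illustration) and computes the usual t statistic for testing zero correlation, t = r * sqrt((n - 2) / (1 - r^2)), once with n = 200 and once with n = 20.

```python
import math


def t_from_r(r: float, n: int) -> float:
    """t statistic for testing zero correlation from n pairs (df = n - 2)."""
    df = n - 2
    return r * math.sqrt(df / (1.0 - r * r))


r = 0.30  # hypothetical sample correlation, chosen only for illustration

t_200 = t_from_r(r, 200)  # what a package sees: 200 "independent" pairs
t_20 = t_from_r(r, 20)    # effective sample size: 20 nurses

# Two-sided 5% critical values are about 1.97 (df = 198) and 2.10 (df = 18).
print(f"n = 200: t = {t_200:.2f}")
print(f"n =  20: t = {t_20:.2f}")
```

    The same correlation looks clearly significant when treated as 200 independent pairs and falls short of the 5% critical value when treated as 20.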

    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 26.  RE:Correlation Between X and Y, where each Y has multiple X values

    Posted 04-26-2011 12:47

    Beverly referred me to page 103 in Searle "Linear Models" 1st edition 1971.  I read the page, which discusses pure error.  The context is multiple regression where some of the predictor variables have their observed values repeated.  If we think of the nurses as one of the predictor variables in Beverly's scenario, their one value would be repeated 10 times.  But what represents the other predictor variable and what is the response?  Are we carrying this over to the simple linear regression case where the y would have to be the patients?  Even if this is the case, where is the connection to the correlation coefficient estimate?  I believe what Searle is doing is testing the adequacy of the multiple regression model.  I don't see how accepting or rejecting the adequacy of the model relates to whether or not it is legitimate to estimate bivariate correlation using the nurses repeated 10 times.  Can you shed some light on this, Margot?
    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 27.  RE:Correlation Between X and Y, where each Y has multiple X values

    Posted 04-26-2011 13:29

    Michael,

    I have enjoyed your comments so far.

    Searle demonstrates the model for simple regression.  If the model were applied to Beverly's data, then, yes, the predictors would be the nurses' self-evaluations and the Y's would become the patients' evaluations.  I understand your concern that the trials are not 200 independent trials, using 200 nurses, which happen to have groups of nurses whose self-evaluations are the same.  I am concerned about it, too.  That is why I thought the Searle test could test the fit of the model.  I suspect Searle's model fits repeated measurements for each predictor too.

    For simple regression, beta is the correlation multiplied by the estimated standard deviation of the Y's divided by the estimated standard deviation of the X's.  I am quite sure beta and r have the same distributional properties with respect to significance.

    Actually, from my survey sampling background, if the patients were chosen by simple random sampling without replacement for each nurse, there would be dependence between observations by nurse just from the sampling, so using the means might be better. 

     

    -------------------------------------------
    Margot Tollefson
    Owner
    Vanward Statistical Consulting
    -------------------------------------------








  • 28.  RE:Correlation Between X and Y, where each Y has multiple X values

    Posted 04-26-2011 13:53
    Margot:  Thanks for the explanation. I think I see what you are saying.  Testing beta = 0 is equivalent to testing r = 0.  So rejecting either null hypothesis tells you that the nurse's satisfaction score is to some extent related to the patient's score, that is, nurse and patient have a correlation significantly different from zero.

    In simple linear regression both the population and sample estimates of the regression slope parameter are simply related to the correlation coefficient.  Shown here for the sample estimates

    b=r(Sy/Sx) where b is the least squares estimate of slope and r is the sample Pearson product moment correlation, Sx is the sample estimate for the standard deviation of X and Sy is the sample estimate for the standard deviation of Y.  This can probably be found in hundreds of regression books including probably every edition of Draper and Smith but I prefer to cite Chernick and Friis (2003) "Introductory Biostatistics for the Health Sciences: Modern Applications Including Bootstrap", Wiley, Hoboken, page 261.
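
    The identity b = r(Sy/Sx) is easy to confirm numerically. The sketch below uses a small made-up data set (the values are hypothetical) and computes the least squares slope directly and again from r, Sx and Sy.

```python
import math
import statistics

# Small made-up data set, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8]

mean_x, mean_y = statistics.mean(x), statistics.mean(y)
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)

slope_ols = sxy / sxx                 # least squares slope of Y on X
r = sxy / math.sqrt(sxx * syy)        # Pearson correlation
slope_from_r = r * statistics.stdev(y) / statistics.stdev(x)  # r * (Sy / Sx)

print(f"least squares slope: {slope_ols:.6f}")
print(f"r * (Sy / Sx):       {slope_from_r:.6f}")
```

    The two numbers agree up to floating-point rounding. The identity concerns the point estimates only and says nothing about whether the standard error is valid when observations are not independent.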

    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 29.  RE:Correlation Between X and Y, where each Y has multiple X values

    Posted 04-26-2011 14:02

    Do you think it could be possible for those of you in this discussion to take your thoughts offline from now on ?  This discussion has been going on for a few days and the same people are participating.  There is no reason to include the entire Statistical Consulting Section in your discussions.


    -------------------------------------------
    Brian Taylor
    Operations Research Analyst
    Army Test & Evaluation Command
    -------------------------------------------








  • 30.  RE:Correlation Between X and Y, where each Y has multiple X values

    Posted 04-26-2011 14:16
    I have been following this discussion with some interest.  I completely agree with all the comments regarding the importance of good communication in consulting.  I have also found the wide variety of statistical solutions posed for the problem at hand interesting.  I agree with Michael Chernick that it is important to not make the mistake of "pseudo-replication" (as coined by Hurlbert) by ignoring the correlation of responses within nurses (or the lack of independence).

    I also agree that there are similarities between regression and correlation.  However, not all correlation problems can be framed as regression problems.  An important distinction between the two is whether the independent or predictor variable is measured with error.  An assumption of linear regression is that the independent variable or predictor is measured without error or is set by design.  I think that treating the nurses' evaluations (which are definitely measured with error) as the predictors will give you a larger bias than taking the mean of the patients' evaluations and treating that as the predictor (since the mean of 10 observations will have smaller error).  A hierarchical general linear model could handle the complexity of this problem and deal with the errors-in-variables problem.  I addressed this problem (in the ecology context) in "Comparative methods based on species mean values" Mathematical Biosciences 2004.

    -------------------------------------------
    Colleen Kelly
    Principal Consultant
    Kelly Statistical Consulting
    -------------------------------------------








  • 31.  RE:Correlation Between X and Y, where each Y has multiple X values

    Posted 04-26-2011 14:41
    Yes, regression assumes that the predictors are measured without error, because the error term in the model is independent noise that is independent of other noise components and independent of the predictors.  Now there is an errors-in-variables model that can handle regression estimation when the predictor variables are observed with noise.  It does require knowing the ratio of the error variance in X to the error variance of the pure noise term.  Then the best estimate of the slope of the regression line minimizes the error sum of squares in a different direction than OLS, which measures error in the vertical direction, implicitly assuming no error in the variables.
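
    One common form of such an errors-in-variables fit is Deming regression. The sketch below is an illustration of that technique rather than necessarily the exact model meant above. Note that `delta` is defined here as the noise variance in Y divided by the measurement-error variance in X, i.e. the reciprocal of the ratio as worded in the post, and it is assumed known. The data values are made up.

```python
import math


def _sums(xs, ys):
    """Centered sums of squares and cross-products."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxx, syy, sxy


def ols_slope(xs, ys):
    """Ordinary least squares slope: errors measured vertically only."""
    sxx, _, sxy = _sums(xs, ys)
    return sxy / sxx


def deming_slope(xs, ys, delta):
    """Deming regression slope.

    delta = var(error in Y) / var(error in X), assumed known.
    delta = 1 gives orthogonal regression; a very large delta approaches OLS.
    """
    sxx, syy, sxy = _sums(xs, ys)
    diff = syy - delta * sxx
    return (diff + math.sqrt(diff ** 2 + 4.0 * delta * sxy ** 2)) / (2.0 * sxy)


# Small made-up data set, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8]

print(f"OLS slope:                {ols_slope(x, y):.4f}")
print(f"Deming slope (delta = 1): {deming_slope(x, y, 1.0):.4f}")
```

    With a positive association the Deming slope is at least as steep as the OLS slope, which reflects the correction for error in X. The formula requires a nonzero cross-product sum, and the answer depends on the assumed variance ratio.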

    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 32.  RE:Correlation Between X and Y, where each Y has multiple X values

    Posted 04-26-2011 14:40


    -------------------------------------------
    Eric Siegel
    Biostatistician
    Univ of Arkansas for Medical Sciences
    -------------------------------------------



    b=r(Sy/Sx): is the point estimate of b still valid when the observations are non-independent?  (I know the standard error of b won't be valid, but if the point estimate of b still is, that's half the battle.) 
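
    As a small check on the formula itself: b = r(Sy/Sx) is an algebraic identity for the least squares slope, so it returns the same number as OLS on any data set, clustered or not. The sketch below shows only that equality; it does not settle whether that number is a good estimate of the quantity of interest when observations are non-independent. The data are hypothetical (three nurses, three patients each, with the nurse's Y repeated).

```python
import math

def slope_two_ways(x, y):
    """Return (OLS slope of y on x, r * Sy / Sx) for paired data."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b_ols = sxy / sxx
    r = sxy / math.sqrt(sxx * syy)
    sx = math.sqrt(sxx / (n - 1))
    sy = math.sqrt(syy / (n - 1))
    return b_ols, r * sy / sx

# Hypothetical clustered data: patient scores x, nurse score y repeated per patient.
x = [4.1, 3.8, 4.4, 2.9, 3.3, 3.0, 4.9, 4.6, 5.0]
y = [4.0] * 3 + [3.0] * 3 + [5.0] * 3
b_ols, b_from_r = slope_two_ways(x, y)
print(b_ols, b_from_r)
```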




  • 33.  RE:Correlation Between X and Y, where each Y has multiple X values

    Posted 04-26-2011 14:54
    I haven't thought this through.  But if an errors-in-variables term should be added to the model, then the least squares estimate of the slope is biased, and the variance term in X is missing a component for error in the measurement of the variable.
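
    A quick simulation illustrates the bias. It is a sketch under assumed conditions: independent observations, true slope 1, and measurement error in X with the same variance as the true X, so the expected OLS slope on the noisy X is about 0.5. All names and parameter values are illustrative.

```python
import random

def ols_slope(x, y):
    """Ordinary least squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20000
true_x = [rng.gauss(0, 1) for _ in range(n)]
y = [xi + rng.gauss(0, 0.5) for xi in true_x]     # true slope is 1
noisy_x = [xi + rng.gauss(0, 1) for xi in true_x]  # X observed with error

slope_clean = ols_slope(true_x, y)   # close to 1
slope_noisy = ols_slope(noisy_x, y)  # pulled toward 0 by the error in X
print(f"slope using true X:  {slope_clean:.3f}")
print(f"slope using noisy X: {slope_noisy:.3f}")
```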

    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 34.  RE:Correlation Between X and Y, where each Y has multiple X values

    Posted 04-26-2011 15:35
    What if we took the well-known relationship between the correlation coefficient r and the test statistic t with DF=n-2 (I hope the math objects transfer),

    $$ r = \frac{t}{\sqrt{t^2 + \mathrm{DF}}} $$

    and we generalized it by plugging in the regression slope's t-statistic and DF from a mixed-models regression with the X's as the responses, the Y's as predictor, and Nurse as the random effect to model the clustering among the X's? We could always call it a "generalized correlation" or "fixed-effects correlation"....

    As to whether normality is required, it's worth noting that the SAS Corr Procedure uses the above well-known relationship to calculate a p-value for r whether the r is a Pearson's r or a Spearman's r.
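
    The conversion proposed here is easy to sketch. The two functions below implement the standard identity between r and t (t = r*sqrt(DF)/sqrt(1 - r^2), equivalently r = t/sqrt(t^2 + DF)). Plugging in a t-statistic and DF from a mixed model is the suggested generalization, not an established estimator, and the t and DF values used below are made up.

```python
import math

def r_from_t(t: float, df: float) -> float:
    """Correlation implied by a t-statistic with df degrees of freedom."""
    return t / math.sqrt(t * t + df)

def t_from_r(r: float, df: float) -> float:
    """t-statistic implied by a correlation r (|r| < 1) with df = n - 2."""
    return r * math.sqrt(df) / math.sqrt(1.0 - r * r)

# Ordinary case: r = 0.5 from n = 30 pairs, so df = 28.
t_plain = t_from_r(0.5, 28)

# Hypothetical mixed-model slope test: t = 2.5 on 9 denominator DF.
r_generalized = r_from_t(2.5, 9)
print(t_plain, r_generalized)
```

    The round trip r -> t -> r returns the starting value, which is a useful sanity check when the t and DF come from software output.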

    -------------------------------------------
    Eric Siegel
    Boistatistician
    Univ of Arkansas for Medical Sciences
    -------------------------------------------








  • 35.  RE:Correlation Between X and Y, where each Y has multiple X values

    Posted 04-26-2011 15:52
    If you see red X's instead of math objects, right-click on them and choose Download Pictures.

    -------------------------------------------
    Eric Siegel
    Biostatistician
    Univ of Arkansas for Medical Sciences
    -------------------------------------------






  • 36.  RE:Correlation Between X and Y, where each Y has multiple X values

    Posted 04-26-2011 19:10


    -------------------------------------------
    Patrick Spagon
    -------------------------------------------
    I haven't seen all of this, but the first thing that struck me is why isn't this a repeated measures study which we would analyze using GLM?







  • 37.  RE:Correlation Between X and Y, where each Y has multiple X values

    Posted 04-26-2011 20:09

    The repeated measures are not real. It is a construction that repeats the exact same value for each nurse to create a fictitious pairing of the nurse with each of their 10 patients. By repeated measures I am sure you are thinking of independent responses by the nurse, like measuring the same variable at different times.
     

    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 38.  RE:Correlation Between X and Y, where each Y has multiple X values

    Posted 04-26-2011 20:28


    -------------------------------------------
    Russell Reddoch
    -------------------------------------------
    Alas, the Analyst's Curse:
           If you get an answer that agrees with your world view and seems to make sense, you do not normally expend much time and effort trying to locate where things might have been done wrong.
           Of course, if the answer seems wrong, you do go to great trouble to find what might have been done wrong.
           Once again, this thread is demonstrating the same.







  • 39.  RE:Correlation Between X and Y, where each Y has multiple X values

    Posted 04-25-2011 17:07


    -------------------------------------------
    Jon Baskerville
    -------------------------------------------
    I think I would like to forget about a one-number estimate of correlation and first look at the variability of patient satisfaction (X) for each practitioner (Y). If the values of X are fairly constant for each practitioner, then taking the average X within practitioners would not discard much information, and a correlation between those means and the Ys would seem appropriate. However, if there is considerable variability in patient response to the same practitioner, this might tell us that what we are measuring here depends more on who the patient is (or their condition) than on what the practitioner has done. This may then call for more consideration of the instruments used or the design of the study. This may seem more exploratory than confirmatory - but are we not in this to find the truth?

    Jon Baskerville
    Retired Biostatistician and P-Stat (Accredited Professional Statistician - Statistical Society of Canada)
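
    The two-step look described above could be sketched as follows: first the spread of X within each practitioner, then the correlation between the per-practitioner mean of X and Y. The data, practitioner labels, and helper name are entirely hypothetical, and with only four practitioners the correlation is purely illustrative.

```python
import math
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: practitioner -> (practitioner score Y, patient scores X).
data = {
    "nurse_A": (4.0, [4.1, 3.9, 4.0, 4.2]),
    "nurse_B": (3.0, [3.2, 2.9, 3.1, 3.0]),
    "nurse_C": (5.0, [4.8, 4.9, 4.7, 5.0]),
    "nurse_D": (2.0, [2.5, 4.5, 1.5, 3.5]),  # patients disagree a lot
}

# Step 1: within-practitioner mean and SD of X.
summary = {nid: (y, mean(xs), stdev(xs)) for nid, (y, xs) in data.items()}
for nid, (y, x_mean, x_sd) in summary.items():
    print(f"{nid}: Y={y:.1f}  mean X={x_mean:.2f}  SD X={x_sd:.2f}")

# Step 2: correlation between per-practitioner means of X and Y.
r_means = pearson_r([m for _, m, _ in summary.values()],
                    [y for y, _, _ in summary.values()])
print(f"correlation of mean X with Y: {r_means:.3f}")
```

    A practitioner like the hypothetical nurse_D, whose patients disagree widely, is the case where averaging would hide most of what the data are saying.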


  • 40.  RE:Correlation Between X and Y, where each Y has multiple X values

    Posted 04-25-2011 17:10
    In this regard, you might want to look at the intraclass correlation: it is the proportion of variance in X attributable to the nurse.  You can get it easily in any standard software package.
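
    For readers who want to see the calculation, here is a minimal sketch of the one-way ANOVA estimator of the intraclass correlation, assuming a balanced design (the same number of patients per nurse). Statistical packages offer several ICC variants; this is only the simplest one, and the function name and data are hypothetical.

```python
def icc_oneway(groups):
    """One-way ANOVA intraclass correlation for balanced groups.

    groups: list of lists, one inner list of X values per nurse,
    all of the same length k.
    """
    k = len(groups[0])
    if k < 2 or any(len(g) != k for g in groups):
        raise ValueError("need balanced groups with at least 2 values each")
    n = len(groups)
    means = [sum(g) / k for g in groups]
    grand = sum(means) / n
    # Between-nurse and within-nurse mean squares.
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    msw = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Hypothetical patient scores for three nurses, three patients each.
scores = [[4.1, 3.8, 4.4], [2.9, 3.3, 3.0], [4.9, 4.6, 5.0]]
icc = icc_oneway(scores)
print(f"ICC = {icc:.3f}")
```

    An ICC near 1 means patients of the same nurse largely agree; an ICC near 0 means most of the variation in X is between patients rather than between nurses.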

    -------------------------------------------
    Clyde Schechter
    -------------------------------------------








  • 41.  RE:Correlation Between X and Y, where each Y has multiple X values

    Posted 04-25-2011 14:24

    Hi Beverly:

    Thanks for the clarification of how you inherited this problem.  I "Googled" (a new verb) "Caring Factors Survey" and found very little regarding how the material is handled.  If there are different domains, this would be of interest: if some domains are more "sensitive" than others, it would be worth looking at the results by domain as well as overall.  The only "Health Environment Survey" I could find (and it was given to nurses) was literally about environmental issues (talking to patients about tobacco, pesticides, cosmetics, etc.).  It may be that I have missed the ones used, as the search was rather cursory.

    So the question for me goes all the way back to the research instruments and what they were designed to do.  If the client has copies of these and also a reference to how they were developed/used, this can be a first step in trying to reflect back the research question. 

    Good luck on this.  It is challenging to stick to good statistical principles without losing the client's interest - they need to be keen to work through the process, and often they get frustrated without knowing the bigger picture.
    -------------------------------------------
    Janet McDougall
    President
    McDougall Scientific Ltd
    -------------------------------------------








  • 42.  RE:Correlation Between X and Y, where each Y has multiple X values

    Posted 04-25-2011 13:48
    " the important first step in consulting is to make the client express explicitly and unequivocally exactly what he want to find out; goals lead to design, goals plus design lead to data, and goals plus design plus data lead to analysis."  This is very close to what I was saying.

    I would change "to make" to "to help".  Turning tacit knowledge into explicit knowledge is not as easy as it seems.  After all, a client comes from one discipline-specific culture/dialect, and the consultant comes from another, usually one of the methodological/statistical cultures.
    " the important first step in consulting is to help the clients express as explicitly and unequivocally as possible exactly what they want to find out; goals lead to design, goals plus design lead to data, and goals plus design plus data lead to analysis."

    In my experience, it is important to have all of the key players involved in the discussion of the goals and that all of them understand the approach taken and what kinds of statements might be possible on the basis of the work.  Very often the person who will make the decision sends a very junior member of the team to be an intermediary. Watch out for this.  Also, it is often helpful to have the research team agree on  drafts of "A Memo of Understanding" from the meetings with the consultant.

    -------------------------------------------
    Arthur Kendall
    Social Research Consultants
    -------------------------------------------








  • 43.  RE:Correlation Between X and Y, where each Y has multiple X values

    Posted 04-25-2011 13:59
    Very true and very good suggestions. I particularly want to emphasize Dr. Kendall's comment about culture.  I think an important aspect of this is jargon.  A lot of the time the statistician uses jargon unfamiliar to the client, and particularly in the medical field, when the client is a physician, the physician uses even more jargon that the statistician doesn't understand.  Often we are embarrassed to admit that we don't know the meaning of a word, so rather than ask, we guess.  The same is true for the client.  With all this miscommunication, no wonder the error of the third kind occurs!  I try to avoid jargon, and when it is difficult not to use a statistical term, I make sure the client understands it.

    I think that when a good statistician is a poor consultant it is invariably because he or she overuses jargon.

    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 44.  RE:Correlation Between X and Y, where each Y has multiple X values

    Posted 04-25-2011 21:54
    As a corollary to Art's very solid points, I have found that it helps the client to offer examples of what a final report might contain as results statements if certain analytic approaches were used. Very frankly, some clients cannot crystallize what they're apt to get until they see an example of it, because they think in more concrete terms and certainly in a different vernacular too. As consulting statisticians, we have to have excellent (or better) "listening skills" and help the client "talk out" what they'd really like to obtain from the study.

    I even have heard of Delphi methods being used (in evaluation research) to refine and clarify goal statements from different stakeholders because they hold such different views of what the results should provide. As such, the sampling design, analyses, etc. can't provide sufficient power to meet all the diverse goals. So there may be multiple clients to deal with.

    -------------------------------------------
    Milton Goldsamt
    Survey Statistician
    -------------------------------------------








  • 45.  RE:Correlation Between X and Y, where each Y has multiple X values

    Posted 04-26-2011 11:09


    -------------------------------------------
    Richard Browne
    Texas Scottish Rite Hospital for Children
    -------------------------------------------
    I have also found that the first round of analyses is what helps the principal investigator finally realize what they want.  I would encourage new consultants to factor in (at least) a second pass or analysis of the data in any budget and/or timeline projection, as it is almost sure to happen.  One company I worked for long ago never listened to me and we were almost always going over budget on a project.  In that same vein, we had to use data from HR departments that was always "dirty"; we would spend nearly 90% of the budget on just getting the data cleaned up, leaving precious little for the analysis phase.  They never understood that, either.

    We have to keep in mind when working with investigators that we as mathematicians are quite comfortable with abstract concepts.  To us, an equation is real; to the vast majority of people, it is just chalk on the board. Most people are "concrete thinkers".  They need something they can see, touch, etc. to make it real to them, like a graph or a table or a tangible example like "in a room of 30, we expect 10 to have this gene." I think this is a greatly unappreciated gap between the statistician and the investigator.