1) Here is a restatement of the issue.
I have a single group of patients, N of them, drawn from a large population. (I don't recall saying that N was a finite population size; I meant that it is a known value.) I give all the patients the same treatment. Some time later I measure two things about each of them, X and Y. For this restatement, let both be binary variables. For example, X is a measure of efficacy (patient feels better, yes/no) and Y is a safety measure (patient has an adverse reaction, yes/no).
The question a client will ask is: are patients who get the efficacy response more likely to get the safety response? Is there a statistical test for that? I suspect people reading this might suggest a chi-square test on a 2x2 table.
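A minimal sketch of that 2x2 chi-square test in Python, using only the standard library. The function name and the counts in the usage line are hypothetical. It computes Pearson's statistic without Yates' continuity correction (scipy.stats.chi2_contingency applies that correction to 2x2 tables by default, so its numbers will differ slightly) and uses the fact that, for 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2/2)). With small expected counts, Fisher's exact test is the usual alternative.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    Rows: efficacy response X (yes, no); columns: adverse reaction Y (yes, no).
        [[a, b],
         [c, d]]
    Returns (chi2, p_value) on 1 degree of freedom.
    Assumes every row and column total is nonzero.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Shortcut form of the Pearson statistic for a 2x2 table
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 df: P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts, for illustration only
chi2, p = chi_square_2x2(30, 20, 10, 40)
print(f"chi2 = {chi2:.2f}, p = {p:.2g}")
```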
Now suppose that instead of being binary, Y is a continuous measure such as a lab value, and say that higher lab values are bad. The client's question becomes: are patients who get an efficacy response more likely to have higher lab values? Clients will probably think in terms of mean values, so they will ask whether the means differ. That is not an unreasonable question. As I stated earlier, I am treating this whole group of patients, and the treatment may well have changed some patients (e.g., the ones with an efficacy response) but not others. Clients will want to know whether there is a statistically significant difference between the two "groups": the efficacy responders and the non-responders.
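For the continuous-Y version, one way to ask whether mean lab values differ between responders and non-responders is a two-sample comparison. The sketch below swaps in a permutation test, so it needs only the standard library; the function name, seed, and example values are hypothetical. It keeps the observed group sizes fixed, so it is a test conditional on the number of responders, which is one answer to the conditional-versus-unconditional question raised in the thread. Welch's t-test, e.g. scipy.stats.ttest_ind(y_resp, y_nonresp, equal_var=False), is the common parametric alternative.

```python
import random
import statistics

def permutation_test_mean_diff(y_resp, y_nonresp, n_perm=10000, seed=0):
    """Two-sided permutation test of mean(Y | responder) - mean(Y | non-responder).

    Group labels are shuffled while the group sizes stay at their observed
    values, so the test is conditional on the observed number of responders.
    """
    rng = random.Random(seed)
    observed = statistics.fmean(y_resp) - statistics.fmean(y_nonresp)
    pooled = list(y_resp) + list(y_nonresp)
    k = len(y_resp)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = statistics.fmean(pooled[:k]) - statistics.fmean(pooled[k:])
        # Small tolerance so ties with the observed value count as "as extreme"
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction keeps the Monte Carlo p-value strictly positive
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value

# Hypothetical lab values for responders and non-responders
diff, p = permutation_test_mean_diff([5.1, 6.3, 5.8, 7.0], [4.2, 4.9, 5.0, 4.4, 4.7])
print(f"mean difference = {diff:.2f}, permutation p = {p:.3f}")
```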
Thanks.
2) As I am sure many of you can tell, I am new to this board. I don't see a FAQ page or a list of moderators. If anyone could reply to me with a link to that info, I'd appreciate it.
Thanks again.
-------------------------------------------
David Gruben
-------------------------------------------
Original Message:
Sent: 05-01-2012 18:39
From: Michael Chernick
Subject: Test of mean differences when group membership is random
Scott: I think you may be making some incorrect assumptions. I still have no idea what David's problem is. First he poses it in a way that is ridiculous, and when I point that out, he tells us that X is related to the normal random variable Y and that Y belongs to group 1 or group 2 depending on whether or not X>d. He states that N is the size of a finite population and that k is a random variable representing the number of Ys for which X>d. I don't know whether he samples n out of N at random or determines k from the entire finite population. Maybe N is the sample size and the population is large (theoretically infinite). If there is random sampling, so that inference makes sense, is it appropriate to test conditional on the observed k, or should we construct an unconditional test? Now if k is positively or negatively correlated with Y, then the correlation influences whether or not the population means differ and, if they do, which mean is larger.
I am still waiting for David to explain the problem before I try to solve it.
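To make the points about random k and correlation concrete, here is a small simulation under assumptions not stated in the thread: (X, Y) is standard bivariate normal with correlation rho, and a patient goes into the X>d group when X exceeds d. The function name, seed, n and d are hypothetical choices. It illustrates that k changes from sample to sample and that the sign of the correlation between the classifying variable X and Y decides which group has the larger mean Y.

```python
import random
import statistics

def simulate_split(rho, n=200, d=0.0, seed=1):
    """Draw n (X, Y) pairs from a standard bivariate normal with correlation rho,
    split by X > d, and return (k, mean Y | X > d, mean Y | X <= d).

    k, the number of patients with X > d, is random from sample to sample.
    Assumes d is chosen so that neither group is empty.
    """
    rng = random.Random(seed)
    above, below = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        # Y = rho * X + sqrt(1 - rho^2) * Z gives corr(X, Y) = rho
        y = rho * x + (1 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        (above if x > d else below).append(y)
    return len(above), statistics.fmean(above), statistics.fmean(below)

for rho in (0.6, 0.0, -0.6):
    k, m_above, m_below = simulate_split(rho)
    print(f"rho={rho:+.1f}  k={k}  mean Y|X>d={m_above:+.2f}  mean Y|X<=d={m_below:+.2f}")
```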
-------------------------------------------
Michael Chernick
Director of Biostatistical Services
Lankenau Institute for Medical Research
-------------------------------------------
Original Message:
Sent: 05-01-2012 17:19
From: Scott Berry
Subject: Test of mean differences when group membership is random
David,
--> If you classify the groups 0 and 1 by Y>d and Y≤d, respectively, and then test whether the means of Y for groups 0 and 1 are different, then of course they are different: it is a tautology. Carrying out the test (mechanically, there's no reason you can't do a t-test) is a bad idea. You didn't mean this.
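A quick illustration of why splitting on Y itself is circular: even for pure noise, with no treatment effect at all, the group with Y>d must have the larger mean, because every one of its values exceeds every value in the other group. The data are simulated and the helper name is hypothetical.

```python
import random
import statistics

def split_on_outcome(y, d):
    """Split Y on its own threshold d: the circular classification."""
    group0 = [v for v in y if v > d]   # Y > d
    group1 = [v for v in y if v <= d]  # Y <= d
    return group0, group1

rng = random.Random(42)
y = [rng.gauss(0.0, 1.0) for _ in range(50)]  # pure noise, no effect of anything
g0, g1 = split_on_outcome(y, d=0.0)
# Every value in g0 exceeds every value in g1, so the group means must differ
# in the same direction on every sample (whenever both groups are nonempty).
print(min(g0) > max(g1), statistics.fmean(g0) > statistics.fmean(g1))  # prints: True True
```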
--> I think your classification into the groups 0 and 1 is based on X>d and X≤d, where X is related to Y in some way (it sounds like it may be an early measure of the same thing Y measures, or a possible biomarker?). This is straightforward to do statistically, and a t-test would be fine. The problem is the interpretation, not the analysis. You might want to do this to test whether X is a biomarker of Y, and to gauge the strength of the correlation.
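For the X>d split described here, a minimal standard-library sketch that reports both the strength of the X-Y correlation (Pearson r) and a Welch two-sample t statistic with its Welch-Satterthwaite degrees of freedom. The helper names are hypothetical; for a p-value, scipy.stats.ttest_ind(y_hi, y_lo, equal_var=False) computes the same Welch statistic.

```python
import math
import statistics

def pearson_r(x, y):
    """Pearson correlation of paired lists x and y (neither may be constant)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def welch_t(y_a, y_b):
    """Welch t statistic and Welch-Satterthwaite df for mean(y_a) - mean(y_b)."""
    va, vb = statistics.variance(y_a), statistics.variance(y_b)  # sample variances
    na, nb = len(y_a), len(y_b)
    se2 = va / na + vb / nb
    t = (statistics.fmean(y_a) - statistics.fmean(y_b)) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

def biomarker_summary(x, y, d):
    """Split patients by X > d and summarize how Y differs between the groups."""
    y_hi = [yi for xi, yi in zip(x, y) if xi > d]
    y_lo = [yi for xi, yi in zip(x, y) if xi <= d]
    t, df = welch_t(y_hi, y_lo)
    return {"r": pearson_r(x, y), "k": len(y_hi), "welch_t": t, "welch_df": df}
```

Each group needs at least two patients for the sample variances to exist.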
You could classify progression or overall survival (Y) based on early "response" (X). You could classify ADAS-cog for Alzheimer's at 1 year based on a 1-month classification of a biomarker change in the subject. Getting the interpretation correct is very important, though! Does X cause Y? Does Y cause X? Can we just use X to select the dose in a phase II trial because X and Y are related? Questions like these are the very important thing here, and likely a big part of your value in the consulting. What your client does with the test is the critical thing; the analysis is likely quite simple.
--Scott
Typical scatterplots of X vs. Y, regressions, and ROC curves varying 'd' may also be very instructive to the client.
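A sketch of the ROC idea: treat X>d as a classifier for a binary outcome and sweep d. This assumes Y has been dichotomized into an event indicator (for example, the yes/no adverse reaction in David's restatement, or a lab value above some clinical cutoff); that choice and the function names are hypothetical. Thresholds sit at the observed X values using X >= t, which is the same rule as X>d for d just below t, and the area under the curve uses the trapezoidal rule.

```python
def roc_points(x, event):
    """(FPR, TPR) pairs for the rule 'predict event if X >= t', sweeping t
    down through the distinct observed X values.

    x: marker values; event: 0/1 outcome per patient (a hypothetical
    dichotomized Y). Assumes both outcome classes are present.
    """
    n_pos = sum(event)
    n_neg = len(event) - n_pos
    points = [(0.0, 0.0)]  # threshold above every X: nobody classified positive
    for t in sorted(set(x), reverse=True):
        tp = sum(1 for xi, e in zip(x, event) if xi >= t and e == 1)
        fp = sum(1 for xi, e in zip(x, event) if xi >= t and e == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Trapezoidal area under an ROC curve given as ordered (FPR, TPR) points."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))
```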
-------------------------------------------
Scott Berry
Statistical Scientist
Berry Consultants
-------------------------------------------