
Test of mean differences when group membership is random

  • 1.  Test of mean differences when group membership is random

    Posted 04-30-2012 19:37

    I have normally distributed data, Ys, from a single group of N patients.  I treated the patients, and at Month xx I looked at each individual and measured Y; if Y>d (d is known), I put the patient in Group 1, and if Y<=d, in Group 0.  There are k observations in Group 1, so k is the realization of a random variable.

    Naturally, my client wants me to test whether the mean of the k Ys in Group 1 differs from the mean of the N-k Ys in Group 0.  What would you suggest as a proper test?

    -------------------------------------------
    David Gruben

    -------------------------------------------



  • 2.  RE:Test of mean differences when group membership is random

    Posted 04-30-2012 19:54

    I don't understand the question.  What are the two populations?  If I take you literally, there are no populations and you are asking whether the two samples have the same mean.  You would know that just by computing the means; you would not need a statistical test.  Now, if you are asking for the probability that the mean of Group 1 is larger than the mean of Group 0 when you sample this way, the answer is 1, since all the Ys in Group 1 are > d and all the Ys in Group 0 are less than or equal to d.  Was this a serious question?
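A quick simulation makes the point above concrete. As a hypothetical setup, it draws N = 200 values from a single normal population with no group structure at all and uses a cut point d = 0. Splitting the Ys on Y > d guarantees that every Group 1 value exceeds every Group 0 value, so the sample means always differ in the same direction, whatever the underlying truth.

```python
import random
import statistics

def split_on_own_value(ys, d):
    """Split observations into Group 1 (y > d) and Group 0 (y <= d) using Y itself."""
    group1 = [y for y in ys if y > d]
    group0 = [y for y in ys if y <= d]
    return group1, group0

# Hypothetical data: one homogeneous normal population, no latent groups.
rng = random.Random(2012)
ys = [rng.gauss(0.0, 1.0) for _ in range(200)]
d = 0.0  # hypothetical known cut point

group1, group0 = split_on_own_value(ys, d)
k = len(group1)

# By construction every Group 1 value exceeds every Group 0 value,
# so mean(Group 1) > mean(Group 0) whenever both groups are nonempty.
print(k, min(group1) > max(group0))
print(statistics.mean(group1) - statistics.mean(group0))
```

Any test of this difference would only be measuring how the groups were formed, not anything about the patients.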
    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 3.  RE:Test of mean differences when group membership is random

    Posted 04-30-2012 20:08
    Michael,

    Yes it is a serious question.  Let me try again. 

    There was no randomization at the start of the study, so it is a single group of treated individuals, N of them.  The response, Y, is a change from baseline.

    d is a well-known cut point.  If Y>d, then the patient is a 'responder'.  The client is now implicitly thinking that the treatment given to this group of patients has revealed an intrinsic (latent?) difference among the patients, that Group 1 has a different underlying mean than Group 0 -- and as such, the client would like to investigate the mean difference.

    If it makes the problem more palatable, consider that two things are actually measured on each of these patients after treatment: X and Y.  If X>d, put the Y value into Group 1; if X<=d, put Y into Group 0.


    -------------------------------------------
    David Gruben

    -------------------------------------------








  • 4.  RE:Test of mean differences when group membership is random

    Posted 04-30-2012 20:23
    It is not a matter of making the problem palatable; it is a matter of having it make sense.  When you mention the Xs and Ys, it starts making sense.  So now it is clear that you are dealing with a single finite population of size N.  Are you taking a sample from the finite population, or just splitting the population into the k responders and the N-k non-responders?  In the first case there is an inference problem based on a sample from a finite population.  In the latter case there is again nothing to test: you simply average the k Ys in Group 1 and compare that to the average in Group 0.

    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 5.  RE:Test of mean differences when group membership is random

    Posted 05-01-2012 12:55
    I am trying to start out as a consultant.  I do not have much consulting experience, but I do have a good education.  I find that reading and commenting on listservs helps me learn about newer techniques and how to explain statistical concepts.


    David, your problem seems to be related to order statistics, since you are separating your data by size.  Since you know that the data are normally distributed, you might compare the number you expect to find greater than d, using the mean and variance of the full data set, with the number you actually get.  I am not sure whether there is a statistic out there to do the test.
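Margot's comparison could be sketched as follows: estimate the mean and standard deviation from the full sample, compute the expected number of observations above d under the fitted normal, and set it beside the observed count k. The sample, the cut point d = 12, and the helper names are hypothetical. As Margot says, this is a descriptive comparison rather than an established test; with genuinely normal data it mostly checks how well the normal fits the upper tail.

```python
import math
import random
import statistics

def normal_sf(x, mu, sigma):
    """P(Y > x) for Y ~ Normal(mu, sigma^2), via the complementary error function."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))

def expected_vs_observed_above(ys, d):
    """Return (expected count above d under a fitted normal, observed count above d)."""
    mu = statistics.mean(ys)
    sigma = statistics.stdev(ys)
    expected = len(ys) * normal_sf(d, mu, sigma)
    observed = sum(1 for y in ys if y > d)
    return expected, observed

# Hypothetical sample and cut point, for illustration only.
rng = random.Random(7)
ys = [rng.gauss(10.0, 3.0) for _ in range(150)]
expected, observed = expected_vs_observed_above(ys, d=12.0)
print(f"expected about {expected:.1f}, observed {observed}")
```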


    Margot

    -------------------------------------------
    Margot Tollefson
    Owner
    Vanward Statistical Consulting
    -------------------------------------------








  • 6.  RE:Test of mean differences when group membership is random

    Posted 05-01-2012 14:28

    Margot:  I think it is difficult to go straight into consulting without much prior work experience.  We all learn on the job, and I am sure you can too.  When you have industrial experience or a recognized name, it is a lot easier to get clients, and the experience really helps once you have them.

    Regarding David's question, I still don't know whether he is taking a random sample from a finite population of N or is just looking at all N cases and discovering that k have X>d and N-k have X<=d.  If it is the latter, there is no inference problem.  If the former, what is the sample size n taken from the population of size N?  In the former case k is a random variable, and you would need to keep that in mind when formulating the test.  This problem has not been explained well.  One thing I think is important for a consultant to do is to ask questions, and keep asking them, until you are sure you understand the problem thoroughly.  Only then are you in a position to find a solution.


    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 7.  RE:Test of mean differences when group membership is random

    Posted 05-01-2012 17:20
    David, 

    --> If you classify the groups 1 and 0 by Y>d and Y≤d, respectively, and then test whether the mean of Y differs between groups 0 and 1, then of course it differs -- it is a tautology.  Carrying out the test (there is no reason you can't do a t-test) is even a bad idea.  You didn't mean this.

    --> I think your classification into groups 0 and 1 is based on X>d and X≤d, where X is related to Y in some way (it sounds like it may be an early measure of the same thing Y measures, or a possible biomarker?).  This is straightforward to do statistically, and a t-test would be fine.  The problem is the interpretation, not the analysis.  You might want to do this to test whether X is a biomarker of Y, and to assess the strength of the correlation.
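When the split is on a separate measurement X, the comparison becomes an ordinary two-sample problem. A minimal sketch, assuming hypothetical paired (X, Y) data and a cut point d = 2, groups Y by X > d and computes Welch's t statistic (which does not assume equal variances) with its Welch-Satterthwaite degrees of freedom. The p-value would then come from a t distribution, for example via `scipy.stats.ttest_ind(group1, group0, equal_var=False)` if SciPy is available.

```python
import math
import statistics

def split_y_by_x(pairs, d):
    """Group the Y values by whether the paired X exceeds the cut point d."""
    group1 = [y for x, y in pairs if x > d]
    group0 = [y for x, y in pairs if x <= d]
    return group1, group0

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite df for mean(a) - mean(b)."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical (X, Y) pairs; d is the known responder cut point on X.
pairs = [(1.2, 5.1), (3.4, 6.0), (0.8, 4.7), (2.9, 6.3), (3.8, 5.9),
         (0.5, 4.2), (2.6, 5.5), (1.1, 5.0), (3.1, 6.4), (0.9, 4.9)]
group1, group0 = split_y_by_x(pairs, d=2.0)
t, df = welch_t(group1, group0)
print(f"k = {len(group1)}, t = {t:.2f}, df = {df:.1f}")
```

Unlike the split on Y itself, nothing here forces the Group 1 mean above the Group 0 mean; any difference reflects the relationship between X and Y.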

    You could classify progression or overall survival (Y) based on early "response" (X).  You could classify ADAS-cog for Alzheimer's at 1 year based on a 1-month classification of a biomarker change in the subject.  Getting the interpretation correct is very important, though!  Does X cause Y?  Does Y cause X?  Can we just use X to select the dose in a phase II trial because X and Y are related?  Questions like these are the very important thing here, and likely a big part of your value as a consultant.  What your client does with the test is the critical thing; the analysis is likely quite simple.

    --Scott

    Typical scatterplots of X vs. Y, regressions, and ROC curves with varying 'd' may also be very instructive to the client.
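The ROC suggestion can be sketched without any plotting library. The sketch assumes, hypothetically, that the outcome has been dichotomized (say, a lab value above some clinically chosen threshold counts as an event). Each candidate cut point d then gives one (false positive rate, true positive rate) point for the rule "predict an event when X > d". The data and the helper name are illustrative only.

```python
def roc_points(xs, events, cut_points):
    """For each cut point d, return (d, false positive rate, true positive rate)
    for the classification rule 'predict event if x > d'."""
    n_pos = sum(events)
    n_neg = len(events) - n_pos
    points = []
    for d in cut_points:
        tp = sum(1 for x, e in zip(xs, events) if x > d and e)
        fp = sum(1 for x, e in zip(xs, events) if x > d and not e)
        points.append((d, fp / n_neg, tp / n_pos))
    return points

# Hypothetical early measure X and a dichotomized outcome (True = event).
xs = [0.2, 0.9, 1.5, 2.1, 2.8, 3.3, 3.9, 4.4]
events = [False, False, True, False, True, True, False, True]
for d, fpr, tpr in roc_points(xs, events, cut_points=[0.0, 1.0, 2.0, 3.0, 4.0, 5.0]):
    print(f"d = {d:.1f}: FPR = {fpr:.2f}, TPR = {tpr:.2f}")
```

Plotting TPR against FPR across the cut points gives the ROC curve, which shows the client how the choice of d trades sensitivity against specificity.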

    -------------------------------------------
    Scott Berry
    Statistical Scientist
    Berry Consultants
    -------------------------------------------








  • 8.  RE:Test of mean differences when group membership is random

    Posted 05-01-2012 18:39
    Scott:  I think you may be making some incorrect assumptions.  I still have no idea what David's problem is.  First he posed it in a way that is ridiculous, and when I pointed that out he told us that X is related to the normal random variable Y and that, based on whether or not X>d, Y belongs to Group 1 or Group 0.  He states that N is the size of a finite population and that k is a random variable representing the number of Ys for which X>d.  I don't know whether he samples n out of N at random or determines k from the entire finite population.  Maybe N is the sample size and the population is large (theoretically infinite).  If there is random sampling going on, so that inference makes sense, is it appropriate to test conditional on the observed k, or should we construct an unconditional test?  Now, if k is positively or negatively correlated with Y, then the correlation influences whether or not the population means differ, and if they do, it affects which mean is larger.

    I am still waiting for David to explain the problem before I try to solve it.

    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 9.  RE:Test of mean differences when group membership is random

    Posted 05-01-2012 19:19
    1) Here is a restatement of the issue. 

    I have a single group of patients, N of them.  They are drawn from a large population.  (I don't recall saying that N was a finite population.  I meant to say that it was a known value.)  I give all the patients the same treatment.  Some time later I measure two things about each of them, X and Y.  Let us say in this restatement that both are binary variables.  As an example, X is a measure of efficacy (patient feels better, yes/no) and Y is a safety measure (patient has an adverse reaction, yes/no).

    The question a client will ask is: are patients who get the efficacy response more likely to get the safety response?  Is there a statistical test?  I suspect people reading this might suggest a chi-square test on a 2x2 table.
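For the all-binary version, the chi-square test on the 2x2 table can be computed directly. A minimal sketch, using hypothetical counts (rows: efficacy response yes/no; columns: adverse reaction yes/no), computes Pearson's statistic without continuity correction and its p-value from the chi-square distribution with 1 degree of freedom, using the identity P(chi-square > s) = erfc(sqrt(s/2)) for 1 df. The function name is illustrative; with small expected cell counts, Fisher's exact test is the usual alternative.

```python
import math

def chi_square_2x2(n11, n12, n21, n22):
    """Pearson chi-square test of independence (no continuity correction) for
        [[n11, n12],    row 1: efficacy responders   (AE yes, AE no)
         [n21, n22]]    row 2: non-responders        (AE yes, AE no)
    Returns (statistic, p-value) from the chi-square distribution with 1 df.
    Assumes every row and column total is positive."""
    n = n11 + n12 + n21 + n22
    row1, row2 = n11 + n12, n21 + n22
    col1, col2 = n11 + n21, n12 + n22
    stat = n * (n11 * n22 - n12 * n21) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 df, P(chi2 > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2)).
    p = math.erfc(math.sqrt(stat / 2.0))
    return stat, p

# Hypothetical counts, for illustration only.
stat, p = chi_square_2x2(12, 18, 5, 25)
print(f"chi-square = {stat:.3f}, p = {p:.4f}")
```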

    Now suppose that instead of Y being binary, it is a continuous measure such as a lab value, where higher values are bad.  The question from the client becomes: are patients who get an efficacy response more likely to have higher lab values?  Clients are probably going to think in terms of mean values, so they will ask whether the means differ.  That is not unreasonable to ask -- as I stated earlier, I am treating this group of patients, and the treatment may well have changed some patients (e.g., the ones with an efficacy response) but not others.  Clients will want to know whether there is a statistically significant difference between the two "groups," the efficacy responders and non-responders.

    Thanks.



    2) As I am sure many of you can tell, I am new to this board.  I don't see a FAQ page or a list of moderators.  If anyone could reply to me with a link to that info, I'd appreciate it.

    Thanks again.

    -------------------------------------------
    David Gruben

    -------------------------------------------








  • 10.  RE:Test of mean differences when group membership is random

    Posted 05-01-2012 19:46

    In response to your second question...

    FAQs: http://community.amstat.org/AMSTAT/FAQs/

    Code of Conduct:  http://community.amstat.org/AMSTAT/CodeofConduct/

    If you post a reply these should be at the bottom of that "Post Reply" webpage.  And thanks for the reminder.  I'll re-read the Code of Conduct as it's sometimes tempting/convenient for me to forget about it.

    -------------------------------------------
    James Baldwin
    Station Statistician
    US Forest Service
    -------------------------------------------








  • 11.  RE:Test of mean differences when group membership is random

    Posted 05-01-2012 21:28

    I don't understand why you started saying Y was binary.  Now I am getting close to understanding your problem.  In the population you have responders and nonresponders, and there is a lab value where high values indicate greater safety risk.  So the question is whether the responder population has a higher average for the lab measure than the nonresponder population.  To test this, you take a random sample of size N and observe k responders and N-k nonresponders.  As you said before, k is a random variable.  In this case that is simply because you have to measure X after you select the subject, so you can't predetermine how many responders you will have.  But you can condition on k and then apply a t-test or a Wilcoxon rank-sum test, whichever seems more appropriate.  The only question would be: can we improve on this based on the value of k?  Probably not, because k really depends only on the proportion of responders in the population relative to nonresponders.  This differs from your original problem, where X was correlated with Y and large values of X dictated the group and hence k.
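The Wilcoxon rank-sum option can be sketched in a few lines; it naturally conditions on k, because its null distribution is built from the fixed group sizes k and N-k. This sketch uses midranks for ties and the large-sample normal approximation without a tie correction to the variance, so for small samples an exact version (for example `scipy.stats.mannwhitneyu`, if SciPy is available) would be preferable. The lab values and helper names are hypothetical.

```python
import math

def midranks(values):
    """Ranks 1..n, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # average of 1-based positions i+1 .. j+1
        for m in range(i, j + 1):
            ranks[order[m]] = avg
        i = j + 1
    return ranks

def rank_sum_test(group1, group0):
    """Wilcoxon rank-sum test, conditional on k = len(group1) and N-k = len(group0).
    Returns (W, z, two-sided p) using the normal approximation."""
    k, m = len(group1), len(group0)
    ranks = midranks(list(group1) + list(group0))
    w = sum(ranks[:k])                        # rank sum of Group 1
    mean_w = k * (k + m + 1) / 2.0            # E[W] under H0, given k
    sd_w = math.sqrt(k * m * (k + m + 1) / 12.0)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2.0))    # equals 2 * (1 - Phi(|z|))
    return w, z, p

# Hypothetical lab values for responders (Group 1) and nonresponders (Group 0).
responders = [7.1, 8.4, 6.9, 9.2, 7.7]
nonresponders = [5.2, 6.1, 6.8, 5.9, 7.0, 6.4]
w, z, p = rank_sum_test(responders, nonresponders)
print(f"W = {w}, z = {z:.2f}, p = {p:.4f}")
```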
    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 12.  RE:Test of mean differences when group membership is random

    Posted 05-02-2012 09:28
    David, some comments below, marked with --> (they were in bold to differentiate, not exaggerate!)

    1) Here is a restatement of the issue. 

    I have a single group of patients, N of them.  They are drawn from a large population.  (I don't recall saying that N was a finite population.  I meant to say that it was a known value.)  I give all the patients the same treatment.  Some time later I measure two things about each of them, X and Y.  Let us say in this restatement that both are binary variables.  As an example, X is a measure of efficacy (patient feels better, yes/no) and Y is a safety measure (patient has an adverse reaction, yes/no).

    The question a client will ask is: are patients who get the efficacy response more likely to get the safety response?  Is there a statistical test?  I suspect people reading this might suggest a chi-square test on a 2x2 table.

    Now suppose that instead of Y being binary, it is a continuous measure such as a lab value, where higher values are bad.  The question from the client becomes: are patients who get an efficacy response more likely to have higher lab values?

    --> David -- This would be done using a pretty standard t-test.  Each subject is in group 0 or 1 based on efficacy response; then test the means by the group indicator.  You should investigate some plots to make sure things are appropriate: scatterplots of X and Y, histograms of X, etc.  But for an inferential test, do a t-test.
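As a complement to the t-test recommended above, a permutation test of the difference in means gives a p-value without needing t-distribution tables. It is a swapped-in alternative, not what the thread proposes, but it reshuffles group labels with the group sizes held fixed, so it is explicitly conditional on k, as discussed in this thread. The data, the number of permutations, and the function name are hypothetical.

```python
import random
import statistics

def permutation_test_mean_diff(group1, group0, n_perm=10000, seed=0):
    """Two-sided Monte Carlo permutation test for mean(group1) - mean(group0).
    Labels are reshuffled with the group sizes held fixed (conditional on k)."""
    rng = random.Random(seed)
    k = len(group1)
    pooled = list(group1) + list(group0)
    observed = statistics.mean(group1) - statistics.mean(group0)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[:k]) - statistics.mean(pooled[k:])
        # Small tolerance so the observed split itself counts despite rounding.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    p = (extreme + 1) / (n_perm + 1)
    return observed, p

# Hypothetical lab values: efficacy responders (Group 1) vs. non-responders (Group 0).
responders = [4.1, 3.8, 4.6, 3.9, 4.4, 4.0]
nonresponders = [5.2, 4.9, 5.6, 5.1, 4.8, 5.4, 5.0]
observed, p = permutation_test_mean_diff(responders, nonresponders)
print(f"mean difference = {observed:.3f}, permutation p = {p:.4f}")
```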

      Clients are probably going to think in terms of mean values, so they will ask whether the means differ.  That is not unreasonable to ask -- as I stated earlier, I am treating this group of patients, and the treatment may well have changed some patients (e.g., the ones with an efficacy response) but not others.  Clients will want to know whether there is a statistically significant difference between the two "groups," the efficacy responders and non-responders.

    --> Here is where you need to be careful.  You can make inferences here about Y in X-responders versus non-responders, and essentially you are investigating the relationship between X and Y in these treated subjects.  Extending this to some kind of inference about the "treatment" is very difficult.  Suppose you find 50% efficacy responders, and in those you have a -2 mean change in lab values (statistically significant) compared to non-efficacy responders.  What does this mean?  It means that efficacy responders have smaller lab values; it still doesn't mean the 'treatment' did anything.  It may be that responders naturally also have lower Y, unaffected by the "treatment."  Invariably you would have to ask: what would have happened if they had been treated with something else (or nothing)?
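The caution above can be illustrated with a simulation in which the treatment does nothing at all. The model is entirely hypothetical: a latent patient characteristic U raises the chance of an efficacy response X and lowers the lab value Y, and the responder cut point is d = 0. Every patient gets the same inert treatment, yet efficacy responders still show a clearly lower mean Y, because splitting on X also partly sorts patients on U.

```python
import random
import statistics

def simulate_no_treatment_effect(n=2000, seed=42):
    """Simulate patients whose X and Y share a latent trait U; the treatment adds nothing."""
    rng = random.Random(seed)
    responders_y, nonresponders_y = [], []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)                    # latent patient characteristic
        x = u + rng.gauss(0.0, 1.0)                # efficacy measure, driven partly by U
        y = 10.0 - 2.0 * u + rng.gauss(0.0, 1.0)   # lab value, lower when U is high
        if x > 0.0:                                # hypothetical responder cut point d = 0
            responders_y.append(y)
        else:
            nonresponders_y.append(y)
    return responders_y, nonresponders_y

resp, nonresp = simulate_no_treatment_effect()
diff = statistics.mean(resp) - statistics.mean(nonresp)
print(len(resp), round(diff, 2))
```

In real data there is no U to inspect, which is why the counterfactual question (what would have happened under another treatment, or none) cannot be answered from a single treated group.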

    --Scott








  • 13.  RE:Test of mean differences when group membership is random

    Posted 05-02-2012 09:56
    I agree with Scott, but just to clarify: it is a t-test or a Wilcoxon test, depending on what the diagnostic plots tell you.  Also, this is a test conditional on the value of k (a technical distinction that probably doesn't matter).

    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 14.  RE:Test of mean differences when group membership is random

    Posted 05-02-2012 09:40
    > As I am sure many of you can tell, I am new to this board.  I don't see a FAQ page or a list of moderators.  If anyone could reply to me with a link to that info, I'd appreciate it.

    Those who would like to participate in moderated discussions of statistical questions will be interested in the forums at http://stats.stackexchange.com/questions.  This two-year-old community now comprises about 7,000 people internationally, of whom 100 are routinely active.  (These tend to be consulting, industrial, and academic statisticians in many fields.)  Although I greatly respect and appreciate the discussions carried out here in the ASA Stats Consulting Section eGroup, especially those about the practice of consulting, I find the StackExchange site more congenial and productive for discussing technical questions because it is well organized, easily searchable, and provides many more tools for interaction and research.  (Full disclosure: I am one of the site's moderators.)

    --Bill Huber
    Quantitative Decisions


  • 15.  RE:Test of mean differences when group membership is random

    Posted 05-02-2012 10:01

    Nobody moderates.  We all are free to start our own discussions and take them in any direction we choose.  There may be some ASA oversight to make sure appropriate language is used and the content is appropriate.
    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 16.  RE:Test of mean differences when group membership is random

    Posted 05-03-2012 10:24
    Bill Huber is completely right; StackExchange is a good place to ask questions and get them answered.

    My post here provides me with an opportunity to thank him (and, by extension, other active moderators). He's a very active moderator and is particularly good at trying to clarify the questions. And as any consultant knows, if you can get the question clarified, you are well along the way to the answer.

    I find that it's interesting just to browse the questions and see what types of things people are asking.  But maybe I'm just easily amused. ;)

    -------------------------------------------
    Michael Kruger
    Information Resources Inc
    -------------------------------------------