Question regarding basic 2 group comparison.

  • 1.  Question regarding basic 2 group comparison.

    Posted 01-19-2012 13:09

    I was recently referred to analyze a project whose intention it was to evaluate the use of an initial (i.e. onboarding) customer support program in the effort to reduce the number of problem calls later on.

    As far as I can tell, the study was designed as follows: a sample of incoming customers over a specific time period was randomly assigned to either the 'onboarding' program or the status quo (I cannot tell by what method the assignment was done, so I can't speak to its validity).  The sampling approach they used was odd to me, and I can't assert at this point that the groups were homogeneous in any way. They used the sample size calculator at http://www.raosoft.com/samplesize.html.  The way that page proposes a sample size seems like it could dangerously arm a novice with erroneous information.

    In the end, I am not sure what inputs they used, but they assured me that the sample was given "at the 95% confidence interval". The rules for sampling were not clearly laid out. For example, they defined a period in which they would "recruit" (over a month's time); however, they didn't define the length of time after the onboarding period over which event (problem ticket) data would be collected. It appears they defined the length of data capture within the "recruitment" period. So basically those who had been assigned to the program later in the game would likely have fewer events captured (oy vey - does it get worse?).

    I think some of this can be salvaged, but I'm still working out how, tactically. Since the ticket data is captured by customer, there shouldn't be a problem with defining a window of observation post hoc for all customers. Here's the catch: for reasons apparently due to the manpower available to support this pilot project, the 2 groups ended up unequal - 308 in the experimental group and 1743 in the control. I don't have info for the mean and variance of each yet.

    So here's the question: if one were to compare the means of the two groups by t-test post hoc given the sample provided (of course with unequal n's and, well, whatever I find out the sd's are for each), is there any way to determine whether the *magnitude* of the difference between the 2 groups is significant, particularly given that there was no a priori power calculation to determine the within-group sample size (and a really unknown effect size)? If we input the observed difference as the effect size and determine the achieved power post hoc (as if we're just doing a 'data discovery' project), is this appropriate?

    Canning the original study because it wasn't conducted correctly wouldn't bode well for departmental headcount (a lot of manpower appears to have been put into this experiment)...so the pressure is on here to try and help the best way I can. Any thoughts are greatly appreciated.


     Many thanks!
    -------------------------------------------
    Phillip Middleton
    -------------------------------------------


  • 2.  RE:Question regarding basic 2 group comparison.

    Posted 01-19-2012 13:28
    oy vey, indeed!  Do they have to have a P value?  Or can you successfully persuade them otherwise?
    -------------------------------------------
    Eric Siegel
    Biostatistician
    Univ of Arkansas for Medical Sciences
    -------------------------------------------




  • 3.  RE:Question regarding basic 2 group comparison.

    Posted 01-19-2012 13:37
    I think I have an idea that could work and is probably a little out of the box.  Forget about the randomization.  Treat it like an observational study.  Find covariates to use for balancing the groups and then do propensity score matching to try to match as many from the treatment group with controls as possible (probably in a 1:1 fashion).  For that kind of analysis it is actually an advantage that you have over 5 times as many control subjects as treated subjects.  It improves your chances of matching.  Of course you don't get something for nothing.  At best you have 308 treated subjects matched to 308 controls, and possibly fewer if the propensity score matching doesn't work perfectly.  But at least then you can do a statistical comparison with some sense of validity, since confounding factor issues are handled to a degree.
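
A minimal Python sketch of the 1:1 matching step described above, assuming a propensity score (the estimated probability of being onboarded, given the covariates) has already been fitted for every customer. The function name, the customer IDs, the scores, and the 0.05 caliper are all hypothetical. This is greedy nearest-neighbour matching without replacement, which is only one of several possible matching schemes.

```python
def greedy_match(treated, controls, caliper=0.05):
    """Pair each treated unit with the closest unused control.

    treated, controls: dicts mapping a unit id to its propensity score.
    A pair is kept only if the two scores differ by at most `caliper`.
    """
    available = dict(controls)  # controls not yet used in a pair
    pairs = []
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        # Closest remaining control on the propensity score.
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # match without replacement
    return pairs


# Hypothetical scores, for illustration only.
treated = {"t1": 0.30, "t2": 0.60, "t3": 0.99}
controls = {"c1": 0.31, "c2": 0.58, "c3": 0.90, "c4": 0.10}
print(greedy_match(treated, controls))
```

Treated customers with no control inside the caliper stay unmatched, which is why the matched sample can end up smaller than 308 pairs.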

    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 4.  RE:Question regarding basic 2 group comparison.

    Posted 01-19-2012 13:44

    It seems important to know how the groups ended up with unequal sizes.  It's one thing if they simply decided to take a random 20% into the experimental group instead of 50%. Nothing wrong with that.  It's altogether a different thing if they simply stopped putting callers into the experimental group when they didn't have a way to service them (on busy days, at the end of the week, after their software failed, etc.)   That would invalidate any comparison, it seems to me.  




    -------------------------------------------
    Bridget Bly
    -------------------------------------------








  • 5.  RE:Question regarding basic 2 group comparison.

    Posted 01-19-2012 15:40
    ...but it wouldn't invalidate the comparison if one treated it as an observational study, as Michael suggested.  Otherwise, pretty much all of epidemiology would be invalid.

    -------------------------------------------
    Eric Siegel
    Biostatistician
    Univ of Arkansas for Medical Sciences
    -------------------------------------------





  • 6.  RE:Question regarding basic 2 group comparison.

    Posted 01-19-2012 13:35

    Hi, I may have misunderstood the design, but with regard to the two groups not having equal sample sizes - that is OK; there are many randomized trials in which the ratio of n1:n2 is other than 1.0 by design, e.g., approximately 1:2, 1:3, etc. Not having an equal number in each group will, of course, affect the statistical power for comparing the groups and should be taken into consideration in the sample size estimate.  If I understand the design correctly, subjects were recruited over time and, therefore, were followed for various lengths of time.  This is generally the case in randomized trials, and the method of analysis should then be "survival analysis" (such as Kaplan-Meier estimates, Cox proportional hazards model, etc.), by which subjects who did not experience an "event" during their follow-up are censored at the end of their follow-up time.
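
To show how censoring enters the calculation, here is a minimal Python sketch of the Kaplan-Meier estimator, assuming one row per customer with a follow-up time and a flag for whether a first ticket was observed. The function name and the data are hypothetical; in practice a survival-analysis library would be the usual choice.

```python
def kaplan_meier(times, events):
    """Return [(time, estimated survival)] at each distinct event time.

    times:  follow-up time for each subject
    events: 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0   # events observed exactly at time t
        n_leaving = 0  # subjects leaving the risk set at t (events + censored)
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            surv *= 1 - n_events / at_risk
            curve.append((t, surv))
        at_risk -= n_leaving
    return curve


# Hypothetical days to first ticket; 0 marks a customer censored with no ticket.
days = [5, 8, 8, 12, 30, 30]
had_ticket = [1, 1, 0, 1, 0, 0]
for t, s in kaplan_meier(days, had_ticket):
    print(t, round(s, 3))
```

Censored customers still count in the risk set up to their last day of follow-up, so shorter follow-up for late enrollees does not by itself bias the curve.
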
    -------------------------------------------
    Edith Zang
    Independent Consultant
    NYCASA
    -------------------------------------------








  • 7.  RE:Question regarding basic 2 group comparison.

    Posted 01-19-2012 13:49

    I agree with Edith that you can view this as two point processes, and a comparison of, say, two Kaplan-Meier curves is probably a more appropriate analysis that in this case gets around the different sampling times for individuals.  Survival analysis does handle this through censoring, and of course this is how clinical trials work: the trials are stopped at a fixed time point but patients are enrolled over an interval of time.

    But her solution doesn't address the issue of two independent samples.  My suggestion is to do the case-control matching and then do the analysis with say a log rank test comparing the two survival curves.  Wouldn't that overcome all the issues you raise?
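
A minimal Python sketch of the two-group log-rank test mentioned above, assuming time-to-first-ticket data with censoring flags for each group. The function name and the example data are hypothetical, and this is the plain unstratified test: it does not account for the pairing that matching would introduce.

```python
from statistics import NormalDist


def logrank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test. Returns (chi-square statistic, p-value)."""
    rows = [(t, e, 0) for t, e in zip(times_a, events_a)]
    rows += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in rows if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for s, _, g in rows if g == 0 and s >= t)  # at risk in A
        n_b = sum(1 for s, _, g in rows if g == 1 and s >= t)  # at risk in B
        d_a = sum(1 for s, e, g in rows if g == 0 and s == t and e)
        d_b = sum(1 for s, e, g in rows if g == 1 and s == t and e)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / variance if variance > 0 else 0.0
    # A chi-square with 1 df is a squared standard normal.
    p_value = 2 * (1 - NormalDist().cdf(chi2 ** 0.5))
    return chi2, p_value


# Hypothetical data: group A gets tickets early, group B late.
chi2, p = logrank([1, 2, 3, 4], [1, 1, 1, 1], [10, 11, 12, 13], [1, 1, 1, 1])
print(round(chi2, 2), round(p, 4))
```
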
    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 8.  RE:Question regarding basic 2 group comparison.

    Posted 01-19-2012 13:46
    Regarding post-hoc power analysis:  if you can obtain a confidence interval on the parameter in question, this implicitly indicates the degree of knowledge about the parameter, regardless of the intentions made prior to data collection.  Sample size estimates are just that--estimates--and they're made prospectively.  Once you've obtained data, intuition and the Likelihood Principle suggest that estimates made prior to seeing the data are moot.  Now you're interested in what the data says.  I think post-hoc power analysis is useful in interpreting a p-value, but a confidence interval conveys this implicitly.
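
To make the confidence-interval point concrete, here is a minimal Python sketch using only the standard library. It assumes the group means, SDs, and sizes are available as summary statistics, and it uses a normal approximation, which seems reasonable with 308 and 1743 customers. The function names and the example means and SDs are hypothetical, not taken from the study. The second function illustrates why "observed power" adds little: for a two-sided z-test it is a fixed function of the observed z statistic, and therefore of the p-value.

```python
from statistics import NormalDist


def diff_ci(mean1, sd1, n1, mean2, sd2, n2, level=0.95):
    """Approximate CI for mean1 - mean2 with unequal variances (normal approx.)."""
    se = (sd1 ** 2 / n1 + sd2 ** 2 / n2) ** 0.5
    z = NormalDist().inv_cdf(0.5 + level / 2)
    diff = mean1 - mean2
    return diff - z * se, diff + z * se


def observed_power(z_obs, alpha=0.05):
    """Power of a two-sided z-test if the true effect equalled the observed one."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    return nd.cdf(abs(z_obs) - z_crit) + nd.cdf(-abs(z_obs) - z_crit)


# Hypothetical summary statistics: tickets per customer in each group.
low, high = diff_ci(2.0, 1.5, 308, 2.4, 1.6, 1743)
print(f"95% CI for the difference in means: ({low:.3f}, {high:.3f})")

# A result sitting exactly at p = 0.05 always has observed power of about 0.5.
print(round(observed_power(1.96), 2))
```

The width of the interval already shows how precisely the difference is known, which is the information a post-hoc power figure is usually meant to convey.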

    Your bigger problem is whether the sampling procedure is biased.  No amount of data, equally or unequally distributed to treatment and control, will help.  Off the top of my head, the way to salvage a biased sampling procedure is to gather additional data describing the biasing mechanism, and build a model including such covariates, and base interpretations on that model.  For instance, if the mechanism was such that three people were making assignments to treatment or control, and one of them was following protocol rigorously while the other two were just assigning everyone to control, then "Operator" (representing these three people) should  be included as a factor in your analysis.  This is if you're fortunate enough to know the biasing mechanism, and to have data that's informative about it.

    If the intention was to allocate half of participants to treatment, yet far more individuals were allocated to control than to treatment, apparently individuals in the study have differing probabilities of being allocated to treatment.  Can you estimate the probability of allocation to treatment as a function of other variables?  If you can build a model that performs better at predicting who's allocated to control (let's say with cross-validation) than noise levels expected with a 308/1743 split, then you're on to something, and the factors that enable you to make those predictions ought to be included in the analysis.

    It's possible, and would be fortunate, that the factors important for determining allocation actually have nothing to do with the effectiveness of the treatment, but you'd need to verify this.

    -Jim

    -------------------------------------------
    James Garrett
    Manager, R&D Statistics
    Becton Dickinson
    -------------------------------------------








  • 9.  RE:Question regarding basic 2 group comparison.

    Posted 01-19-2012 13:55
    Aren't we overthinking and complicating things too much?  I agree that finding covariates that could be confounders is the key.  But you don't have to make regression adjustments or worry about unequal sample size.  Just do the case-control analysis based on matching using the covariates.  Propensity score matching is an effective way to do this.  Does anyone see anything terribly wrong with my suggestion? 

    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 10.  RE:Question regarding basic 2 group comparison.

    Posted 01-19-2012 14:01

    I am not clear as to why the 2 groups need to be matched since they were randomly assigned to one of the groups, which (at least if the total sample size is reasonable) should approximately equalize both known and unknown potential confounders between the groups.
    -------------------------------------------
    Edith Zang
    Independent Consultant
    NYCASA
    -------------------------------------------








  • 11.  RE:Question regarding basic 2 group comparison.

    Posted 01-19-2012 14:06

    The problem with propensity scoring and matching is that how well it works is entirely dependent on what variables you have available for the scoring. The researcher would have to first identify why and how the unequal assignments were made and be assured that relevant variables could be captured and measured.  What if, for instance, assignment was made to control whenever the wait time on the phone got beyond 5 minutes?  If that data was not captured for each call, then you wouldn't have the relevant propensity scores.  And wait time would certainly be relevant to any customer service outcomes like non-resolution of tickets.    

    -------------------------------------------
    Bridget Bly
    -------------------------------------------








  • 12.  RE:Question regarding basic 2 group comparison.

    Posted 01-19-2012 14:17

    Yes, of course removing bias by matching requires identifying covariates that balance the sample.  No matter what solution you try you have to find a way to get balance.  I will grant you that if sampling were done in the way you suggest with your example it would be important to know and would possibly leave no recourse.  Regardless of method it would definitely be a good idea to try to clarify the sampling mechanism because, as you say, it can have serious repercussions.  So I am assuming that balancing covariates can be found.  My experience with medical studies is that the investigators know a lot about patient characteristics that could affect response.

    But I take it that this is an effort to salvage a flawed study and the initial post and first response indicated a feeling of hopelessness.  I don't think it is hopeless and I still think this approach is viable.
    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 13.  RE:Question regarding basic 2 group comparison.

    Posted 01-19-2012 14:35
    Even if the sampling method was flawed, if randomization was carried out correctly  the two groups should still be comparable in terms of the outcome.  This is true even if they don't constitute a representative sample of the target population and, therefore, the results cannot be generalized to the target population.
    -------------------------------------------
    Edith Zang
    Independent Consultant
    NYCASA
    -------------------------------------------








  • 14.  RE:Question regarding basic 2 group comparison.

    Posted 01-19-2012 14:42
    I think when we say "the sampling mechanism is flawed" we are just choosing different words to say that randomization was not carried out properly.

    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 15.  RE:Question regarding basic 2 group comparison.

    Posted 01-19-2012 17:46
    Heh, well after gathering more information.... there was no 'real' scheme of randomization. It turns out that this is little more than a pilot program, and a few folks somehow inserted the jargon.

    Essentially a new customer would sign up for a product with the company online. All of those requests are placed in a queue. At any time of day, a team of support people processes those requests, only one of whom was deemed to have the skill level and training to give the 'onboarding' type of support. That person would 'onboard' the customer (basically giving the customer a 5-10 min in-service on getting started, emphasizing critical information and resources in order to preemptively address areas that would otherwise turn into a ticket). They did this until they reached the number that the analyst calculated/deemed sufficient as a sample (obviously, not the right method). All other customers who signed up were considered the control group.

    As far as case matching is concerned...welp....I have a single possible covariate that I may be able to use to differentiate customers (monthly customer activity).  I'm not sure if it will be 'usable' but stay tuned....

    The nice thing is that the observation window can be held constant (30 days) as this data is collected anyway - so making sure that all customers are tracked for the same duration is not really an issue (although since the data were collected over December, there could be some differences in customer activity over the holidays - perhaps a potential source of bias, but no data to clarify that).

    It is true that the event (trouble tickets) differs by customer and so can be recurring (not necessarily for the same problem of course). Some customers will initiate more, some less, some perhaps not at all - hence why I was thinking of looking at the overall difference in the means of the tickets generated between those who were onboarded and those who weren't.

    I like the idea of a K-M analysis, and similar other time-to-event approaches suggested and think this could be a more intuitive analysis as it also helps demonstrate the magnitude of the difference between the groups as probabilities. And truly, what the business is interested in is what the costs and benefits are of initiating the onboarding program. Not only understanding *that* there is a difference, but (and over time) roughly *how much* of a difference there is between groups should aid in that evaluation.


    This is such a helpful group! You guys are great (fanatical to be sure, given response times on my posts!)!


    -------------------------------------------
    Phillip Middleton
    -------------------------------------------








  • 16.  RE:Question regarding basic 2 group comparison.

    Posted 01-19-2012 19:32

    I am still not clear how the incoming subjects were assigned to group?  Randomly?
    -------------------------------------------
    Edith Zang
    Independent Consultant
    NYCASA
    -------------------------------------------








  • 17.  RE:Question regarding basic 2 group comparison.

    Posted 01-20-2012 10:48

    So, do I understand correctly that there was only one person doing the "experimental" treatment, while the control had multiple people?  If so, then there will be problems with confounding.  For example, the person who was doing the onboarding would likely be working on a set schedule, and only customers who would call during those times could be in the experimental group.  Any other factor by which this one person differs from the remainder of the "operators" leads to confounding. 

    If I understand correctly, then you can do an analysis but you will never know if the difference is due to the experiment or just the operator.  Is it possible that the one person who was doing the onboarding did both onboarding and standard treatment of callers?  If so, then the best you can do is to examine that operator's data.

    I also have a concern about using survival analysis & the analysis of recurrent events.  Is it possible that the same customer would go to 2 different operators?  This would create a big challenge for the analysis.

    -------------------------------------------
    Robert Podolsky
    Georgia Health Sciences University
    -------------------------------------------








  • 18.  RE:Question regarding basic 2 group comparison.

    Posted 01-19-2012 14:19

    I agree with Bridget, and that is why randomization is the best approach - because it is supposed to equalize all known and unknown potential confounders.
    -------------------------------------------
    Edith Zang
    Independent Consultant
    NYCASA
    -------------------------------------------








  • 19.  RE:Question regarding basic 2 group comparison.

    Posted 01-19-2012 14:23
    The argument given to me about why the experimental sample was small was simply due to manpower, as someone suggested earlier. As far as randomization is concerned, I absolutely must dig into how they executed this further (meeting with them today), as it could naturally throw in a bias. Additionally, I don't know what covariates were captured (i.e. something that would tell me that the within group/customer heterogeneity would be a confound). This could affect matching, but I see the reason behind doing this (may be the 'best' we've got).

    I also agree that this should be treated as a retrospective observational study and scratch the original 'prospective' sample sizes. My thought is simply to treat the groups independently and throw out the randomization (in which case matching will probably be a necessity - I had a suspicion about this).

    Regarding survival analysis.... isn't that then answering the question of  'time-to-problem ticket' event instead of the question regarding whether or not the observed difference between control/experiment is significant? Just FYI, I'm also a student diving into a grad program and have only cursory knowledge of K-M survival curves for risk quantification.

    -------------------------------------------
    Phillip Middleton
    -------------------------------------------








  • 20.  RE:Question regarding basic 2 group comparison.

    Posted 01-19-2012 14:52
    The fact that manpower was limiting for the experimental sample and not the control sample suggests that the investigators treated the customers in these two groups differently.  Is the only difference the effort needed as part of the new program?  If not, then there is confounding.  Adjusting with appropriate covariates one way or another might solve the problem, but there is no guarantee since the result will depend on the extent to which the covariates relate to the actual confounding.

    I do like the idea of using Kaplan-Meier since the data are events and timing to events.  The relevant information for differences between groups would be contained within the time to ticket since shorter times would imply a larger number of tickets.  The potential challenge that I see is that one customer may have more than one ticket, meaning that the data represent recurrent events.  Perhaps the data should be analyzed as such?
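
One simple descriptive tool for recurrent events is the mean cumulative function (MCF), the average number of tickets per customer accumulated by each day. Here is a minimal Python sketch, assuming each customer's record is a list of ticket days plus the last day of follow-up; the function name and the data are hypothetical, and it gives a point estimate only, with no confidence limits.

```python
def mean_cumulative_function(histories):
    """Nonparametric MCF estimate for recurrent events.

    histories: list of (event_times, end_of_followup), one per subject.
    Every event time must fall within that subject's follow-up.
    Returns [(time, estimated mean cumulative number of events)].
    """
    event_times = sorted({t for times, _ in histories for t in times})
    mcf = 0.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for _, end in histories if end >= t)
        n_events = sum(times.count(t) for times, _ in histories)
        mcf += n_events / at_risk
        curve.append((t, mcf))
    return curve


# Hypothetical ticket days for four customers, each followed for 30 days.
customers = [([3, 10], 30), ([], 30), ([5], 30), ([3], 30)]
for day, value in mean_cumulative_function(customers):
    print(day, value)
```

Computing one curve per group and comparing them uses every ticket, not just the first one per customer.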

    -------------------------------------------
    Robert Podolsky
    Georgia Health Sciences University
    -------------------------------------------








  • 21.  RE:Question regarding basic 2 group comparison.

    Posted 01-19-2012 15:02

    Robert makes a good point about recurrent event.  For those not familiar with this type of analysis Jerald Lawless has a book out and so does Wayne Nelson.  Wayne's book is in the ASA/SIAM series.  You can find them through amazon.com.  I believe they are both still in print.  I will check for myself.
    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 22.  RE:Question regarding basic 2 group comparison.

    Posted 01-19-2012 15:05
    The book I have for recurrent event analysis is Cook and Lawless and is current.

    -------------------------------------------
    Robert Podolsky
    Georgia Health Sciences University
    -------------------------------------------








  • 23.  RE:Question regarding basic 2 group comparison.

    Posted 01-19-2012 15:10

    I found both books on amazon and noticed that I was the only one to do a customer review.  I reviewed both books.
    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 24.  RE:Question regarding basic 2 group comparison.

    Posted 01-19-2012 16:14
    Although recurrent-events analysis would be cool and ought to be done eventually, I will argue that the initial analysis ought to be a simple survival analysis using Time to First Event as the outcome variable.  After that, one can get fancy in the manner of one's choosing.

    -------------------------------------------
    Eric Siegel
    Biostatistician
    Univ of Arkansas for Medical Sciences
    -------------------------------------------





  • 25.  RE:Question regarding basic 2 group comparison.

    Posted 01-19-2012 18:31
    I don't think this is good advice, especially since you make it unequivocally.  I would look at the data first.  Even though recurrent events are possible with this data, if none occur the approach doesn't matter, since the nonparametric estimate of time to first event will coincide with the KM estimate.  Perhaps throwing out a few recurrent events does little harm if there are only a few.  But if some patients have several events this simplification could grossly misrepresent the data.  Subjects with many recurrences may have different characteristics than those with just one event or none.  If you throw out all the recurrent events you would be throwing away a lot of important information.  This would be akin to doing survival analysis on the complete data only and throwing out the censored data.  None of us would recommend that.  But naive practitioners who do not know how to handle censored observations have been known to do it.

    If you do survival analysis without throwing out the recurrent events how do you treat them? Would you pretend it is a first event starting from the previous event for that subject?  That would mean treating dependent events as though they were independent.

    Recurrent event analysis is not fancy analysis.  It is not right to consider recently developed theory and methodology to be fancy.  The theory has developed because the applications are becoming important and the additional information needs to be used properly.  I can understand being hesitant to use a new method if the theory is not well-established.  But often the theory and applications are both well-established and people shy away from it just because it is new and unfamiliar to them.

    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 26.  RE:Question regarding basic 2 group comparison.

    Posted 01-19-2012 22:24
    I didn't say don't do it.  I said do the simple stuff first.

    -------------------------------------------
    Eric Siegel
    Biostatistician
    Univ of Arkansas for Medical Sciences
    -------------------------------------------





  • 27.  RE:Question regarding basic 2 group comparison.

    Posted 01-19-2012 22:35

    I do understand that but what value is there to doing survival analysis in a situation where it is inappropriate? If the data suggests that the survival analysis might be a good approximation to reality then what you suggest is fine. My point is really that you were saying no matter what the data suggest do the survival analysis.  That is all that I object to. 
    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 28.  RE:Question regarding basic 2 group comparison.

    Posted 01-19-2012 22:43
    Are you saying that, in a context where recurrent-events analysis is being discussed, it makes no sense to begin by doing Kaplan-Meier curves on the Time To First Event?

    -------------------------------------------
    Eric Siegel
    Biostatistician
    Univ of Arkansas for Medical Sciences
    -------------------------------------------








  • 29.  RE:Question regarding basic 2 group comparison.

    Posted 01-19-2012 23:14

    Yes when there are many recurrent events it clearly is the wrong analysis for the reasons I already mentioned.
    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 30.  RE:Question regarding basic 2 group comparison.

    Posted 01-20-2012 10:24


    "So basically those who had been assigned to the program later in the game would likely have fewer events captured"
     
    Maybe I've missed this being proposed already, but wouldn't you want to treat this as a comparison of rates? You have a count of events over a given follow-up time. The follow-up time is different per person, but if I'm remembering my generalized linear modelling correctly, this can be accounted for in the model. The predictor of interest would be treatment group. You could certainly add other covariates to the model for the sake of balancing potential confounders.
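
A minimal Python sketch of the rate comparison described above, assuming only the total ticket count and total follow-up time per group are known. The function name and the event counts are hypothetical; the follow-up totals in the example simply assume 30 days for each of the 308 and 1743 customers.

```python
from math import exp, sqrt
from statistics import NormalDist


def rate_ratio(events_trt, time_trt, events_ctl, time_ctl, level=0.95):
    """Rate ratio (treated / control) with a Wald CI on the log scale.

    Both event counts must be greater than zero.
    """
    ratio = (events_trt / time_trt) / (events_ctl / time_ctl)
    se_log = sqrt(1 / events_trt + 1 / events_ctl)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return ratio, ratio * exp(-z * se_log), ratio * exp(z * se_log)


# Hypothetical ticket counts; follow-up is customers x 30 days.
ratio, low, high = rate_ratio(100, 308 * 30, 700, 1743 * 30)
print(f"rate ratio {ratio:.3f}, 95% CI ({low:.3f}, {high:.3f})")
```

With treatment group as the only predictor, this should agree with a Poisson regression that uses log follow-up time as an offset. Adding covariates, or allowing for overdispersion from customers who file many tickets, requires fitting the model itself.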

    -------------------------------------------
    Laila Poisson
    Biostatistician
    Henry Ford Health System
    -------------------------------------------








  • 31.  RE:Question regarding basic 2 group comparison.

    Posted 01-20-2012 10:42

    Yes, we are concerned with rates, but here the relevant rate is the time to event.  The challenge is analyzing time to event when one "subject" can have multiple events, as is possible here.  As a result of the multiple events, we consider the data to be recurring events and handle the analyses in that way.
    -------------------------------------------
    Robert Podolsky
    Georgia Health Sciences University
    -------------------------------------------








  • 32.  RE:Question regarding basic 2 group comparison.

    Posted 01-19-2012 19:46
    No one is.  Phillip said he was going to ask about it at a meeting with the client today.  I gather that the client thinks that they were randomizing but they may have done something that ruined it but they don't realize it.  So he needs nitty gritty details to determine whether or not it is a legitimate randomization. If it was randomized they must have used a high ratio of control subjects to treatment subjects and he doesn't understand why.  I think he suspects that they planned 1:1 randomization and modified the enrollment to get more controls because it was easier and maybe less costly to work with controls. 

    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 33.  RE:Question regarding basic 2 group comparison.

    Posted 01-19-2012 14:52
    If subjects were enrolled over time and, therefore, were followed for different lengths of time, the only way to compare the outcome is by using survival analysis, because this method allows for "censoring" those who did not experience an event.  Simple t-tests will not account for the fact that some of the subjects did not experience an "event" simply because they were not followed as long as others.  The "time-to" factor also adjusts for those who were among the first to be enrolled and, therefore, had much more time to experience the event than others.
    -------------------------------------------
    Edith Zang
    Independent Consultant
    NYCASA
    -------------------------------------------








  • 34.  RE:Question regarding basic 2 group comparison.

    Posted 01-19-2012 14:25

    It is inconceivable that the randomization procedure was to get a 1-1 allocation.  It was stacked either intentionally or unintentionally.  If unintentionally, then the whole study is suspect, and should be considered as strictly an observational study, not as a randomized one.  Confounders will not be reliable predictors of bias, so adjustments won't be entirely satisfactory.  But maybe the randomization was 6:1, in which case a Satterthwaite-corrected t-test would be assumption-free.
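
For reference, a minimal Python sketch of the Satterthwaite-corrected (Welch) t statistic and its approximate degrees of freedom, computed from summary statistics. The function name and the example means and SDs are hypothetical; only the group sizes 308 and 1743 come from the thread.

```python
def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch t statistic and Satterthwaite degrees of freedom."""
    v1 = sd1 ** 2 / n1  # squared standard error of mean1
    v2 = sd2 ** 2 / n2  # squared standard error of mean2
    t_stat = (mean1 - mean2) / (v1 + v2) ** 0.5
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_stat, df


# Hypothetical means and SDs of tickets per customer.
t_stat, df = welch_t(2.0, 1.5, 308, 2.4, 1.6, 1743)
print(round(t_stat, 2), round(df, 1))
```

The degrees of freedom always land between the smaller group's n - 1 and the pooled n1 + n2 - 2, so the unequal group sizes are handled by the correction itself.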

    Jon
    -------------------------------------------
    Jon Shuster
    University of Florida
    -------------------------------------------








  • 35.  RE:Question regarding basic 2 group comparison.

    Posted 01-19-2012 15:56
    It might be valuable to divide the recruiting period into four or five time segments, and ask if the randomization ratio (of "onboarding" to "status quo") differs significantly across segments.   
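
A minimal Python sketch of that check, assuming the recruiting period has been cut into segments and the number of onboarded and status-quo customers counted in each. The function name and the per-segment counts are hypothetical (chosen only so that they total 308 and 1743). It computes the Pearson chi-square statistic for homogeneity of the allocation ratio across segments.

```python
def allocation_chi2(onboarded, control):
    """Pearson chi-square for equal allocation ratio across time segments.

    onboarded, control: per-segment counts, same length, no empty segments.
    Returns (statistic, degrees of freedom).
    """
    total_on, total_ctl = sum(onboarded), sum(control)
    total = total_on + total_ctl
    stat = 0.0
    for n_on, n_ctl in zip(onboarded, control):
        segment = n_on + n_ctl
        expected_on = segment * total_on / total
        expected_ctl = segment * total_ctl / total
        stat += (n_on - expected_on) ** 2 / expected_on
        stat += (n_ctl - expected_ctl) ** 2 / expected_ctl
    return stat, len(onboarded) - 1


# Hypothetical counts for five segments of the recruiting month.
stat, df = allocation_chi2([100, 100, 100, 8, 0], [300, 300, 300, 400, 443])
print(round(stat, 1), df)
```

With five segments there are 4 degrees of freedom, and a statistic above 9.488 would reject a constant ratio at the 5% level. A pattern like the one in the example, where onboarding stops once the target count is reached, would show up immediately.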

    -------------------------------------------
    Eric Siegel
    Biostatistician
    Univ of Arkansas for Medical Sciences
    -------------------------------------------