Paired Count Data

  • 1.  Paired Count Data

    Posted 10-25-2011 16:37

    I need to compare paired count data (differences between pre and post measures) that follow a Poisson distribution. Can anyone suggest an appropriate test for this? Can this be done in SAS, Stata, or SPSS?

    -------------------------------------------
    Shahidul Islam, MPH
    Biostatistician
    Winthrop University Hospital
    Mineola, NY 11590
    -------------------------------------------


  • 2.  RE:Paired Count Data

    Posted 10-25-2011 16:52
    Your description is a little vague. I would want to know why the Poisson model applies to the counts. Are you assuming that all subjects have the same rate for the "pre" measure but possibly a different rate for the "post" measure? If that is the case, you really are testing for a difference in the rate parameter between two Poisson distributions. This would be a standard test if it weren't for the pairing. But pairing suggests that you may have subject-to-subject differences, which would imply a subject-related component to the rate. You really need to be more specific about the model.

    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 3.  RE:Paired Count Data

    Posted 10-25-2011 17:25
    You could do this as a repeated measures generalized linear model (with log link and Poisson distribution) using proc genmod in SAS.

    -------------------------------------------
    Colleen Kelly
    Principal Consultant
    Kelly Statistical Consulting
    -------------------------------------------







  • 4.  RE:Paired Count Data

    Posted 10-25-2011 17:37

    I like the suggestion if you are guessing correctly about the problem. I don't think we understand the problem yet. He mentioned paired data and a Poisson model for the counts. Why Poisson? What do pre and post mean? Are we looking at patients before and after an intervention? If there is an intervention, do we expect the rate to increase or decrease after it? If these are patients, how do the patients' characteristics affect the rate? Is Poisson really justified, or is it being used because it is simple? Good consulting demands getting answers to such questions before making recommendations. Well, that is me getting on my soapbox again!
    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 5.  RE:Paired Count Data

    Posted 10-25-2011 23:09

    Thanks to all for taking the time to respond to my request.

    To answer Michael's valuable questions:

    Patients are fairly similar, as they share some common characteristics.

    The outcome variable is "number of times blood glucose levels drop below a cutoff point" (a rare event for my patient population) during a standardized glucose monitoring period.

    We want to measure this prior to surgery and then post-surgery at several time points (at least 5).

    For repeated measures data, I can use PROC GENMOD with a Poisson distribution. However, I am interested in comparing the pre-measure with each of the post-measures separately. I thought of using the signed rank test but wondered if there is a different test to compare differences for rare count data.

    Thanks again for your help.


    -------------------------------------------
    Shahidul Islam, MPH
    Biostatistician
    Winthrop University Hospital
    Mineola, NY 11590
    -------------------------------------------








  • 6.  RE:Paired Count Data

    Posted 10-26-2011 08:07

    Now I hope everyone sees that the context of the problem is very important. Given this detailed explanation, I would say that Colleen Kelly's recommendation is the best so far. I think you could make the post-surgery count the response variable and use the baseline pre-surgery count as a covariate rather than look at the difference in counts. Since crossing the lower boundary is supposed to be a rare event, perhaps the Poisson model is a good one. The use of the covariate allows you to see how closely the two counts relate to each other on a patient-by-patient basis. I would test the regression coefficient for statistical significance.

    This type of analysis is far better, in my opinion, than a signed rank test. There may be a better approach, but I don't know offhand what it would be.
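
    A minimal sketch of the covariate idea described above, assuming a single post-surgery count per patient and a log link: it fits a Poisson regression of the post count on the pre count by Newton-Raphson and forms a Wald test for the slope. The data, the function name fit_poisson_regression, and the choice of Python are illustrative assumptions, not anything from the thread; in practice one would use PROC GENMOD or an equivalent routine.

```python
import math

def fit_poisson_regression(x, y, max_iter=50, tol=1e-10):
    """Fit log E[y] = b0 + b1*x by Newton-Raphson.

    Returns (b0, b1, se_b1), where se_b1 is the Wald standard error of b1.
    """
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start at the intercept-only fit
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: derivatives of the Poisson log-likelihood.
        s0 = sum(yi - mi for yi, mi in zip(y, mu))
        s1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Fisher information (2 x 2) and its determinant.
        i00 = sum(mu)
        i01 = sum(xi * mi for xi, mi in zip(x, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Newton step: solve information * step = score.
        d0 = (i11 * s0 - i01 * s1) / det
        d1 = (i00 * s1 - i01 * s0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    # Recompute the information at the final estimates for the standard error.
    mu = [math.exp(b0 + b1 * xi) for xi in x]
    i00 = sum(mu)
    i01 = sum(xi * mi for xi, mi in zip(x, mu))
    i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
    se_b1 = math.sqrt(i00 / (i00 * i11 - i01 * i01))
    return b0, b1, se_b1

# Hypothetical counts of low-glucose episodes for 8 patients (not real data).
pre = [0, 1, 2, 0, 3, 1, 0, 2]
post = [0, 0, 1, 0, 2, 1, 0, 1]

b0, b1, se_b1 = fit_poisson_regression(pre, post)
z = b1 / se_b1
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
print(f"slope = {b1:.3f}, SE = {se_b1:.3f}, z = {z:.2f}, p = {p_value:.4f}")
```

    With only a handful of patients the Wald approximation is rough, so this is meant to show the mechanics rather than to recommend a sample size.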
    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 7.  RE:Paired Count Data

    Posted 10-26-2011 10:36

    Thanks, Michael, for your suggestion, but I am interested in looking at the differences and not the effect.

    From reading through articles, I am thinking of calculating the following test statistic then a p-value:

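    Test statistic = $(\lambda_{pre} - \lambda_{post}) / \mathrm{Var}(\lambda_{pre} - \lambda_{post})$

    where $\mathrm{Var}(\lambda_{pre} - \lambda_{post}) = \mathrm{Var}(\lambda_{pre}) + \mathrm{Var}(\lambda_{post}) - 2\,\mathrm{Cov}(\lambda_{pre}, \lambda_{post})$.

    Since $\lambda_{pre}$ and $\lambda_{post}$ are not independent, we will need to estimate the covariance as follows:

    $\mathrm{Cov}(\lambda_{pre}, \lambda_{post}) = E[\lambda_{pre}\,\lambda_{post}] - E[\lambda_{pre}]\,E[\lambda_{post}]$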

    Now we can get a p-value based on the above test statistic.  Thoughts anyone??



    -------------------------------------------
    Shahidul Islam, MPH
    Biostatistician
    Winthrop University Hospital
    Mineola, NY 11590
    -------------------------------------------








  • 8.  RE:Paired Count Data

    Posted 10-26-2011 10:53

    Where does the Poisson assumption enter into your proposal? Are you using it to derive the null distribution for your test statistic? By the way, if you are intending for the test statistic to be a normalized difference in counts, you should be dividing by the square root of the variance and not the variance.

    What makes looking at the paired difference in counts (or change from baseline) so much different than looking at the effect of baseline on post surgery counts?  After all a statistically significant slope implies that the mean difference is statistically significantly different from zero.  Also if the coefficient is statistically significant and the model predicts a higher count post-surgery that is implying an average difference greater than 0 and if it predicts a lower count then it implies the average difference is less than zero.  I think it tells you what you want to know without getting into the messiness of considering the difference between two dependent Poisson random variables under the null and alternative hypotheses.
    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 9.  RE:Paired Count Data

    Posted 10-26-2011 11:15
    Sorry for the typo. Indeed, I was intending for the test statistic to be a normalized difference in counts. It is not so different to me, but I think it is easier to understand for the client I am providing the service for, who understands the paired t-test. :)

    Thanks again for all your help!

    -------------------------------------------
    Shahidul Islam, MPH
    Biostatistician
    Winthrop University Hospital
    Mineola, NY 11590
    -------------------------------------------








  • 10.  RE:Paired Count Data

    Posted 10-26-2011 11:35
    Agreed, there are many reasons to put this into a modelling and estimation framework, informative missingness not being the least of these, especially in the context of a surgical application. (There were some previous discussions on this topic on this board.) Shahidul, another good text to check out is "Multilevel and Longitudinal Modeling Using Stata" by Rabe-Hesketh and Skrondal. Easy read, good code examples, fairly inexpensive:
    http://www.stata-press.com/books/mlmus.html

    Hope this helps!

    Best!

    Mike
    -------------------------------------------
    Michael Griswold
    Executive Director
    Univ MS Medical Center Biostatistics
    -------------------------------------------








  • 11.  RE:Paired Count Data

    Posted 10-26-2011 11:16
    You can compute the SD for the differences directly, so there's no need to estimate the covariance.
    With MLEs in the numerator, this approximation will work, but I would consider the other approaches if the Poisson assumption is correct.
    David

    -------------------------------------------
    David Bristol
    Statistical Consulting Services
    -------------------------------------------








  • 12.  RE:Paired Count Data

    Posted 10-26-2011 11:39

    Of course David is right. If you compute all the paired differences and calculate the sample variance (dividing by n-1), you will have a direct unbiased estimate of the variance of the paired difference. Also, even though you can estimate the individual variances and the covariance for the pairs, I am sure that estimate has more variability than the direct approach.

    Now, if you are doing the paired difference for the convenience of your client, are you including or throwing out the Poisson assumption? I am not sure how to get the distribution for the test statistic. Are you actually testing each paired difference separately? In that case you have a multiplicity issue. If you are doing one test on the mean paired difference, then of course you can use a standard normal approximation for the test (assuming you have an adequate number of patients for the normal approximation to be good).
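
    A small sketch comparing the two routes to the variance of the paired difference. One caveat, stated as an assumption of this sketch: when the two variances and the covariance are all plain sample moments with the n-1 divisor, the pieced-together estimate equals the direct one algebraically, so any extra variability would arise only if model-based pieces (for example Poisson variances) were substituted. The data and the helper sample_cov are hypothetical.

```python
from statistics import mean, variance

# Hypothetical pre/post counts for 8 patients (not real data).
pre = [0, 1, 2, 0, 3, 1, 0, 2]
post = [0, 0, 1, 0, 2, 1, 0, 1]

def sample_cov(u, v):
    """Sample covariance with the n-1 divisor."""
    mean_u, mean_v = mean(u), mean(v)
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / (len(u) - 1)

# Direct route: sample variance of the paired differences.
diffs = [b - a for a, b in zip(pre, post)]
direct = variance(diffs)

# Indirect route: Var(pre) + Var(post) - 2 Cov(pre, post).
pieced = variance(pre) + variance(post) - 2 * sample_cov(pre, post)

print(direct, pieced)
```

    The direct route needs one pass over the differences and no covariance bookkeeping, which is the practical argument for it.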

    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 13.  RE:Paired Count Data

    Posted 10-26-2011 11:55

    Shahidul,


    This is some old-fashioned statistics, but for matched pairs, the variance of the differences between the pairs is estimated by finding the sample variance of the differences. The test statistic is then the mean of the differences divided by the square root of the quantity (the sample variance of the differences divided by the sample size). If you can consider the sample as a random sample from a very large population, and if the sample size is large enough, then under the null hypothesis of no mean difference the test statistic is approximately normally distributed with mean = 0 and variance = 1. If the population from which the sample is drawn is small relative to the sample size, the variance should be adjusted by the finite population correction factor. The variance to use in the test statistic is then the sample variance multiplied by the quantity (1/n - 1/N), where n is the sample size and N is the population size.
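
    The recipe above can be sketched as follows, assuming a two-sided test and a normal reference distribution. The function name paired_z_test, its population_size argument, and the data are illustrative assumptions; eight patients is far too few for the normal approximation and is used only to keep the arithmetic visible.

```python
import math
from statistics import mean, variance

def paired_z_test(pre, post, population_size=None):
    """Mean paired difference divided by its estimated standard error.

    If population_size (N) is given, the variance of the mean uses the
    finite population correction s^2 * (1/n - 1/N) instead of s^2 / n.
    Returns (z, two-sided p-value from the standard normal).
    """
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    s2 = variance(diffs)  # sample variance of the differences (n-1 divisor)
    if population_size is None:
        var_of_mean = s2 / n
    else:
        var_of_mean = s2 * (1 / n - 1 / population_size)
    z = mean(diffs) / math.sqrt(var_of_mean)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical pre/post counts for 8 patients (not real data).
pre = [0, 1, 2, 0, 3, 1, 0, 2]
post = [0, 0, 1, 0, 2, 1, 0, 1]

z, p_value = paired_z_test(pre, post)
z_fpc, p_fpc = paired_z_test(pre, post, population_size=40)
print(f"z = {z:.3f}, p = {p_value:.4f}")
print(f"with finite population correction (N = 40): z = {z_fpc:.3f}, p = {p_fpc:.4f}")
```

    The correction shrinks the standard error, so the corrected statistic is larger in absolute value than the uncorrected one for the same data.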


    Hope this is helpful.


    Margot



    -------------------------------------------
    Margot Tollefson
    Owner
    Vanward Statistical Consulting
    -------------------------------------------








  • 14.  RE:Paired Count Data

    Posted 10-26-2011 14:21

     
    Couldn't you use PROC GLIMMIX in SAS? 
     
    Set the data up in a longitudinal form, i.e. two observations per person, one with the pre measurement and one with the post measurement, plus a time indicator (1 for pre, 2 for post), and then run code similar to the following:
     
    proc glimmix data=long;
      class subject time;                  /* time: 1 = pre, 2 = post */
      model score = time / solution dist=poisson link=log;
      random intercept / subject=subject;  /* random intercept for each patient */
    run;
    quit;
     
    If you do this with "normally" distributed data and a type=cs correlation structure you have the traditional paired t-test.  So, Shahidul gets his "paired t-test" that the investigators are familiar with as well as the Poisson distribution he is after.
     
    I may be totally off base and I could easily be missing the point, so take anything I say with a grain of salt.

    -------------------------------------------
    Leroy Thacker
    Assistant Professor
    Virginia Commonwealth University
    -------------------------------------------








  • 15.  RE:Paired Count Data

    Posted 10-26-2011 14:42
    I think that is exactly what Colleen was saying, except that she suggested PROC GENMOD. I think both procedures will give the same model. For some reason he does not want to predict the final count given the pre-surgery count, apparently because he thinks it will confuse the client. But I agree with you and some others that the generalized linear mixed model is the way to go.

    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 16.  RE:Paired Count Data

    Posted 10-26-2011 14:48
    First of all, let me say thank you again and mention that my ASA membership fee is totally worth it. I will meet with my client again to explain this generalized linear mixed model (whether I use GLIMMIX or GENMOD should not be his main concern). I am convinced that this is the way to go.

    So thanks to all who contributed to this thread again and again.

    Shah

    -------------------------------------------
    Shahidul Islam, MPH
    Biostatistician
    Winthrop University Hospital
    Mineola, NY 11590
    -------------------------------------------








  • 17.  RE:Paired Count Data

    Posted 10-26-2011 14:57

    Generalized linear mixed effects models (e.g. GLIMMIX) are an alternative to the GEE (or GENMOD) approach; however, the interpretation of the regression parameters differs depending on whether you specify random effects (G-side) or a covariance pattern (R-side) to account for the correlated data. Verbeke and Molenberghs discuss this in their book Models for Discrete Longitudinal Data; there is also information in the text by Fitzmaurice, Laird and Ware.
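
    One way to see the difference in interpretation is a worked case, assuming a log link and a single normally distributed random intercept (the symbols $\beta_0$, $\beta_1$, $b_i$, $\sigma_b^2$ and $t_{ij}$ are notation introduced here, not taken from the thread). Averaging the subject-specific mean over the random intercept uses $E[e^{b_i}] = e^{\sigma_b^2/2}$:

```latex
\begin{aligned}
\text{subject-specific:}\quad
  & \log E[Y_{ij} \mid b_i] = \beta_0 + \beta_1 t_{ij} + b_i,
    \qquad b_i \sim N(0, \sigma_b^2) \\
\text{population-averaged:}\quad
  & \log E[Y_{ij}] = \log E\!\left[ e^{\beta_0 + \beta_1 t_{ij} + b_i} \right]
    = \left( \beta_0 + \tfrac{\sigma_b^2}{2} \right) + \beta_1 t_{ij}
\end{aligned}
```

    In this special case only the intercept shifts, so the time coefficient has the same value under both readings even though one describes a given patient and the other the population average. That coincidence should not be expected with other links (such as the logit) or with random slopes.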
    -------------------------------------------
    Jessica Ketchum
    Assistant Professor
    Virginia Commonwealth University
    -------------------------------------------








  • 18.  RE:Paired Count Data

    Posted 10-26-2011 16:08

    It appears that you are trying to compare two Poisson variables for each person. The difference between two independent Poisson variables follows a Skellam distribution. For info on analysis in a medical context, see

    Karlis D. and Ntzoufras I. (2006). Bayesian analysis of the differences of count data. Statistics in Medicine, 25, 1885-1905.  http://onlinelibrary.wiley.com/doi/10.1002/sim.2382/pdf
     
    (and for general info, see Wikipedia, Skellam distribution)
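
    A minimal sketch of the Skellam probabilities, assuming the pre and post counts are independent Poisson variables with known means (a shared additive Poisson component, if present, would cancel in the difference). The pmf is computed by summing the joint Poisson probabilities over the pre count rather than through Bessel functions; the function names, the truncation at 200 terms, and the two means are illustrative assumptions.

```python
import math

def poisson_pmf(k, mu):
    """P(X = k) for X ~ Poisson(mu), with mu > 0, computed on the log scale."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))

def skellam_pmf(d, mu_post, mu_pre, terms=200):
    """P(post - pre = d) for independent Poisson counts.

    Sums P(post = j + d) * P(pre = j) over pre counts j = 0 .. terms-1,
    which is ample when both means are small.
    """
    total = 0.0
    for j in range(terms):
        k = j + d  # the post count implied by difference d and pre count j
        if k >= 0:
            total += poisson_pmf(k, mu_post) * poisson_pmf(j, mu_pre)
    return total

# Hypothetical mean numbers of low-glucose episodes (not estimates from data).
mu_pre, mu_post = 0.9, 0.4

for d in range(-3, 4):
    print(d, round(skellam_pmf(d, mu_post, mu_pre), 4))
```

    The resulting distribution has mean mu_post - mu_pre and variance mu_post + mu_pre, which is a quick check on any implementation.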


    -------------------------------------------
    David Rindskopf
    CUNY Graduate Center
    -------------------------------------------








  • 19.  RE:Paired Count Data

    Posted 10-26-2011 16:18

    I think that is interesting to know. I had never heard of that distribution before. But I think the original plan was to test based on a normalized difference, that is, the difference between two Poissons divided by its estimated standard deviation. Although it looks like he is doing the comparison for individual patients, I think he might really be looking at the average of the count differences. It appears, though, that he is going to opt for the generalized mixed model approach.
    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 20.  RE:Paired Count Data

    Posted 10-26-2011 14:38
    For purposes of comparing the pre-measure with each of the post-measures, Proc Genmod does support both Contrast statements and Estimate statements.  In theory, these should work just like the corresponding Contrast and Estimate statements do in Proc GLM, although in practice, one might have to put a "param=GLM" option up in the Class statement to make the analogy exact.

    Interestingly, the SAS documentation for Proc Genmod indicates that the Contrast and Estimate statements have a special form for Zero-Inflated Poisson models.  I don't know if inflated Zero counts are a concern in this blood-glucose-monitoring scenario, but I think that's an interesting capability to have.   


    -------------------------------------------
    Eric Siegel
    Biostatistician
    Univ of Arkansas for Medical Sciences
    -------------------------------------------





  • 21.  RE:Paired Count Data

    Posted 10-25-2011 17:27
    How about something simple like a Wilcoxon test? 

    -------------------------------------------
    Stephan Arndt
    Professor
    University of Iowa, Iowa Consortium
    -------------------------------------------








  • 22.  RE:Paired Count Data

    Posted 10-25-2011 17:53
    Even though I am not convinced that I understand the problem very well yet, I think I can answer your question. He has a parametric count model in mind, and if the Poisson assumption is believable, a parametric test should be preferred to a nonparametric one. Secondly, there is the pairing. So when you say Wilcoxon, it should be the signed rank test and not the rank sum test. I do get the sense that the pairing is an important part of the problem.

    But if the Poisson assumption is shaky and other count models are also not clearly appropriate, then a signed rank test or a simple sign test may be the right thing to do. But that would be if there is no patient effect. The nonparametric tests could be used to find a "treatment" effect (difference between "pre" and "post") if there is no patient-to-patient effect. But more needs to be introduced into the model if there is patient-to-patient variability that is not simply removed by taking the difference between pre and post counts.
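
    For reference, the simple sign test mentioned above can be sketched as an exact two-sided binomial test on the signs of the nonzero differences. Dropping tied pairs is the usual convention and is an assumption of this sketch, as are the function name sign_test and the data.

```python
import math

def sign_test(pre, post):
    """Exact two-sided sign test on paired counts.

    Pairs with post == pre are dropped. Returns (number of positive
    differences, number of untied pairs, p-value).
    """
    diffs = [b - a for a, b in zip(pre, post) if b != a]
    n = len(diffs)
    if n == 0:
        return 0, 0, 1.0  # every pair tied: no evidence either way
    positives = sum(1 for d in diffs if d > 0)
    k = min(positives, n - positives)
    # Under H0 each untied sign is + or - with probability 1/2.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return positives, n, min(1.0, 2 * tail)

# Hypothetical pre/post counts for 8 patients (not real data).
pre = [0, 1, 2, 0, 3, 1, 0, 2]
post = [0, 0, 1, 0, 2, 1, 0, 1]

positives, untied, p_value = sign_test(pre, post)
print(positives, untied, p_value)
```

    With rare events many pairs are tied at zero, so the number of informative pairs can be much smaller than the number of patients, which limits what this test can detect.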

    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------