  • 1.  Relating count data to continuous.

    Posted 04-13-2012 14:33
    This message has been cross-posted to the following eGroups: Statistical Consulting Section and Business and Economic Statistics Section.
    -------------------------------------------
    Hi all,

    I've just jumped into my first foray with a problem: understanding the relationship between the time it takes to resolve a tech support issue and the number of times the issue is passed off to different nodes in a queue (i.e., how many times the issue was 'touched' before it was solved, whether due to misdiagnosis of the problem leading to misrouting, etc.). Obviously, one might presuppose that more touches within a queue bring a corresponding increase in time-to-resolution, but the nature of the relationship is not known.

    In talking with the workgroup responsible for analyzing this and recommending process improvements, it appeared there were no descriptives (not even a simple queuing map, which might lend some basic insight into bottlenecks, etc.), so I recommended they begin there.

    The literature I have read on count data (a la Cameron and Trivedi, whose book I'm ordering) intends the count data to be the dependent variable. What would be a good lay way to explain the meaning of this? Is it that we can attribute different durations of time-to-resolve to different numbers of 'touches' (if the results demonstrated this, of course)?

    My next question is the method for understanding this relationship. Plenty of literature refers to Poisson regression. I was thinking that an ANOVA of the output would give us an idea of the significance of the model (with AIC for comparison against further iterations). The literature also explains the use of an 'exposure' or 'offset' parameter to account for varying likelihoods of different levels of events (i.e., an issue with 2 touches or pass-offs to another node in the queue is more likely to generate a third pass-off [perhaps explainable by the relative 'complexity' of the issue]). How can such an offset be calculated and integrated within another iteration of a model?

    Finally, once the model has been decided upon and exposure/offsets accounted for, what strategy might you employ to integrate this information within the 'queuing map' (I guess more of a network diagram) so that candidate areas of process improvement can be more robustly visualized and interpreted? Or could all of what I laid out above be completely off track?

    Many thanks everyone!

    -------------------------------------------
    Phillip Middleton
    Analyst / Student
    Rackspace / University of Texas At San Antonio
    -------------------------------------------


  • 2.  RE:Relating count data to continuous.

    Posted 04-16-2012 18:42

    Hi all, 

    When I wrote this message, I did so from an iPhone. Funny how things can become one run-on sentence that way. Lesson learned.

    Just to synopsize for any takers, would anyone have a firm idea on:
    1. A sound way to determine the correlation of count data with continuous data (i.e., the number of times an issue is handled against the duration it takes to resolve the issue).
    2. If the count data *must* be considered the dependent variable (a la Cameron/Trivedi), what would be an easy-to-understand way of presenting this to a decision-maker (since they are expecting info on just the reverse relationship)?
    3. If, in order to understand the nature of the relationship between the count and continuous variables, we were to build a model on this (say, Poisson regression), the literature mentions an 'exposure' or 'offset' to deal with time-dependent effects. (Example: the likelihood that an issue handled 3 times is passed on to a 4th or even a 5th person versus the likelihood that an issue handled once is passed to a 2nd individual.) I am unsure of a good way to assess such dependencies, but believe they should be accounted for.

    Any thoughts would be greatly appreciated.

    Phillip








  • 3.  RE:Relating count data to continuous.

    Posted 04-16-2012 19:47
    Hi, Phillip,

    I have to think about #2 and #3.  But as for #1, there is certainly nothing wrong in getting started with a scatterplot in conjunction with a Spearman correlation.  That would help inform subsequent directions to take.   
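
    As a minimal sketch of that first step, the snippet below computes a Spearman rank correlation in plain Python by ranking both variables (averaging tied ranks) and taking the Pearson correlation of the ranks. The ticket values are made up for illustration and the function names are hypothetical; a statistics package would normally do this for you.

```python
from statistics import mean


def average_ranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical tickets (made-up values): touches and hours to resolution.
touches = [1, 1, 2, 2, 3, 4, 5]
hours = [0.5, 1.0, 0.8, 3.0, 6.5, 20.0, 48.0]
rho = spearman(touches, hours)
print(f"Spearman rho = {rho:.3f}")
```

    Because it works on ranks, the coefficient picks up any monotone relationship between touches and duration, not only a linear one, which is why it pairs well with a scatterplot as a first look.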

    Also, I'm curious about the continuous half of the data: Do you have only the duration it takes to resolve the issue?  Or do you also happen to have the individual components of duration between each successive handling of the issue?

    Finally, I don't know who Cameron and Trivedi are.  Are they the econometrics folks who wrote this interesting preprint at the following URL?: http://cameron.econ.ucdavis.edu/research/CTE01preprint.pdf


    -------------------------------------------
    Eric Siegel
    Biostatistician
    Univ of Arkansas for Medical Sciences
    -------------------------------------------





  • 4.  RE:Relating count data to continuous.

    Posted 04-16-2012 19:57

    Cameron and Trivedi are the authors of a book on count data published by Oxford University Press. 
    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 5.  RE:Relating count data to continuous.

    Posted 04-16-2012 20:11
    Hi, Phillip, I have a question.  Suppose that an issue has to be handled five times before it is resolved.  Are we allowed to infer that the first four handlings were unsuccessful and only the fifth handling was successful?  Or can the same issue be handled successfully more than once, but the issue recurs?

    -------------------------------------------
    Eric Siegel
    Biostatistician
    Univ of Arkansas for Medical Sciences
    -------------------------------------------





  • 6.  RE:Relating count data to continuous.

    Posted 04-17-2012 17:19
    Correct, this is the assumption, though it may indeed be true that multiple issues are embedded within one ticket, requiring multiple passes to resolve (technically this should not happen; a separate event should be generated for each separate issue). That will be another analysis entirely, but for now, success is assumed to occur at the last handler.

    To answer your first comment, the workgroup I'm helping out appears to have done a correlation analysis (I'm assuming just a basic 2-factor regression) and found something around 0.45. I asked if they tested something other than linear (i.e., log correlation) to at least indicate whether the relationship was non-linear. I haven't heard back from them on that.

    And yes, the paper you referred to was by the same authors. However, their mandate of treating count variables as dependent baffles me. Further, I'm not sure how I would create a lay explanation of results from, say, a Poisson regression model built in this way. Their mention of accounting for dependencies between the number of handles, as I mentioned (the 'offset'), is something else I haven't seen discussed in any detail, but then I don't yet have the book Michael referred to, which may discuss this in more detail.

    -------------------------------------------
    Phillip Middleton
    Senior Statistician/Student
    Rackspace / University of Texas At San Antonio
    -------------------------------------------


  • 7.  RE:Relating count data to continuous.

    Posted 04-17-2012 17:29

    I was not interested in the details of your problem, so I have not read your commentary in any detail.  I do have some interest in count data because a colleague with whom I collaborated on a few papers did her dissertation on time series models for count data.  The INAR models that she dealt with handle time dependencies in count data.  I am by no means any kind of an expert in count data.  I do have the Trivedi book, so if you have questions you would like me to look up in the book, I can do that for you while you wait to get your copy.
    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 8.  RE:Relating count data to continuous.

    Posted 04-17-2012 18:33

    I'm curious why the decision makers think the relationship is the "reverse" vs. Cameron/Trivedi as you say in your note.

    And to economists, if it's "both" (in other words, x can be said to influence y and y can be said to influence x), then that's important; they refer to it as "endogeneity," and there are very elegant econometric models for making estimates (all part of the SAS econometric package, and I'd presume in R).

    For  a presentation, a good graph of the variables would be helpful. 


    -------------------------------------------
    Chris Barker, Ph.D.
    President - San Francisco Bay Area Chapter of the American Statistical Association
    www.barkerstats.com

    ---
    "In composition you have all the time you want to decide what to say in 15 seconds, in improvisation you have 15 seconds."
    -Steve Lacy
    -------------------------------------------








  • 9.  RE:Relating count data to continuous.

    Posted 04-17-2012 21:51
    I want to comment on the topic without discussing the specifics because I have not followed the thread carefully.  I have the Cameron/Trivedi book.  It is a comprehensive book with the title "Regression Analysis of Count Data."

    In spite of the title, it does have a time series chapter, which includes the INARMA models of McKenzie and of Al-Osh and Alzaid.  This followed the methodology first presented by Lewis and Jacobs with the EARMA model, which had exponential marginal distributions.  As a graduate student at Stanford I contributed a little to this literature by developing the UAR(1) model, which is first-order autoregressive with uniform marginal distributions.  I also proved asymptotic extreme value results for EARMA(1,1) and the UAR(1) as most of my PhD dissertation.  I also have a very funny story about meeting Peter Lewis as a graduate student that I will be happy to share if you guys are interested.

    Regarding Phillip's comment about being puzzled about dependent counts, I don't understand why he is puzzled.  Poisson regression and negative binomial regression are both models in which a count response depends on covariates, which creates dependence through the different count observations.
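
    To make the idea of a count response depending on a covariate concrete, here is a minimal sketch of a one-covariate Poisson regression, log(mu) = b0 + b1 * x, fitted by Newton-Raphson in plain Python. The function name fit_poisson and the toy data are assumptions for illustration, not anything taken from the thread or from Cameron and Trivedi; in practice one would use a GLM routine from a statistics package.

```python
import math


def fit_poisson(x, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood fit of log(mu_i) = b0 + b1 * x_i for Poisson counts y_i.

    Illustrative only: assumes the x values are not all identical, so the
    2x2 information matrix below is invertible.
    """
    b0 = math.log(sum(y) / len(y))  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        # Fisher information matrix entries.
        h00 = sum(mu)
        h01 = sum(mi * xi for mi, xi in zip(mu, x))
        h11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * d = g by hand.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Made-up example: x = hours a ticket was open, y = number of touches.
hours_open = [1.0, 2.0, 4.0, 8.0, 12.0, 24.0]
n_touches = [1, 1, 2, 2, 3, 5]
b0_hat, b1_hat = fit_poisson(hours_open, n_touches)
print(f"b0 = {b0_hat:.3f}, b1 = {b1_hat:.3f}, exp(b1) = {math.exp(b1_hat):.3f}")
```

    One way to read the result for a lay audience: exp(b1) is the factor by which the expected count is multiplied for each one-unit increase in the covariate.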

    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 10.  RE:Relating count data to continuous.

    Posted 04-18-2012 16:53
    Hi Michael,

    Thanks a ton for your and Emil's responses to my questions. I'm not puzzled by the fact that certain counts of events depend on reaching a threshold of counts of events leading to the nth value. However, my assumption is that the likelihood of x events creating a yth event is distributed unevenly among the levels of counts. My thought is that C and T's mention of an offset may have assumed a monotonic change in the likelihood of any one count event causing another one.

    Cameron and Trivedi mentioned using some sort of 'offset' or 'exposure' parameter, but I don't know whether it assumes something like a monotonic decrease in the likelihood of future events from past ones, or whether it can be applied in some other way. This isn't something I have learned yet in my studies, but frankly it sounds like an actuarial term. What is this offset parameter? How is it calculated?

    -------------------------------------------
    Phillip Middleton
    -------------------------------------------
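
    For reference, the usual textbook role of an offset in Poisson regression can be written out as below. This is a sketch of the standard formulation, not a quotation from Cameron and Trivedi; the symbols $y_i$ (count), $t_i$ (exposure), $x_i$ (covariates), and $\beta$ (coefficients) are notation assumed here.

```latex
\begin{aligned}
y_i &\sim \mathrm{Poisson}(\mu_i), \\
\log \mu_i &= \log t_i + x_i^{\top}\beta
\quad\Longleftrightarrow\quad
\frac{\mu_i}{t_i} = \exp\!\left(x_i^{\top}\beta\right).
\end{aligned}
```

    Here $\log t_i$ is the offset: a known quantity entered with its coefficient fixed at 1 rather than estimated, so the model describes the rate of events per unit of exposure. Under this formulation the offset is simply the log of the observed exposure (for example, the time a ticket was open). It adjusts for unequal observation windows and does not by itself model how the chance of another hand-off changes with the number of previous hand-offs.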