
Descriptive Stat Question

  • 1.  Descriptive Stat Question

    Posted 08-06-2012 14:52

    I hope you can help me with a "descriptive statistics" question.  A client has several sites (say, 26) where classes are being taught (one class per site).  The registration varies by site, from 2 or 3 to several hundred.  Registrants may

    • Withdraw before the first day of class (W)
    • Drop the class (D)
    • Take an Incomplete for the class (I)
    • Pass the class (P)
    • Fail the class (F)

    For a given site, these 5 categories are all possible outcomes for the registrants.  We want to characterize the responses for a given outcome across all sites (for example, we want to identify the sites with the largest % of Withdrawals for intervention and follow-up), but we want to "weight" this somehow by volume (if one site has 50% Withdrawals but 2 total registrants, it is much less important than a site with 25% Withdrawals but 200 total registrants).  The raw count data would appear as below:

     

    Site    W     D     I     P     F     Total
    A       aw    ad    ai    ap    af    At
    B       bw    bd    bi    bp    bf    Bt
    C       cw    cd    ci    cp    cf    Ct
    ...     ...   ...   ...   ...   ...   ...
    Y       yw    yd    yi    yp    yf    Yt
    Z       zw    zd    zi    zp    zf    Zt
    Total   Wt    Dt    It    Pt    Ft    Grand

     

    The volume indicator (relative size of the sites) for site A is At/Grand.  The question is, what should the measure of the outcome be, and why.  For example, when considering Withdrawals, should the % for site A be aw/At (% of total site registrants), or aw/Wt (% of total Withdrawals)?  I am fairly certain that it should be the % of total site registrants, but some team members enthusiastically feel that it should be the % of total Withdrawals, and I am having difficulty defending my position.

     

    Thanks for your help with this mundane question!

    -------------------------------------------
    James Gear
    Senior Statistician
    -------------------------------------------



  • 2.  RE:Descriptive Stat Question

    Posted 08-06-2012 15:02
    If the goal is to save or help the greatest number of students, I would focus first on the locations with the largest number of students who need help and, perhaps, who are in categories that might respond most easily to help. That logic would argue for using absolute numbers rather than percentages.

    -------------------------------------------
    Joel Wiesen
    Director
    Applied Personnel Research
    -------------------------------------------








  • 3.  RE:Descriptive Stat Question

    Posted 08-06-2012 15:02
    If descriptive statistics can include a picture, this sounds like a job for a mosaic plot. 
    That way, you can simultaneously see where the withdrawals are happening in absolute
    terms, as well as how big a contribution withdrawals are to a given school.  With
    appropriate color coding, you can get a feel at a glance for how "withdrawally" a particular school is.

    >>Kathy

    -------------------------------------------
    Katherine Godfrey
    -------------------------------------------








  • 4.  RE:Descriptive Stat Question

    Posted 08-06-2012 15:08
    What the best denominator should be is, strictly speaking, not a statistical question. But if you think about it, using aw/At is more intuitive.

    It is a measure that does not go down when other bad things (dropouts, incompletes) go up.

    Also, dividing everything by At ensures that your percentages add up to 100.

    You should consider a random effects model, by the way. A shrinkage estimate will downweight extreme values from small classes.
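    The shrinkage idea can be sketched without fitting a full mixed model. The function below is a crude empirical-Bayes beta-binomial estimator, offered as a simple stand-in for (not the same thing as) a random effects model; the name shrink_rates, the moment estimator, and the counts are all hypothetical.

```python
from statistics import mean, variance


def shrink_rates(withdrawals, totals):
    """Shrink each site's withdrawal rate toward the pooled rate.

    withdrawals[i] is the W count and totals[i] the registrant total for site i.
    Crude empirical-Bayes beta-binomial sketch, not a fitted mixed model.
    """
    rates = [w / n for w, n in zip(withdrawals, totals)]
    p = sum(withdrawals) / sum(totals)              # pooled rate, Wt / Grand
    # Average binomial noise variance expected in a site rate if all sites shared p.
    noise = mean(p * (1 - p) / n for n in totals)
    # Moment estimate of the true between-site variance of rates.
    tau2 = variance(rates) - noise
    if tau2 <= 0:
        # The observed spread is no bigger than noise: pool completely.
        return [p] * len(rates)
    # A beta prior with mean p and variance tau2 acts like m extra "pseudo-registrants".
    m = max(p * (1 - p) / tau2 - 1, 0.0)
    return [(w + m * p) / (n + m) for w, n in zip(withdrawals, totals)]


# Hypothetical counts: sites 0 and 3 both show 50% withdrawals (1 of 2 vs 60 of 120).
print(shrink_rates([1, 50, 4, 60, 10], [2, 200, 40, 120, 100]))
```

    With these made-up counts, the 2-registrant site is pulled well toward the pooled rate while the 120-registrant site stays close to 50%, which is the downweighting of small classes described above.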

    -------------------------------------------
    Stephen Simon
    Independent Statistical Consultant
    P. Mean Consulting
    -------------------------------------------








  • 5.  RE:Descriptive Stat Question

    Posted 08-06-2012 15:09
    Perhaps a mundane answer to a mundane question will help.
    As is implied in the presentation of the question, it depends on your objective(s).
    If the concern is maintaining current programs, I side with percent of grand total.
    Ranking sites by total registrations might be a useful adjunct to percent withdrawals at each site. Low enrollments and high withdrawals suggest problems: either the offerings are wrong for a site or there are site-specific quality issues. For now, that's all I can think of. In a nutshell, different objectives need different organizations of the data. A comprehensive use of the data suggests listing the objectives and prioritizing them. Obviously the data may not be adequate for multiple needs.

    -------------------------------------------
    James Weber
    -------------------------------------------








  • 6.  RE:Descriptive Stat Question

    Posted 08-06-2012 15:32
    Here is another possibly mundane suggestion... Did you think about weighting the percentages aw/At on the basis of Σt = At + ... + Zt (which equals Grand)?
    So if you take (At/Σt) × (aw/At), then you weight your target percentages aw/At by the relative registrant volume At/Σt at the site. This could be one way to account for the "importance" of registrants at a site. It also lets you use as the target percentage whichever attribute best fits your study needs, be it aw/At or aw/Wt.
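    The weighting above is simple arithmetic; here is a minimal sketch of it, assuming the target percentage is aw/At. The function name and the two-site counts are hypothetical.

```python
def weighted_withdrawal_rates(withdrawals, totals):
    """Return (At/Σt) * (aw/At) for each site."""
    grand = sum(totals)                       # Σt = At + ... + Zt
    weighted = []
    for w, n in zip(withdrawals, totals):
        volume = n / grand                    # relative registrant volume At/Σt
        rate = w / n                          # target percentage aw/At
        weighted.append(volume * rate)
    return weighted


# Hypothetical sites: 1 of 2 withdrew (50%) and 50 of 200 withdrew (25%).
print(weighted_withdrawal_rates([1, 50], [2, 200]))
```

    Because (At/Σt) × (aw/At) = aw/Σt, this particular weighting ranks sites exactly as their raw withdrawal counts would; here the 200-registrant site scores far higher even though its rate is lower.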

    -------------------------------------------
    Alexander Kolovos
    Research Developer Scientist
    SpaceTimeWorks, LLC
    -------------------------------------------





  • 7.  RE:Descriptive Stat Question

    Posted 08-06-2012 15:44

    As a former data manager for an educational research company, I lean towards the very practical.  You need the percents of the totals (overall percents), the percents of each condition (the column percents) and the percents for each site (the row percents).  Create all three simultaneously using a program like SAS, then export the tables into Excel and create the three different versions, clearly labeling the three meanings. You'll likely find that some users will use one table and others another. The other responders to this thread have clearly indicated what you already know: the table to be used depends on the research question you're answering.
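    For readers without SAS, the three tables can be sketched in plain Python. This assumes every row and column total is nonzero; the function name and counts are hypothetical.

```python
def percent_tables(counts):
    """counts maps site -> outcome counts in the order W, D, I, P, F.

    Returns (overall, row, column) percentages as dicts keyed by site.
    """
    sites = list(counts)
    n_outcomes = len(counts[sites[0]])
    row_tot = {s: sum(counts[s]) for s in sites}                            # At, Bt, ...
    col_tot = [sum(counts[s][j] for s in sites) for j in range(n_outcomes)]  # Wt, Dt, ...
    grand = sum(row_tot.values())
    overall = {s: [100 * c / grand for c in counts[s]] for s in sites}      # aw/Grand
    row_pct = {s: [100 * c / row_tot[s] for c in counts[s]] for s in sites} # aw/At
    col_pct = {s: [100 * c / col_tot[j] for j, c in enumerate(counts[s])]   # aw/Wt
               for s in sites}
    return overall, row_pct, col_pct


# Hypothetical counts (W, D, I, P, F) for three sites.
counts = {"A": [1, 0, 0, 1, 0], "B": [50, 20, 10, 100, 20], "C": [10, 5, 0, 20, 5]}
overall, row_pct, col_pct = percent_tables(counts)
print(row_pct["A"][0], row_pct["B"][0])   # W as % of each site's registrants
```

    Each row of row_pct sums to 100, each outcome column of col_pct sums to 100 across sites, and the whole overall table sums to 100, which is one way to keep the three meanings straight when labeling the exported tables.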
    -------------------------------------------
    Catherine Trapani
    Fordham University, Psychometrics Program
    -------------------------------------------








  • 8.  RE:Descriptive Stat Question

    Posted 08-06-2012 16:53

    Just a suggestion.  This kind of question used to come up in problem solving teams in industry and this is one of the ways we dealt with it.   Presumably your team members are trying to address the limited resources problem and the issues of fairness to individual sites selected for "treatment" in choosing their metric.

    The proportion aw/At, when compared to bw/Bt, answers the question:

          "Which site has the greater relative withdrawal rate, based on site population?"

    Asking this question assumes there is some acceptable withdrawal rate. Focusing resources based on the relative withdrawal rate alone will tend to equalize performance across sites, but may not maximize the reduction in the total withdrawal rate system-wide.  If the objective is only more uniform site-to-site performance, this is the better measure.

     

    The proportion aw/Wt, when compared to bw/Wt, answers the question:

          "Which site has the greater absolute withdrawal rate, based on total system withdrawals?"

    Asking this question assumes that some sites are contributing disproportionately to the total system-wide withdrawal performance, and it will identify the biggest contributors.  However, it may flag as big absolute contributors some sites that have acceptable relative withdrawal performance given their site population, and those sites may perceive the unwanted attention as "unfair".  Focusing only on the biggest contributors will maximize the reduction in the total system withdrawal rate, but at the risk of making some sites feel they were singled out "unfairly".

    By computing both proportions for each site, and the average of each proportion across all sites, you can identify the sites that have both a high relative withdrawal rate and a high absolute withdrawal rate.  Those sites could presumably be focused on without raising a perception of "unfairness" relative to some site-to-site performance norm.  Addressing those sites would then reduce the absolute system-wide withdrawal rate and, to a lesser extent, the average proportion of withdrawals across all sites.
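    A minimal sketch of that procedure, assuming a site is flagged when it is strictly above the cross-site average on both proportions; the function name and the five-site counts are hypothetical.

```python
def flag_sites(withdrawals, totals):
    """withdrawals and totals map site name -> count.

    Returns relative rates aw/At, absolute shares aw/Wt, and the sites above
    the cross-site average on both.
    """
    wt = sum(withdrawals.values())                                  # Wt
    relative = {s: withdrawals[s] / totals[s] for s in totals}      # aw/At
    absolute = {s: withdrawals[s] / wt for s in totals}             # aw/Wt
    mean_rel = sum(relative.values()) / len(relative)
    mean_abs = sum(absolute.values()) / len(absolute)               # always 1/(number of sites)
    both_high = [s for s in totals
                 if relative[s] > mean_rel and absolute[s] > mean_abs]
    return relative, absolute, both_high


# Hypothetical data: A is small with a high rate, B is large with an average rate.
withdrawals = {"A": 1, "B": 50, "C": 4, "D": 60, "E": 10}
totals = {"A": 2, "B": 200, "C": 40, "D": 120, "E": 100}
relative, absolute, both_high = flag_sites(withdrawals, totals)
print(both_high)  # ['D']
```

    Note that the average of aw/Wt across sites is always one over the number of sites, since the shares sum to one. The relative and absolute values can also be plotted directly against each other, with the two averages as the quadrant dividers Tom mentions in his follow-up post.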


    Tom

    Thomas D. Sandry
    Retired Industrial Consultant


    -------------------------------------------
    Thomas Sandry
    -------------------------------------------








  • 9.  RE:Descriptive Stat Question

    Posted 08-06-2012 18:23
    I neglected to mention (the obvious) that after computing both the relative withdrawal rate and the absolute withdrawal rate for each site, a convenient way to present them to the Team is in a simple scatterplot.  The target sites for attention will be in the upper right-hand corner of the plot.  These points can be labelled by site name and the Team can make a decision with relatively little debate.  The overall averages can be used to divide the plotting area into four quadrants, which sometimes helps resolve disputes.

    Tom

    Thomas D. Sandry, PhD
    Retired Industrial Consultant

    -------------------------------------------
    Thomas Sandry
    -------------------------------------------








  • 10.  RE:Descriptive Stat Question

    Posted 08-06-2012 15:46
    Do you simply want the largest % withdrawals, or do you want to find those sites that have a "relatively" large withdrawal percentage given the marginal counts?  If the latter, wouldn't the "chi-squared" residuals, obtained by treating the table as a contingency table, provide that information?
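    Here is a minimal sketch of the Pearson residuals, (observed - expected)/sqrt(expected), with expected counts taken from the row and column totals under independence of site and outcome. It assumes no row or column total is zero; the function name and counts are hypothetical.

```python
from math import sqrt


def pearson_residuals(table):
    """table[i][j] is the count for site i and outcome j (W, D, I, P, F).

    Returns (observed - expected) / sqrt(expected), with expected counts
    computed under independence of site and outcome.
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    grand = sum(row_tot)
    residuals = []
    for i, row in enumerate(table):
        out = []
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / grand
            out.append((observed - expected) / sqrt(expected))
        residuals.append(out)
    return residuals


# Hypothetical counts (W, D, I, P, F) for three sites.
table = [[1, 0, 0, 1, 0], [50, 20, 10, 100, 20], [10, 5, 0, 20, 5]]
for row in pearson_residuals(table):
    print([round(r, 2) for r in row])
```

    A large positive residual in the W column marks a site with more withdrawals than its size and the overall withdrawal rate would predict. For very small sites the expected counts are tiny, so their residuals should be read with caution.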

    -------------------------------------------
    Robert Podolsky
    Georgia Health Sciences University
    -------------------------------------------








  • 11.  RE:Descriptive Stat Question

    Posted 08-06-2012 16:28
    One simple thought is to put the matrix into a contingency table; you will then have row, column, and total percentages, "expected values" for deviation analysis, etc.  John

    -------------------------------------------
    John Bartko
    Consulting Biostatistician
    -------------------------------------------








  • 12.  RE:Descriptive Stat Question

    Posted 08-06-2012 16:35
    When considering Withdrawals, maybe the thing to do is to get those enthusiastic team members to go up to the whiteboard and give their interpretation of what each fraction means.  Then everybody can consider which interpretation comes closer to addressing the research objective.  (By the way, what is the research objective?)   

    Also, you may have already pointed out the following to no avail, but if not, here goes:

    We already know that total registration can vary from site to site by as much as two orders of magnitude.  Suppose that the number of total registrants at a site has no effect on the withdrawal rate at that site.  Then E(aw/At) = E(bw/Bt) = E(cw/Ct) = ... = E(zw/Zt), whereas E(aw/Wt) = At/Grand, E(bw/Wt) = Bt/Grand, E(cw/Wt) = Ct/Grand, and so on.  In other words, if the size of the site has no effect on the withdrawal rate, then the fractions aw/At, bw/Bt, cw/Ct, etc. will tend to reflect that fact successfully, whereas the fractions aw/Wt, bw/Wt, cw/Wt, etc. will instead tend to be proportional to site size... which, as we already know, can vary from site to site by as much as two orders of magnitude.
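    The expectation argument above can be checked with a short simulation, assuming, as Eric does, that every site shares the same true withdrawal rate and that each registrant withdraws independently. The site sizes, the rate of 0.2, and the function name are illustrative assumptions.

```python
import random


def simulate_equal_rates(site_sizes, p=0.2, reps=2000, seed=1):
    """Average aw/At and aw/Wt per site when every site has true withdrawal rate p.

    Returns (mean rates aw/At, mean shares aw/Wt, size shares At/Grand).
    """
    rng = random.Random(seed)
    grand = sum(site_sizes)
    k = len(site_sizes)
    rate_sum = [0.0] * k     # running totals of aw/At
    share_sum = [0.0] * k    # running totals of aw/Wt
    done = 0
    while done < reps:
        # Each registrant withdraws independently with probability p.
        w = [sum(rng.random() < p for _ in range(n)) for n in site_sizes]
        wt = sum(w)
        if wt == 0:          # aw/Wt is undefined; redraw (very rare for these sizes)
            continue
        for i, n in enumerate(site_sizes):
            rate_sum[i] += w[i] / n
            share_sum[i] += w[i] / wt
        done += 1
    size_share = [n / grand for n in site_sizes]
    return [r / reps for r in rate_sum], [s / reps for s in share_sum], size_share


# Hypothetical site sizes spanning about two orders of magnitude.
rates, shares, size_share = simulate_equal_rates([3, 20, 200])
print([round(r, 3) for r in rates])       # all close to p = 0.2
print([round(s, 3) for s in shares])      # close to size_share, not to each other
print([round(v, 3) for v in size_share])
```

    Under this assumption the aw/At values hover near the common rate for every site, while the aw/Wt values simply reproduce each site's share of registrants.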


    -------------------------------------------
    Eric Siegel
    Biostatistician
    Univ of Arkansas for Medical Sciences
    -------------------------------------------




  • 13.  RE:Descriptive Stat Question

    Posted 08-06-2012 17:42
    This situation reminds me of the structure of an absorbing Markov chain, each site representing an instance of the same chain structure.  A drawing and/or a matrix representation of the structure might be a useful way of displaying your results.

    -------------------------------------------
    Robert Sims
    Instructor of Statistics
    George Mason University
    -------------------------------------------








  • 14.  RE:Descriptive Stat Question

    Posted 08-06-2012 19:11

    All,

    Thanks so much for your input!  Every response was thoughtful and informative!  Several common 'themes' were represented in your responses:

     

    • The 'right' answer depends on the question to be answered (the "research objective").  Sometimes it is better to provide all of the alternatives with pros and cons, instead of insisting that one answer (mine) is right!
    • Graphical tools are always more easily understood than a multiplicity of numbers and increased verbosity. (Translation: "A picture is worth a thousand words!")  I appreciate the suggestions for using the mosaic plot (I couldn't remember the name of that thing!) and the "quadrant-ized" scatter plot (which I did remember and use).
    • Tom (Thomas Sandry), I think your example parallels our setting quite closely, and the idea of using the average of both proportions to represent these rates more 'fairly' will certainly be one of the alternatives presented to the clients!
    • And Stephen (Stephen Simon), thanks for your last suggestion.  We are a ways from modeling, but I will remember to consider a random effects model.

     

    Again, thanks for your help!  It is great to know that the combined technical expertise and experience of the Statistical Consulting Section may potentially be accessed as a resource!

    -------------------------------------------
    James Gear
    Senior Statistician
    -------------------------------------------