Descriptive Stat Question

1. Descriptive Stat Question

James Gear

Posted 08-06-2012 14:52

I hope you can help me with a "descriptive statistics" question. A client has several sites (say, 26) where classes are being taught (one class per site). The registration varies by site, from 2 or 3 to several hundred. Registrants may:

Withdraw before the first day of class (W)
Drop the class (D)
Take an Incomplete for the class (I)
Pass the class (P)
Fail the class (F)

For a given site, these 5 categories are all possible outcomes for the registrants. We want to characterize the responses for a given outcome across all sites (for example, we want to identify the sites with the largest % of Withdrawals for intervention and follow-up), but we want to "weight" this somehow by volume (if one site has 50% Withdrawals but 2 total registrants, it is much less important than a site with 25% Withdrawals but 200 total registrants). The raw count data would appear as below:

Site	W	D	I	P	F	Total
A	a_w	a_d	a_i	a_p	a_f	A_t
B	b_w	b_d	b_i	b_p	b_f	B_t
C	c_w	c_d	c_i	c_p	c_f	C_t
...	...	...	...	...	...	...
Y	y_w	y_d	y_i	y_p	y_f	Y_t
Z	z_w	z_d	z_i	z_p	z_f	Z_t
Total	W_t	D_t	I_t	P_t	F_t	Grand

The volume indicator (relative size of the sites) for site A is A_t/Grand. The question is, what should the measure of the outcome be, and why. For example, when considering Withdrawals, should the % for site A be a_w/A_t (% of total site registrants), or a_w/W_t (% of total Withdrawals)? I am fairly certain that it should be the % of total site registrants, but some team members enthusiastically feel that it should be the % of total Withdrawals, and I am having difficulty defending my position.

Thanks for your help with this mundane question!

-------------------------------------------
James Gear
Senior Statistician
-------------------------------------------

2. RE:Descriptive Stat Question

Joel Wiesen
Posted 08-06-2012 15:02
If the goal is to save or help the greatest number of students, I would focus first on the locations with the largest number of students who need help and, perhaps, who are in categories that might most easily respond to help. That logic would argue for using absolute numbers rather than percentages.

-------------------------------------------
Joel Wiesen
Director
Applied Personnel Research
-------------------------------------------
3. RE:Descriptive Stat Question

Katherine Godfrey
Posted 08-06-2012 15:02
If descriptive statistics can include a picture, this sounds like a job for a mosaic plot.
That way, you can simultaneously see where the withdrawals are happening in absolute
terms, as well as how big a contribution withdrawals are to a given school. With
appropriate color coding, you can get a feel at a glance for how "withdrawally" a particular school is.

>>Kathy

-------------------------------------------
Katherine Godfrey
-------------------------------------------
4. RE:Descriptive Stat Question

Stephen Simon
Posted 08-06-2012 15:08
What the best denominator should be is, strictly speaking, not a statistical question. But if you think about it, using a_w/A_t is more intuitive.

It is a measure that does not go down when other bad things (dropouts, incompletes) go up.

Also, dividing everything by A_t ensures that your percentages add up to 100.

You should consider a random effects model, by the way. A shrinkage estimate will downweight extreme values from small classes.
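A rough sketch of the shrinkage idea follows. It is not a full random effects model: it simply pulls each site's withdrawal rate toward the pooled rate, with more pull for smaller sites. The counts and the `prior_strength` value are hypothetical, chosen only for illustration.

```python
# Hypothetical counts: site -> (withdrawals a_w, total registrants A_t).
sites = {"A": (1, 2), "B": (50, 200), "C": (6, 60)}


def shrunk_rates(sites, prior_strength=20.0):
    """Shrink each site's withdrawal rate toward the pooled rate.

    prior_strength is an assumed tuning value: it acts like that many
    pseudo-registrants withdrawing at the pooled rate. A fitted random
    effects model would estimate the amount of shrinkage from the data.
    """
    total_w = sum(w for w, _ in sites.values())
    total_n = sum(n for _, n in sites.values())
    pooled = total_w / total_n
    return {
        s: (w + prior_strength * pooled) / (n + prior_strength)
        for s, (w, n) in sites.items()
    }


raw = {s: w / n for s, (w, n) in sites.items()}
rates = shrunk_rates(sites)
for s in sites:
    print(f"{s}: raw {raw[s]:.3f} -> shrunk {rates[s]:.3f}")
```

With these made-up numbers, the 2-registrant site moves a long way from its raw 50% toward the pooled rate, while the 200-registrant site barely moves.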

-------------------------------------------
Stephen Simon
Independent Statistical Consultant
P. Mean Consulting
-------------------------------------------
5. RE:Descriptive Stat Question

James Weber
Posted 08-06-2012 15:09
Perhaps a mundane answer to a mundane question will help.
As is implied in the presentation of the question, it depends on your objective(s).
If the concern is maintaining current programs, I side with percent of grand total.
Ranking sites by total registrations might be a useful adjunct to the percent withdrawals at each site. Low enrollments and high withdrawals suggest problems: either the offerings are wrong for a site or there are site-specific quality issues. For now, that's all I can think of. In a nutshell, different objectives need different organizations of the data. A comprehensive use of the data calls for listing the objectives and prioritizing them. Obviously, the data may not be adequate for multiple needs.

-------------------------------------------
James Weber
-------------------------------------------
6. RE:Descriptive Stat Question

Alexander Kolovos
Posted 08-06-2012 15:32
Here is another possibly mundane suggestion... Have you thought about weighting the percentages a_w/A_t on the basis of Σ_t = A_t + ... + Z_t (presumably equal to Grand)?
If you take (A_t/Σ_t) * (a_w/A_t), then you weight your target percentages a_w/A_t by the relative registrant volume A_t/Σ_t at the site. This could be one way to account for the "importance" of registrants at a site. It also lets you use as the target percentage whichever attribute best fits your study needs, be it a_w/A_t or a_w/W_t.
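One thing worth checking before adopting this weighting: when Σ_t equals Grand, the A_t terms cancel, so the weighted value reduces to a_w/Grand, which ranks sites exactly as the raw withdrawal counts do. A small sketch with hypothetical counts:

```python
# Hypothetical counts: site -> (withdrawals a_w, total registrants A_t).
counts = {"A": (1, 2), "B": (50, 200), "C": (6, 60)}
grand = sum(n for _, n in counts.values())

# Volume weight (A_t / Grand) times site rate (a_w / A_t).
weighted = {s: (n / grand) * (w / n) for s, (w, n) in counts.items()}

# The same quantity written directly: withdrawals as a share of all registrants.
share_of_grand = {s: w / grand for s, (w, _) in counts.items()}

for s in counts:
    print(f"{s}: weighted {weighted[s]:.4f}, a_w/Grand {share_of_grand[s]:.4f}")

# Ranking by the weighted value is the same as ranking by raw count.
by_weighted = sorted(counts, key=lambda s: weighted[s], reverse=True)
by_count = sorted(counts, key=lambda s: counts[s][0], reverse=True)
```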

-------------------------------------------
Alexander Kolovos
Research Developer Scientist
SpaceTimeWorks, LLC
-------------------------------------------
7. RE:Descriptive Stat Question

Catherine Trapani
Posted 08-06-2012 15:44
As a former data manager for an educational research company, I lean towards the very practical. You need the percents of the totals (overall percents), the percents of each condition (the column percents) and the percents for each site (the row percents). Create all three simultaneously using a program like SAS, then export the tables into Excel and create the three different versions, clearly labeling the three meanings. You'll likely find that some users will use one table, and others another. The other responders to this thread have clearly indicated what you already know: the table to be used depends on the research question you're answering.
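For readers without SAS, the three tables can be sketched in a few lines of plain Python. The counts below are hypothetical, and the sketch assumes no outcome column and no site has a zero total.

```python
outcomes = ["W", "D", "I", "P", "F"]

# Hypothetical counts per site, in the order W, D, I, P, F.
counts = {
    "A": [1, 0, 0, 1, 0],
    "B": [50, 20, 10, 100, 20],
    "C": [6, 4, 2, 40, 8],
}

row_totals = {s: sum(v) for s, v in counts.items()}
col_totals = [sum(counts[s][j] for s in counts) for j in range(len(outcomes))]
grand = sum(row_totals.values())

# Row percents: each cell as a percent of its site total (a_w / A_t).
row_pct = {s: [100 * c / row_totals[s] for c in v] for s, v in counts.items()}
# Column percents: each cell as a percent of its outcome total (a_w / W_t).
col_pct = {s: [100 * c / col_totals[j] for j, c in enumerate(v)]
           for s, v in counts.items()}
# Overall percents: each cell as a percent of the grand total (a_w / Grand).
overall_pct = {s: [100 * c / grand for c in v] for s, v in counts.items()}

for label, table in [("Row %", row_pct), ("Column %", col_pct),
                     ("Overall %", overall_pct)]:
    print(label)
    print("Site  " + "  ".join(f"{o:>6}" for o in outcomes))
    for s, v in table.items():
        print(f"{s:<4}  " + "  ".join(f"{x:6.1f}" for x in v))
```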
-------------------------------------------
Catherine Trapani
Fordham University, Psychometrics Program
-------------------------------------------

8. RE:Descriptive Stat Question

Thomas Sandry
Posted 08-06-2012 16:53
Just a suggestion. This kind of question used to come up in problem solving teams in industry and this is one of the ways we dealt with it.   Presumably your team members are trying to address the limited resources problem and the issues of fairness to individual sites selected for "treatment" in choosing their metric.
The proportion a_w/A_t when compared to b_w/B_t answers the question:
      "Which site has the greater relative withdrawal rate, based on site population?"
Asking this question assumes there is some acceptable withdrawal rate and focusing resources based on the relative withdrawal rate alone will tend to equalize performance across sites, but may not maximize the reduction in total withdrawal rate system wide. If the objective is only more uniform site to site performance, this is the better measure.

The proportion a_w/W_t when compared to b_w/W_t answers the question:
      "Which site has the greater absolute withdrawal rate, based on total system population?"
Asking this question assumes that some sites are contributing disproportionately to the total system-wide withdrawal performance and will identify the biggest contributors. However, it may identify as big absolute contributors sites which have acceptable relative withdrawal performance given their site population, and they may perceive the unwanted attention as "unfair". Focusing only on the biggest system-wide contributors will maximize the reduction in total system withdrawal rate, but at the risk of making some sites feel they were singled out "unfairly".
By computing both proportions for each site, and the average of those proportions across all sites, then those sites which have both high relative withdrawal rate and high absolute withdrawal rate can be identified. Those sites could presumably be focused on without the perception of "unfairness" being raised relative to some site to site performance norm. Addressing those sites would subsequently reduce the absolute system-wide withdrawal rate and to a lesser extent reduce the average proportion of withdrawals across all sites.

Tom

Thomas D. Sandry
Retired Industrial Consultant

-------------------------------------------
Thomas Sandry
-------------------------------------------

9. RE:Descriptive Stat Question

Thomas Sandry
Posted 08-06-2012 18:23
I neglected to mention (the obvious) that after computing both the relative withdrawal rate and the absolute withdrawal rate for each site, a convenient way to present them to the Team is in a simple scatterplot. The target sites for attention will be in the upper right hand corner of the plot. These points can be labelled by site name and the Team can make a decision with relatively little debate. The overall averages can be used to divide the plotting area into four quadrants which sometimes helps resolve disputes.
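The quadrant assignment behind such a scatterplot can be sketched as below. The site counts are hypothetical, and splitting at the simple across-site averages is one reading of "overall averages"; the resulting (rate, share) pairs can then be plotted with any charting tool.

```python
# Hypothetical counts: site -> (withdrawals a_w, total registrants A_t).
sites = {"A": (1, 2), "B": (50, 200), "C": (6, 60), "Z": (30, 300)}

total_w = sum(w for w, _ in sites.values())

# Relative rate a_w / A_t and share of all withdrawals a_w / W_t.
rate = {s: w / n for s, (w, n) in sites.items()}
share = {s: w / total_w for s, (w, _) in sites.items()}

# Across-site averages used to split the plot into four quadrants.
mean_rate = sum(rate.values()) / len(rate)
mean_share = sum(share.values()) / len(share)


def quadrant(site):
    """Label a site by which side of each average it falls on."""
    r = "high rate" if rate[site] > mean_rate else "low rate"
    c = "high share" if share[site] > mean_share else "low share"
    return f"{r}, {c}"


quadrants = {s: quadrant(s) for s in sites}
for s in sites:
    print(f"{s}: rate {rate[s]:.3f}, share {share[s]:.3f} -> {quadrants[s]}")
```

Sites labelled "high rate, high share" correspond to the upper right hand corner of the plot.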

Tom

Thomas D. Sandry, PhD
Retired Industrial Consultant

-------------------------------------------
Thomas Sandry
-------------------------------------------

10. RE:Descriptive Stat Question

Robert Podolsky
Posted 08-06-2012 15:46
Do you simply want the largest % withdrawals, or do you want to find those sites that have a "relatively" large withdrawal percent given the marginal counts? If the latter, wouldn't the "chi-squared" residuals, obtained by treating the table as a contingency table, provide that information?
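A minimal sketch of those residuals, using the usual Pearson form (observed minus expected, divided by the square root of expected), where expected = row total * column total / grand total. The counts are hypothetical. Residuals for very small sites rest on tiny expected counts and should be read with caution.

```python
import math

outcomes = ["W", "D", "I", "P", "F"]

# Hypothetical counts per site, in the order W, D, I, P, F.
counts = {
    "A": [1, 0, 0, 1, 0],
    "B": [50, 20, 10, 100, 20],
    "C": [6, 4, 2, 40, 8],
}


def pearson_residuals(counts):
    """Return (observed - expected) / sqrt(expected) for every cell."""
    n_cols = len(next(iter(counts.values())))
    row_totals = {s: sum(v) for s, v in counts.items()}
    col_totals = [sum(v[j] for v in counts.values()) for j in range(n_cols)]
    grand = sum(row_totals.values())
    residuals = {}
    for s, v in counts.items():
        row = []
        for j, observed in enumerate(v):
            expected = row_totals[s] * col_totals[j] / grand
            row.append((observed - expected) / math.sqrt(expected))
        residuals[s] = row
    return residuals


residuals = pearson_residuals(counts)
w = outcomes.index("W")
for s in counts:
    print(f"{s}: Withdrawal residual {residuals[s][w]:+.2f}")
```

A positive Withdrawal residual flags a site with more withdrawals than its size and the overall withdrawal total would suggest; a negative one flags fewer.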

-------------------------------------------
Robert Podolsky
Georgia Health Sciences University
-------------------------------------------
11. RE:Descriptive Stat Question

John Bartko
Posted 08-06-2012 16:28
One simple thought is to treat the matrix as a contingency table; you will then have row, column, and total percentages, "expected values" for deviation analysis, etc. John

-------------------------------------------
John Bartko
Consulting Biostatistician
-------------------------------------------
12. RE:Descriptive Stat Question

Eric Siegel
Posted 08-06-2012 16:35
When considering Withdrawals, maybe the thing to do is to get those enthusiastic team members to go up to the whiteboard and give their interpretation of what each fraction means. Then everybody can consider which interpretation comes closer to addressing the research objective. (By the way, what is the research objective?)

Also, you may have already pointed out the following to no avail, but if not, here goes:

We already know that total registration can vary from site to site by as much as two orders of magnitude. Suppose that the number of total registrants at a site has no effect on the withdrawal rate at that site. Then E(a_w/A_t) = E(b_w/B_t) = E(c_w/C_t) = ... = E(z_w/Z_t), whereas E(a_w/W_t) = A_t/Grand, E(b_w/W_t) = B_t/Grand, E(c_w/W_t) = C_t/Grand, and so on. In other words, if the size of the site has no effect on the withdrawal rate, then the fractions a_w/A_t, b_w/B_t, c_w/C_t, etc. will tend to reflect that fact successfully, whereas the fractions a_w/W_t, b_w/W_t, c_w/W_t, etc. will instead tend to be proportional to site size...which, as we already know, can vary from site to site by as much as two orders of magnitude.
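This argument can be checked with a small simulation. The sketch below assumes every registrant withdraws independently with the same probability at every site; the site sizes, the probability `p`, the number of replications and the seed are all hypothetical choices.

```python
import random


def simulate(sizes, p=0.2, reps=2000, seed=0):
    """Average a_w/A_t and a_w/W_t over simulated data with one common rate p."""
    rng = random.Random(seed)
    rate_sums = {s: 0.0 for s in sizes}
    share_sums = {s: 0.0 for s in sizes}
    used = 0
    for _ in range(reps):
        # Each registrant withdraws independently with probability p.
        w = {s: sum(rng.random() < p for _ in range(n)) for s, n in sizes.items()}
        total_w = sum(w.values())
        if total_w == 0:
            continue  # the share of withdrawals is undefined if nobody withdraws
        used += 1
        for s, n in sizes.items():
            rate_sums[s] += w[s] / n
            share_sums[s] += w[s] / total_w
    mean_rate = {s: rate_sums[s] / used for s in sizes}
    mean_share = {s: share_sums[s] / used for s in sizes}
    return mean_rate, mean_share


# Hypothetical site sizes spanning two orders of magnitude.
sizes = {"A": 5, "B": 50, "C": 500}
grand = sum(sizes.values())
mean_rate, mean_share = simulate(sizes)
for s, n in sizes.items():
    print(f"{s}: mean a_w/A_t {mean_rate[s]:.3f}, "
          f"mean a_w/W_t {mean_share[s]:.3f}, A_t/Grand {n / grand:.3f}")
```

Under these assumptions the average site rates should all sit near `p`, while the average shares of withdrawals should track A_t/Grand, which is site size rather than site performance.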

-------------------------------------------
Eric Siegel
Biostatistician
Univ of Arkansas for Medical Sciences
-------------------------------------------
13. RE:Descriptive Stat Question

Robert Sims
Posted 08-06-2012 17:42
This situation reminds me of the structure of an absorbing Markov chain, each site representing an instance of the same chain structure. A drawing and/or a matrix representation of the structure might be a useful way of displaying your results.

-------------------------------------------
Robert Sims
Instructor of Statistics
George Mason University
-------------------------------------------
14. RE:Descriptive Stat Question

James Gear
Posted 08-06-2012 19:11
All,

Thanks so much for your input! Every response was thoughtful and informative! Several common 'themes' were represented in your responses:

The 'right' answer depends on the question to be answered (the "research objective"). Sometimes it is better to provide all of the alternatives with pros and cons, instead of insisting that one answer (mine) is right! :)

Graphical tools are always more easily understood than a multiplicity of numbers and increased verbosity. (Translation: "A picture is worth a thousand words!" :)) I appreciate the suggestions for using the mosaic plot (I couldn't remember the name of that thing!) and the "quadrant-ized" scatter plot (which I did remember and use.)

Tom (Thomas Sandry), I think your example parallels our setting quite closely, and the idea of using the average of both proportions to represent more 'fairly' these rates will certainly be one of the alternatives that will be presented to the clients!

And Stephen (Stephen Simon), thanks for your last suggestion. We are a ways from modeling, but I will remember to consider a random effects model.

Again, thanks for your help! It is great to know that the combined technical expertise and experience of the Statistical Consulting Section may potentially be accessed as a resource!

-------------------------------------------
James Gear
Senior Statistician
-------------------------------------------
