
Welcome to Stats 101: A Resource for Teaching Introductory Statistics


 This community provides a toolkit for instructors of Introductory Statistics courses.

Overview

A Series of Case Studies

Resources for Statistics Teachers developed by:

Richard D. De Veaux, Williams College
Deborah Nolan and Jasjeet Sekhon, UC Berkeley
Nicholas Horton, Amherst College and Ben Baumer, Smith College
Daniel Kaplan, Macalester College
Julie Legler, St. Olaf College and Carrie Grimes, Google
with help from David Bock, Ithaca High School, retired
December 14, 2015

Introduction: Many teachers of introductory statistics courses, whether at the high school, two-year college, four-year college, or university level, are trained in mathematics and have little or no training or experience in statistics. At the request of David Morganstein, the 2015 President of the American Statistical Association, we have written a series of case studies designed to show statistics in action rather than as a branch of mathematics. Each case starts with a real-world problem and leads the reader through the steps taken to explore it, highlighting the techniques used in introductory or AP statistics classes. Sometimes the analysis goes slightly beyond the methods taught in such a course, but it is meant to build on simpler techniques and to provide examples of real analyses of the kind a professional statistician might perform. Our hope is that these case studies provide context and motivation for the instructor, so that the methods in the intro course come alive rather than seeming like a list of cookbook formulas. They can be used as examples in class or simply as guides to what a statistical analysis might entail.

Each case is presented in two versions:

  • An R version, written in R Markdown, showing all the R code used to make the plots and the analysis. This version is available in the public library on this site.
  • A version using the package JMP from SAS. This version will be housed on the JMP User Community site.

 

Please share your feedback. Use this link to ask questions and share your comments about the case studies. Your feedback will help us improve!


Available Case Studies

How Much is a Fireplace Worth?

Author: Dick De Veaux, Williams College

Nearly 60% of the houses in Saratoga County, New York, have fireplaces. On average, those houses sell for about $65,000 more than houses without fireplaces. Is the fireplace the reason for the difference? This case study starts with a simple comparison of the prices of houses with and without fireplaces. But there are other characteristics of the houses with fireplaces that may affect the price as well. The intent is to show the danger of using simple group comparisons to answer a question that involves many variables. The study then builds a series of more sophisticated models to show how adjusting for other variables can lead to a more sensible conclusion.

The data are a random sample of 1,728 homes taken from Saratoga County public records (http://www.saratogacountyny.gov/departments/real-property-tax-service-agency/) and collected by Candice Corvetti (Williams ’07) for her senior thesis.
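To make the danger of a simple group comparison concrete, here is a minimal sketch in Python. It does not use the Saratoga data. The eight houses, the variable names, and the pricing rule are all invented for illustration. The numbers are arranged so that the raw gap between the groups is $65,000, echoing the figure above, while the fireplace effect built into the toy prices is only $5,000. The adjustment regresses both price and the fireplace indicator on living area and compares the residuals, which gives the same coefficient as a regression of price on both variables.

```python
def mean(values):
    return sum(values) / len(values)

def residuals(x, y):
    """Residuals from the least-squares line of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]

# Invented houses (NOT the Saratoga data): houses with fireplaces are larger.
area = [1000, 1200, 1400, 1600, 1600, 1800, 2000, 2200]   # square feet
fireplace = [0, 0, 0, 0, 1, 1, 1, 1]                       # 1 = has a fireplace
# Invented pricing rule: $100 per square foot plus $5,000 for a fireplace.
price = [50_000 + 100 * a + 5_000 * f for a, f in zip(area, fireplace)]

# Simple group comparison: difference in mean price.
with_fp = [p for p, f in zip(price, fireplace) if f == 1]
without_fp = [p for p, f in zip(price, fireplace) if f == 0]
raw_gap = mean(with_fp) - mean(without_fp)

# Adjusted comparison: remove the part of price and of the fireplace
# indicator that living area explains, then relate what is left over.
res_price = residuals(area, price)
res_fp = residuals(area, fireplace)
adjusted_gap = (sum(p * f for p, f in zip(res_price, res_fp))
                / sum(f * f for f in res_fp))

print(f"raw gap: {raw_gap:,.0f}  adjusted gap: {adjusted_gap:,.0f}")
```

In this toy setting most of the raw gap comes from the difference in living area, not from the fireplace itself. That is the kind of pattern the case study looks for in the real data.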

How Much Does a Diamond Cost?

Author: Dick De Veaux, Williams College

Everyone who has thought about buying a diamond knows about the four C’s of diamond pricing: Carat (weight), Color, Cut and Clarity. What are the tradeoffs among these factors? Can we build a model to accurately predict the price of a diamond knowing just these characteristics? The object of the study is to produce and diagnose such a model and to assess its limitations.

The data are a sample of 2,690 diamonds taken from the site http://www.adiamor.com/ in 2010 by Lou Valente of JMP.

Keeping a Web Cache Fresh

Authors: Carrie Grimes, Google and Deb Nolan, University of California, Berkeley

Internet search engines such as Google, Bing, and Ask keep copies of Web pages so that when you make a query, they can quickly search their stored pages and return the findings to you. The saved copies are called a Web cache. By searching the cache instead of visiting hundreds of thousands of sites, the engine can answer a query in real time. Of course, if a page has changed since it was last stored, the search engine serves a stale page and the results are either out of date or just wrong. To keep the cache fresh, Web pages need to be visited regularly and the cache updated with any changed pages. How often do Web pages change? How often should the sites be visited to keep the cache fresh? This case study considers these questions by creating models of page updating.

The data describe the behavior of 1,000 Web pages. Each page was visited every hour for 30 days. At each visit the page was compared to the copy from the previous visit, and if it had changed, the cache was updated and the time of the visit was recorded.
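One simple way to model page updating is to assume that a page changes according to a Poisson process with a constant hourly rate. This is an assumption made for the sketch below and not necessarily the model the case study settles on. Hourly visits only reveal whether at least one change happened in each hour, so the rate is estimated from the fraction of visits that found a change. The function names and the example page are hypothetical.

```python
import math

def estimate_change_rate(changed):
    """Estimate the hourly change rate of a page.

    `changed` holds one True/False flag per hourly visit. Under a Poisson
    model, P(at least one change in an hour) = 1 - exp(-rate), so the
    maximum-likelihood estimate is rate = -ln(1 - fraction of changed visits).
    """
    p = sum(changed) / len(changed)
    if p >= 1.0:
        raise ValueError("page changed at every visit; rate cannot be estimated")
    return -math.log(1.0 - p)

def prob_stale(rate, hours):
    """Probability that a cached copy is out of date `hours` after a visit."""
    return 1.0 - math.exp(-rate * hours)

# Hypothetical page: 30 days of hourly visits (720), 72 of which found a change.
visits = [True] * 72 + [False] * 648
rate = estimate_change_rate(visits)
stale_after_day = prob_stale(rate, 24)

print(f"estimated rate: {rate:.4f} changes per hour")
print(f"chance the cache is stale after 24 hours: {stale_after_day:.2f}")
```

Under this model, choosing a revisit interval becomes a trade-off between the cost of a visit and the staleness probability the search engine is willing to tolerate.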


 

 

Better Flight Experiences with Data (Airline Delays in New York City)

Authors: Nick Horton (Amherst College) and Ben Baumer (Smith College)

If you’ve ever flown on a commercial airline, you know that delays are part of the adventure. Before booking your flight, can data help you decide what time of day, what time of year, and what airline to choose in order to minimize your chance of a delay? The object of the study is to explore a data set collected by the US Bureau of Transportation Statistics (BTS) in order to minimize the chances of experiencing a long delay. The full data set is extremely large, so the study focuses on delays of flights from New York City in 2013.

The data are collected daily by BTS (for a summary, see http://www.transtats.bts.gov/homedrillchart.asp). The data in this study are a subset of those data, collected by Hadley Wickham of RStudio.
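A first pass at such a question is to tabulate the share of long delays within each group of interest. The sketch below shows that tabulation in Python on six made-up flight records. They are not BTS data. The field names (`carrier`, `hour`, `dep_delay`) and the 60-minute threshold for a "long" delay are assumptions chosen for illustration.

```python
from collections import defaultdict

def long_delay_rate(flights, key, threshold=60):
    """Share of flights per group whose departure delay exceeds `threshold` minutes."""
    total = defaultdict(int)
    late = defaultdict(int)
    for flight in flights:
        group = flight[key]
        total[group] += 1
        if flight["dep_delay"] > threshold:
            late[group] += 1
    return {group: late[group] / total[group] for group in total}

# Made-up records for illustration only (delays in minutes; negative = early).
flights = [
    {"carrier": "X", "hour": 7, "dep_delay": 5},
    {"carrier": "X", "hour": 18, "dep_delay": 95},
    {"carrier": "Y", "hour": 7, "dep_delay": -3},
    {"carrier": "Y", "hour": 18, "dep_delay": 20},
    {"carrier": "Y", "hour": 18, "dep_delay": 130},
    {"carrier": "X", "hour": 7, "dep_delay": 12},
]

by_hour = long_delay_rate(flights, "hour")
by_carrier = long_delay_rate(flights, "carrier")
print("by hour:", by_hour)
print("by carrier:", by_carrier)
```

In these toy records the two carriers have the same long-delay rate while the evening flights fare much worse than the morning ones, which shows why it helps to look at more than one grouping variable before drawing conclusions.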

Election 2000 – What Happened to Al Gore?

Authors: Deb Nolan and Jasjeet Sekhon, University of California at Berkeley

The 2000 election was extremely close, with Al Gore receiving 50,999,897 votes to the 50,456,002 received by George W. Bush. However, the electoral college vote was 271 to 266 in favor of Bush, giving him the election. Those 271 electoral votes included 25 from the state of Florida, where the vote was so close that a mandatory recount was performed. The Supreme Court ended the recount on Dec. 12, 2000, awarding Florida’s votes, and with them the election, to Bush. The study explores the effect of the infamous “butterfly ballot” in Palm Beach County and asks whether the voters there who, according to county records, voted for Pat Buchanan actually wanted to vote for Al Gore.

The data are the votes cast in each Florida county for each of the candidates in the 2000 US presidential election.
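The arithmetic behind "Florida decided the election" can be checked directly from the totals quoted above. The short Python sketch below uses only those figures, plus the 270-vote majority needed in the electoral college, which is not stated in the text. The variable names are illustrative.

```python
# National popular vote, as quoted in the case study description.
gore_votes = 50_999_897
bush_votes = 50_456_002
popular_margin = gore_votes - bush_votes

# Electoral college totals, as quoted; Florida carried 25 electoral votes.
bush_electoral = 271
gore_electoral = 266
florida_electoral = 25
needed_to_win = 270  # majority of the 538 electors (not from the text above)

# What the totals would have been had Florida gone to Gore instead.
bush_without_florida = bush_electoral - florida_electoral
gore_with_florida = gore_electoral + florida_electoral

print(f"Gore's popular-vote margin: {popular_margin:,}")
print(f"Bush without Florida: {bush_without_florida}")
print(f"Gore with Florida: {gore_with_florida}")
print("Florida decisive:",
      bush_without_florida < needed_to_win <= gore_with_florida)
```

Gore led the national popular vote, yet the outcome turned entirely on Florida's 25 electoral votes, which is why a few thousand possibly misdirected votes in one county matter so much.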