Data Expo

Featured in Amstat News July 2019!

GSS is excited to host an annual Data Challenge Expo, jointly sponsored by three ASA Sections – Statistical Computing, Statistical Graphics, and Government Statistics. The Data Challenge Expo is open to anyone who is interested in participating, including people from government, industry, and academia, as well as retirees and students. The contest challenges participants to analyze a data set using statistical and visualization tools and methods. There are two award categories – Professional (one level with a $500 award) and Student (three levels with awards of $1,500, $1,000, and $500). Results of the analysis will be presented at the annual JSM in a speed poster format.

2020 Deadline: February 4, 2020

Feedback from previous competitions:

-Thank you SO MUCH for organizing this competition. It is such a wonderful opportunity for our students to engage with relevant data and meaningful problems.
-The students who attended JSM last year for the competition had a wonderful time and learned a great deal about the wonderful, comprehensive world of statistics.


Do you want to present something at JSM 2020 in Philadelphia, but do not have a research project in mind? Consider participating in the Data Challenge Expo 2020. Three ASA sections (Computing, Government, and Graphics) are proud to sponsor the annual Data Challenge Expo at JSM 2020. The contest is open to anyone who is interested in participating, including college students and professionals from the private or public sector.

Contestants will present their results in a speed poster session at the JSM and must submit their abstracts to the JSM online system. Note that judging takes place at the JSM and is based on the results presented there. Presenters are responsible for their own JSM registration and travel costs, and any other costs associated with JSM attendance. Group submissions are acceptable. To enter, contestants must do the following by February 4, 2020:

  • Submit an abstract for the Speed Poster session to the JSM 2020 website. Specify the Statistical Computing Section as the main sponsor. You may include the Government Statistics Section and the Statistical Graphics Section as additional sponsors. Abstract submission starts December 3, 2019.
  • Forward the JSM abstract submission email to Wendy Martinez.

The data set for the Data Challenge Expo 2020 will come from the Global Historical Climatology Network (GHCN). Public-use data files and documentation are available online. Contestants must use some portion of the GHCN data, but are strongly encouraged to combine other data sources in the analysis. Other sources of data contestants might use are IPUMS, NASA’s EarthData, and the European Data Portal.

Here are some questions to think about for an analysis. These are only suggestions to get the ideas flowing. Contestants are encouraged to be creative.

  • Is there a long-term trend with respect to temperature? Are there any outliers or anomalies in space or time?
  • Is there a spatial pattern with respect to temperature changes?
  • Are there different geographic regions/clusters that behave differently, e.g., increases, no increases at all, or decreases?
  • Can you construct a spatio-temporal model that predicts temperatures in 2030, i.e., a slight extrapolation? What else might affect temperatures 10 years from now?
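
As one possible starting point for the first question, here is a minimal sketch of estimating a long-term linear trend and flagging unusual years. It assumes you have already reduced the station data to one average temperature per year; the function names and the example values are hypothetical and are not taken from the GHCN files. It is a simple least-squares illustration, not a recommended method for the contest.

```python
from statistics import mean, pstdev


def linear_trend(years, temps):
    """Ordinary least-squares fit of temps = intercept + slope * year."""
    year_bar, temp_bar = mean(years), mean(temps)
    sxx = sum((y - year_bar) ** 2 for y in years)
    sxy = sum((y - year_bar) * (t - temp_bar) for y, t in zip(years, temps))
    slope = sxy / sxx
    intercept = temp_bar - slope * year_bar
    return slope, intercept


def flag_anomalies(years, temps, threshold=2.0):
    """Return years whose residual from the trend line exceeds
    `threshold` standard deviations of all residuals."""
    slope, intercept = linear_trend(years, temps)
    residuals = [t - (intercept + slope * y) for y, t in zip(years, temps)]
    spread = pstdev(residuals)
    if spread == 0:
        return []
    return [y for y, r in zip(years, residuals) if abs(r) > threshold * spread]


# Hypothetical yearly averages (degrees C), NOT real GHCN values:
# a steady warming of 0.02 degrees per year plus one artificial warm year.
years = list(range(2000, 2010))
temps = [10.0 + 0.02 * (y - 2000) for y in years]
temps[5] += 1.5

slope, intercept = linear_trend(years, temps)
anomalies = flag_anomalies(years, temps)
print(f"trend: {slope * 10:.2f} degrees C per decade")
print("anomalous years:", anomalies)
```

Note that a single extreme year pulls the fitted slope toward itself, so a robust fit or a seasonal adjustment may be a better choice for real station records.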

If you have any questions about the Data Challenge Expo, please reach out to Wendy Martinez.


2019 Full Details and Data Sets

The data set for the 2019 data challenge is the New York City Housing and Vacancy Survey (NYCHVS). The NYCHVS is a representative survey of the New York City housing stock and population sponsored by the New York City Department of Housing Preservation and Development (HPD). It is the longest running housing survey in the country and is statutorily required. The Census Bureau has conducted the survey for the City since 1965. HPD is the only non-federal agency that sponsors a Census product. 

The HVS is a triennial survey, with data collected about every three years. Each decade, a representative sample of housing units is selected as the core sample. Field representatives collect information about each sampled unit, including those that are vacant as well as those that are occupied. For occupied units, an interview is conducted that gathers information about the reference person, each additional member of the household, the household overall, and the household unit and building. In each survey cycle, the HVS gathers information about the core sample of housing units as well as an updated set of additional units that are sampled for that cycle to ensure that a given year’s data are representative of the citywide housing stock. Linked interviews within a decade are available for the 1990s and 2000s. Data from the current decade (2011, 2014, and 2017) cannot be linked into a longitudinal file due to disclosure avoidance protections.


For more information on the NYCHVS, including the 2017 NYCHVS questionnaire, sample design, and additional microdata files, visit the U.S. Census Bureau.

Research Questions

For this year’s data challenge, we invite contestants to consider the following research questions. Contestants are also welcome to submit entries based on their own research questions.

1. Since the 1970s, housing quality has improved dramatically; however, some sectors of the housing stock continue to face poor conditions and some specific maintenance deficiencies continue to show higher prevalence. Create a housing quality index for the NYCHVS that enables a view of the housing conditions faced by residents. Contestants may consider the relative importance of different conditions now and/or how the prevalence of these issues has shifted over time. 

2. For the last 50 years, part of the NYC rental stock has been subject to price controls. Currently, about half of the City's rental stock falls under rent control or rent stabilization. Describe what the NYC rental market would be like if these price controls were lifted. What would the NYC rental market look like 10 years from now? Contestants may choose to look at a variety of factors such as quality, housing costs, or population.

3. Like many US cities, New York is facing large changes in the housing market as prices continue to increase and long-term residents face the challenges of gentrification and displacement pressures. Create a measure of gentrification that shows how these conditions are affecting New York City residents. NYCHVS data are available at a PUMA level (called ‘sub-borough areas’). Contestants are welcome to add additional information to the dataset from other Census surveys or secondary data. 

4. New York City is a city of immigrants. The NYCHVS collects information on current residents, including the place of birth of the householder and of his/her mother and father. Describe changes in housing conditions for first- and second-generation immigrant householders in NYC.

5. Two out of every three New York City households rent their home, which is the inverse of the US overall, where two out of every three households own their home. What are the costs and benefits of renting and owning in New York City? Is one a better option than the other? Consider the quality of housing, income/debt/benefits, age/stability, etc.


To facilitate ASA entries, the NYC Department of Housing Preservation and Development has provided microdata files from interviews with occupied households from the 1991 through 2017 NYCHVS cycles. These files have been prepared to provide consistent variable names over time. As such, they differ slightly from the files available on the NYCHVS homepage. ASA data challenge submissions may use either these data files or any of the materials available on the NYCHVS homepage.

For the 2019 Data Challenge Expo, the NYCHVS datasets are available to contestants in Stata, SAS, and CSV formats. Datasets are available for each NYCHVS cycle from 1991 through 2017.

If you have any questions regarding using the NYCHVS for the 2019 ASA Data Challenge Expo, please contact the Survey Director, Elyzabeth Gaumer. Good luck, and we look forward to learning from your entry!

2019 Data Expo Winners

Student Educational Category

1st place: Xiang Shen, George Washington University.

2nd place: Ben Schweitzer, Miami University.

3rd place (tie): Alison Tuiyott, Miami University.

3rd place (tie): Jacob Gertszten and Damian Chambon, University of Virginia

Professional Category
Quentin Brummet, NORC, “Creating Tract-Level Housing Characteristics in the NYCHVS”


  Professional Category
Ed Mulrow, NORC

Educational Category:
Jacob Gerszten of UVA


GSS is excited to announce a special issue of the journal Computational Statistics, which will include papers by the Data Expo candidates.

All winners will be invited to submit a paper to the journal. Other invitations will be based on your JSM 2018 proceedings papers. Therefore, non-winners of the Data Expo must submit a paper to the JSM 2018 proceedings to be considered for the special issue. Winners are also encouraged to submit a proceedings paper, but are not required to do so.

Data Expo Winners  - Educational Category

First Place: Queen Ikhelowa and Darren Keeley, California State University, East Bay, “Modeling and Mapping Weather Forecast Accuracy”

Second Place: Jill Lundell, Brennan Bean, Utah State University, “Let’s Talk About the Weather”

Second Place: Benjamin William Schweitzer, Nichole Rook, Ryan Estep, Robert Garrett, Miami University, “An analysis on the Accuracy of Weather Forecasts”

Data Expo Winners  - Professional Category

Dooti Roy and Gregory Vaughan, along with Jianan Hui and Junxian Geng, Boehringer Ingelheim Pharmaceuticals, “Should You Pay Attention to Daily Weather Forecasts? An Exploration”