ASA Connect

  • 1.  Open sources of Data for Undergraduate Research Project in Statistics

    Posted 02-17-2015 12:04
    Dear ASA Members,

    I am looking for open sources of data that can be used for undergraduate research projects by my students. My students have limited knowledge of simulating data, so I need real data sets. If you have experience using such data for an undergraduate research project, please let me know, along with download links. I would appreciate your ideas.

    Note: The students are undergraduates (majors or minors in Statistics).

    Sincerely, 

    -------------------------------------------
    Achut Adhikari
    Student
    -------------------------------------------


  • 2.  RE: Open sources of Data for Undergraduate Research Project in Statistics

    Posted 02-18-2015 07:19
    I've used the Add Health data in my Honors intro class. 

    -------------------------------------------
    Beverly Wood
    Assistant Professor
    Indian River State College
    -------------------------------------------




  • 3.  RE: Open sources of Data for Undergraduate Research Project in Statistics

    Posted 02-18-2015 10:19

    I have a few sample student projects, with corresponding data sets, on my website:

    http://web.grinnell.edu/individuals/kuipers/stat2labs/StudentProjects.html

    This site also includes a Word document with several sources of free online data. I plan on updating the list this summer, but most of the links should still work. I would also appreciate suggestions for better or more recent data sources.

    -------------------------------------------
    Shonda Kuiper
    Grinnell College
    -------------------------------------------




  • 4.  RE: Open sources of Data for Undergraduate Research Project in Statistics

    Posted 02-18-2015 10:44

    I use NLSY with my Methods II class. Students choose the variables themselves, download the data from the website, and clean it (nothing major). It's a rich source of data with many, many variables, so it suits many interests. I also used to offer the Add Health public-use data, but NLSY was really enough. Occasionally, students have chosen to use datasets from the Department of Education, such as ECLS.

    I use YRBS, which is cross-sectional, for the Methods I (first semester) class projects. Occasionally, students have chosen to use the General Social Survey or the National Survey of American Life, which has rich cross-sectional data about African-American and Afro-Caribbean populations.

    Alternatively, there are data sets in R packages. For example, the mosaic package in R includes data from the HELP trial, which is pretty interesting and rich enough for a project.

    Janet

    Janet Rosenbaum, Ph.D.
    Assistant Professor of Epidemiology
    School of Public Health, SUNY Downstate Medical Center, Brooklyn, NY



  • 5.  RE: Open sources of Data for Undergraduate Research Project in Statistics

    Posted 02-18-2015 11:35
    Several federal agencies produce many datasets for public use. The Census Bureau probably produces the most data: decennial data, the American Community Survey, and numerous other surveys. Some of these surveys are sponsored by other agencies, such as the Departments of Housing and Urban Development and Education, so the data appear on those agencies' websites. The Bureau of Economic Analysis and the Department of Labor also produce many datasets. Your students can get good practice by combining datasets and learning about disclosure avoidance and its effects on data.
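
    As a rough illustration of the combining step, here is a minimal Python sketch that joins two small tables on a shared key and reports the keys that fail to match. The column names (state, population, median_income) and all values are hypothetical, invented for this example; real agency files will use their own identifiers and layouts.

```python
import csv
import io

# Hypothetical extracts: these column names and values are invented for
# illustration and do not come from any agency file.
POPULATION_CSV = """state,population
AA,1000
BB,2500
CC,400
"""

INCOME_CSV = """state,median_income
AA,50000
BB,62000
DD,48000
"""


def read_keyed(text, key):
    """Read CSV text into a dict mapping the key column to the row dict."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}


def inner_join(left, right):
    """Keep only keys present in both tables and merge their columns."""
    merged = []
    for k in sorted(left.keys() & right.keys()):
        row = dict(left[k])
        row.update(right[k])
        merged.append(row)
    return merged


population = read_keyed(POPULATION_CSV, "state")
income = read_keyed(INCOME_CSV, "state")
combined = inner_join(population, income)

# Keys found in only one table are worth reporting, since unmatched rows
# are easy to lose silently when combining sources.
unmatched = sorted(population.keys() ^ income.keys())

for row in combined:
    print(row["state"], row["population"], row["median_income"])
print("unmatched:", unmatched)
```

    Listing the unmatched keys is a useful habit for students, because rows that drop out of a join can change the results without any error message.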

    I'm sure you can find many datasets at foreign official statistics agencies.  The OECD, UN, World Bank and IMF also produce data.

    -------------------------------------------
    Charles Coleman
    -------------------------------------------




  • 6.  RE: Open sources of Data for Undergraduate Research Project in Statistics

    Posted 02-18-2015 13:35

    Hi,

    I suggest a few links:

    -) UCI Machine Learning Repository (general machine learning data sets)
    http://archive.ics.uci.edu/ml/datasets.html

    -) Kaggle competitions. You can freely download data. This can be motivating
    for students: even if they don't compete, they know that, in principle, they could.
    http://www.kaggle.com/

    -) The World Factbook is a great source of information:
    https://www.cia.gov/library/publications/the-world-factbook/
    but the data are sparse. For a more compact format, try here:
    http://www.theodora.com/wfb/abc_world_fact_book.html


    -------------------------------------------
    Nicola Mingotti
    Universidad Carlos III
    -------------------------------------------




  • 7.  RE: Open sources of Data for Undergraduate Research Project in Statistics

    Posted 02-18-2015 15:54
    There are lots of good sources of data for undergraduate projects described in articles in the Journal of Statistics Education,
    http://www.amstat.org/publications/jse/.
    Beyond this, if your students can use R, or you can export R data to text/CSV files, you'll find some interesting topics in

    - the HistData package: Data sets from the history of statistics and data visualization, http://CRAN.R-project.org/package=HistData
    - the Lahman package: Sean Lahman's Baseball Database, http://CRAN.R-project.org/package=Lahman
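
    If the data are exported from R to CSV (for example with R's write.csv), students working outside R can read the file with any CSV reader. The following is a minimal Python sketch under stated assumptions: the columns (group, value) and the numbers are invented stand-ins and are not taken from HistData or Lahman. It skips the NA marker that write.csv uses for missing values by default.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Stand-in for a file exported from R; the columns and numbers are invented
# for illustration and are not taken from HistData or Lahman.
EXPORTED_CSV = """group,value
a,1.0
a,3.0
b,10.0
b,NA
"""


def group_means(text):
    """Return the mean of `value` per `group`, skipping R's NA marker."""
    values = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        if row["value"] != "NA":  # write.csv writes missing values as NA by default
            values[row["group"]].append(float(row["value"]))
    return {g: mean(v) for g, v in values.items()}


means = group_means(EXPORTED_CSV)
print(means)
```

    With a real export, io.StringIO(text) would be replaced by an opened file, and exporting with row.names = FALSE avoids an extra unnamed first column.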

    -------------------------------------------
    Michael Friendly
    Professor
    York University
    http://datavis.ca
    -------------------------------------------




  • 8.  RE: Open sources of Data for Undergraduate Research Project in Statistics

    Posted 02-19-2015 08:55
    I have used the Data and Story Library (DASL), which classifies data sets by topic, statistical method, and data subject.
    http://lib.stat.cmu.edu/DASL/
    -------------------------------------------
    Lewis Shoemaker
    Professor of Mathematics
    Millersville University of Pennsylvania
    -------------------------------------------