Discussion: View Thread

  • 1.  large data set

    Posted 08-15-2011 13:06
    This message has been cross-posted to the following eGroups: Young Professionals Group and Statistical Consulting Section.
    -------------------------------------------
    I am trying to find a large data set. I wish census.gov had a "try to download the whole thing" button.
    I have found an astronomical data set with 170,000 lines, but for experimental purposes I am looking for something with one million or more lines. Does anyone know where I can find one?

    -------------------------------------------
    Daniel Livingston
    -------------------------------------------


  • 2.  RE:large data set

    Posted 08-15-2011 13:17

    Have fun!

    http://aws.amazon.com/datasets

    -------------------------------------------
    Charles Mann
    Charles R Mann Associates Inc
    -------------------------------------------


  • 3.  RE:large data set

    Posted 08-15-2011 13:18

    Try --

    http://www.icpsr.umich.edu/icpsrweb/SAMHDA/

    They have combined multi-year data sets. For example, the combined TEDS (Treatment Episode Data Set) file will have several million records.

    Steve
    -------------------------------------------
    Stephan Arndt
    Professor
    University of Iowa, Iowa Consortium
    -------------------------------------------


  • 4.  RE:large data set

    Posted 08-15-2011 13:33
    For some meanings of "experimental purposes," it is often useful to have data from a pseudo-population with "known" distributions. It is even possible to generate sets of variables drawn from pseudo-populations with "known" correlations, as well as ordinal and nominal variables with "known" population distributions.
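    As a rough illustration of that idea (my own sketch in Python, not part of the original post), the snippet below builds two normal variables with a chosen population correlation by mixing two independent standard normals, then derives an ordinal variable by cutting one of them at fixed thresholds and a nominal variable from fixed category probabilities. The function names, the cut points (-1, 0, 1), and the category labels and weights are hypothetical choices for illustration only.

```python
import math
import random
from typing import Dict, List


def correlated_case(rng: random.Random, rho: float,
                    mean: float = 50.0, sd: float = 10.0) -> Dict[str, object]:
    """Build one case: two normals correlated at rho, plus an ordinal and a
    nominal variable with "known" population distributions (hypothetical)."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    # w shares z1 with x, so the population correlation of x and y is rho.
    w = rho * z1 + math.sqrt(1.0 - rho ** 2) * z2
    x = mean + sd * z1
    y = mean + sd * w
    # Ordinal 1..4: cut the standard normal z1 at hypothetical thresholds
    # -1, 0, 1, giving population shares of roughly 16/34/34/16 percent.
    ordinal = 1 + sum(z1 > t for t in (-1.0, 0.0, 1.0))
    # Nominal: hypothetical labels with fixed population probabilities.
    nominal = rng.choices(["A", "B", "C"], weights=[0.5, 0.3, 0.2])[0]
    return {"x": x, "y": y, "ordinal": ordinal, "nominal": nominal}


def sample_corr(a: List[float], b: List[float]) -> float:
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)
```

    For example, with rng = random.Random(7), generating 20,000 cases via correlated_case(rng, 0.6) should give a sample_corr of x and y close to 0.6, and about half the cases should fall in nominal category "A".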

    Here is a simple example in SPSS syntax that will generate 2,500,000 cases, each with 100 pseudo-normally distributed variables drawn from a "known" population with a mean of 50 and an SD of 10.
    By nesting loops and using many different rv.* functions you can create all kinds of files.


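    new file.
    input program.
    * Create 100 numeric variables x1 to x100, each with format F3.
    vector x (100f3).
    * Outer loop: one pass per case, with id running from 1 to 2500000.
    loop id = 1 to 2500000.
    * Inner loop: fill x1 to x100 with rounded draws from a normal with mean 50 and SD 10.
    loop #p = 1 to 100.
    compute x(#p) = rnd(rv.normal(50,10)).
    end loop.
    * Write the completed case to the active file.
    end case.
    end loop.
    * Signal that no more cases will be built.
    end file.
    end input program.
    * Run the transformations now so the data are actually generated.
    execute.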
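    For readers without SPSS, the same idea can be sketched in Python with only the standard library. This is my own rough translation, not Arthur's code: the function names and parameters are hypothetical, the default case count is much smaller than 2,500,000 so it runs quickly, and rounding is done half away from zero to mimic SPSS RND rather than Python's built-in round (which rounds ties to even).

```python
import math
import random
from typing import Iterator, List


def round_half_away(x: float) -> int:
    """Round to the nearest integer, ties away from zero (like SPSS RND)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def pseudo_population(n_cases: int = 10_000, n_vars: int = 100,
                      mean: float = 50.0, sd: float = 10.0,
                      seed: int = 2011) -> Iterator[List[int]]:
    """Yield n_cases rows of [id, x1, ..., x_n_vars], where each x is a
    rounded draw from a normal distribution with the given mean and sd."""
    rng = random.Random(seed)  # fixed seed so the same file can be rebuilt
    for case_id in range(1, n_cases + 1):  # outer loop: one row per case
        # Inner loop: fill x1..x_n_vars, like LOOP #p = 1 TO 100 above.
        xs = [round_half_away(rng.gauss(mean, sd)) for _ in range(n_vars)]
        yield [case_id] + xs
```

    To get something closer to the size of the SPSS file, you could pass pseudo_population(2_500_000) to csv.writer(...).writerows(...); because it is a generator, only one row is held in memory at a time.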

    -------------------------------------------
    Arthur Kendall
    Social Research Consultants
    -------------------------------------------


  • 5.  RE:large data set

    Posted 08-15-2011 13:39
    Hi,

    Go to http://www.kdnuggets.com/datasets/

    -------------------------------------------
    Patrick Spagon
    -------------------------------------------


  • 6.  RE:large data set

    Posted 08-15-2011 22:03
    #1 I have a 78 GB data set that we provide to marketing and economics academics in conjunction with INFORMS.

    It's described in Marketing Science here:

    Bronnenberg, Bart J., Michael W. Kruger, Carl F. Mela. 2008. Database paper:
    The IRI marketing data set. Marketing Science, 27(4) 745-748.

    Roughly speaking, this is 7 years of store/week/UPC data for over 1,500 stores across 30 product categories.
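    To give a rough sense of scale, here is a back-of-the-envelope lower bound on the number of store/week/category combinations. The assumptions are mine, not from the data set's documentation: 52 weeks per year and exactly 1,500 stores (the post says "over 1500"). Each category contains many UPCs, so the actual number of store/week/UPC rows would be much larger.

```python
YEARS = 7
WEEKS_PER_YEAR = 52   # assumption: ignores the occasional 53-week year
STORES = 1_500        # the post says "over 1500"; treated here as a floor
CATEGORIES = 30

store_weeks = YEARS * WEEKS_PER_YEAR * STORES
store_week_categories = store_weeks * CATEGORIES

print(f"{store_weeks:,} store-weeks")                          # 546,000 store-weeks
print(f"{store_week_categories:,} store/week/category cells")  # 16,380,000 store/week/category cells
```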

    If you are interested in more information about this data set, please email me at work at
    mike.kruger@infores.com
    and I will send you (or anyone else) more details.


    #2 If you're just looking for lots of records, we purchased a block-level (not block-group-level) file on DVD from a secondary census data supplier. I can look up the source. I remember it as inexpensive, but that was relative to a multimillion-dollar development project, so it might not be inexpensive relative to your expectations. There are about 8 million blocks in the US. I think there's a site where you can download one state's blocks at a time for free; we lacked that sort of patience.

    #3 As for free data, the Dominick's POS data set is pretty large, although I've not worked with it. The Dominick's Finer Foods data can be found on the Kilts Center for Marketing web site:
    http://research.chicagogsb.edu/marketing/databases/dominicks/index.aspx

    -------------------------------------------
    Michael Kruger
    Information Resources Inc
    -------------------------------------------