Statistical Consulting Section

  • 1.  1950 CENSUS/ genealogy

    Posted 04-09-2022 01:00

    I would appreciate it if a section member, ideally (encouraged but not mandatory) one interested in genealogy and in the recently released 1950 US Census, could advise me on how to download files, possibly many hundreds of gigabytes in total, from Amazon Web Services (AWS). The files on AWS are the entire digitized version of the 1950 census. In brief, I am looking for high-resolution image files such as ".tiff". I have never accessed anything on AWS before, and I assume there is an R library for that. I have no idea how large (in gigabytes) the files may be. The AWS files are here: https://registry.opendata.aws/nara-1950-census/
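    Since the total size is the open question, one option is to list the bucket and add up object sizes before downloading anything. The Python sketch below does that with boto3 using anonymous (unsigned) requests, which public Open Data buckets generally allow. It assumes the bucket is named `nara-1950-census` in region `us-east-2`; please verify both on the registry page, along with a key prefix to narrow the listing, because the whole bucket may hold millions of objects. The TIFF extension filter is also an assumption about how the images are named.

```python
from typing import Iterable, Iterator, Tuple

# Assumptions: bucket name and region as shown on the AWS Open Data Registry
# page for the 1950 census; verify both before use.
BUCKET = "nara-1950-census"
REGION = "us-east-2"


def list_objects(bucket: str = BUCKET, prefix: str = "",
                 region: str = REGION) -> Iterator[Tuple[str, int]]:
    """Yield (key, size_in_bytes) for each object under prefix, without credentials."""
    # boto3 is imported here so the helpers below work even without it installed.
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    s3 = boto3.client("s3", region_name=region,
                      config=Config(signature_version=UNSIGNED))
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            yield obj["Key"], obj["Size"]


def summarize(objects: Iterable[Tuple[str, int]],
              extensions: Tuple[str, ...] = (".tif", ".tiff")) -> Tuple[int, int]:
    """Return (file_count, total_bytes) for keys ending in one of the extensions."""
    count = total = 0
    for key, size in objects:
        if key.lower().endswith(extensions):
            count += 1
            total += size
    return count, total


def human_size(n_bytes: int) -> str:
    """Format a byte count in binary units, e.g. 1536 -> '1.5 KiB'."""
    size = float(n_bytes)
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    for unit in units[:-1]:
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} {units[-1]}"
```

    For example, `summarize(list_objects(prefix=...))` with a prefix taken from the registry listing returns a file count and byte total for that folder, and `human_size()` makes the total readable.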

    For convenience, the full text of my note is available as a PDF here:

    www.barkerstats.com/PDFs/Statistics/CENSUS/1950/Note-Full-Text.pdf


    First, for anyone not yet familiar with the digitized 1950 census and its contents, my brief executive summary is "WOW". The scale of this release is beyond enormous, and I have run out of other superlatives. Some administrative housekeeping: the Census Bureau transferred all of the digital census materials described below to NARA (the National Archives and Records Administration), so the digital version of the census is hosted by NARA. The New York Times article below gives a sense of the extraordinary magnitude of the project.

    • Let me first give some background and context:


    First, the well-written article about the 1950 census in the New York Times:

    https://www.nytimes.com/2022/03/31/us/census-data-1950.html


    The fully digitized 1950 census was released on April 1, 2022 (not an April Fools' joke!), after the legally mandated 72-year waiting period. Those whose parents, grandparents or other relatives were interviewed by the (human) field enumerators in 1950 should (though this is not guaranteed) be able to find their names and their answers to the census questions.

    My interest in this is family heritage and genealogy, and my work on this project is entirely voluntary. I have immediate and numerous extended family members who (should have) filled out the census form in 1950, or whose forms were filled out by an enumerator during door-to-door visits. Census enumerators no longer go door to door as the primary method; the most recent census relied mainly on mail-in responses.

    The metaphorical "icing on the cake" is that the scanned images include the actual handwritten, completed census forms. Here is an example of one handwritten page, purportedly listing my father's brother (Tom/Thomas Barker). I'll skip over a minor complication: my father's brother was Tom/Thomas, and my grandfather's brother (my father's uncle) was also Tom. That is sorted out in a different way. The Tom Barker page:


    http://www.barkerstats.com/PDFs/Statistics/CENSUS/images/barkerThomas.png

    The machine learning/AI "read" of that page is here:

    www.barkerstats.com/PDFs/Statistics/CENSUS/images/46-155-Tom-Barker-ED.jpg


    • Why "purportedly"? The Census and now the various genealogy services (Ancestry.com and others) used, or are now using, their own "AI and machine learning" software (according to other news sources) to "read" the handwritten forms completed in 1950. (AI is artificial intelligence.) Completely separately, according to the New York Times, about 400,000 volunteers associated with The Church of Jesus Christ of Latter-day Saints (LDS) are checking the AI and machine learning results. The ".png" above is illegible to me; that ".png" and other formats such as ".jpg" are available to users of the main search tool.

    Excerpting the New York Times:
    "We have about 400,000 volunteers that index records all the time," said David E. Rencher, the chief genealogical officer at Family Search. "For a project like this, where we rally the community, we'll get a bump, probably several hundred thousand, just to do this."

    • In browsing the extensive census documentation, I believe there are higher-resolution ".tiff" files of the census forms among the AWS files. A separate side point: the only technical statistical example of such "reads" I am aware of is the several-day course on statistical learning by Hastie and Tibshirani of Stanford, which covered testing classifiers on images of single handwritten characters from US Postal Service mail. I am skeptical that the claimed 400,000 volunteers are checking and correcting machine learning reads made from images like the .png example above. This project is massive, with machine learning/AI reads of several million handwritten forms.

    Note also that the census materials are extensive and include the maps used to mark the enumeration districts.

    For example, here is an area within Montgomery County, Pennsylvania:

    http://www.barkerstats.com/PDFs/Statistics/CENSUS/1950/image-02-m-a3378-00055-00837.jpg

    I have been collaborating with friends living in the area, and we have already discovered that some very familiar roads there were renamed at some point after 1950.


    As for my genealogical motivation: I grew up near Philadelphia, within an hour's drive of Valley Forge and Washington Crossing, and a longer drive from Gettysburg, the Liberty Bell, etc. I have friends currently living in Philadelphia and its suburbs who are involved with or work at local historical societies. And while I now live near Sacramento and have no relatives in the area, this was "gold rush" country, and the '49ers (gold miners) had their original camps here. There are several such groups in Pennsylvania and similarly in Sacramento; certainly every state in the US has many people interested in local history and genealogy. For this project I am volunteering, or may eventually volunteer, to help the township and possibly the county and state historical societies (i.e., the Lower Merion Historical Society, the Montgomery County Historical Society, and the Pennsylvania Historical Society).

    To the matter at hand.
    For the purposes of discussion, Pennsylvania (PA) can be described as a hierarchical government structure: at the lowest level "townships", next "counties", then the state of PA. Caveat emptor: I am ignoring towns (e.g., Haverford/Villanova), villages (Blue Bell) and cities (Philadelphia) for the moment; technically, Blue Bell is a "census-designated place" (CDP). To locate my elders using the census search tools, I can look directly for an elder who was alive in my township (or a nearby city) in 1950, say my father and my father's brothers, Thomas Barker or Aubrey Barker, or his sister Jean Barker. As a side point, I also know middle names, birth years, etc.; for example, my father's sister was Jean Ingelow Barker. Alternatively, I can look by village/township/county/state, in my case "Gladwyne/Lower Merion/Montgomery/Pennsylvania". I also have a relative who prepared a Barker family genealogy over a period of several years.
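    Because AI transcriptions of handwriting may misspell names, a search by name can be widened with approximate matching plus a birth-year tolerance. The Python sketch below is hypothetical: it uses the standard library's `difflib.SequenceMatcher`, it is not how the NARA search tool works, and the 0.8 similarity threshold, the two-year tolerance and the example names in the usage are made up for illustration.

```python
from difflib import SequenceMatcher
from typing import List, NamedTuple, Optional


class Person(NamedTuple):
    name: str
    birth_year: Optional[int]  # None when the year is unknown or unreadable


def normalize(name: str) -> str:
    """Lower-case a name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def name_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def candidate_matches(record: Person, relatives: List[Person],
                      min_similarity: float = 0.8,   # hypothetical threshold
                      year_tolerance: int = 2) -> List[str]:
    """Names of relatives whose name is close enough and whose birth year fits."""
    hits = []
    for rel in relatives:
        if name_similarity(record.name, rel.name) < min_similarity:
            continue
        years_known = record.birth_year is not None and rel.birth_year is not None
        if years_known and abs(record.birth_year - rel.birth_year) > year_tolerance:
            continue
        hits.append(rel.name)
    return hits
```

    For instance, with a made-up relatives list containing `Person("John Smith", 1920)`, a transcribed record `Person("Jon Smyth", 1921)` is flagged, while the same record with birth year 1930 is not. Known middle names can simply be included in the name strings.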

    Searches

    There are at least two kinds of searches of the census using the census search tools: one directly by name and the second by address. One must also be aware that the census is organized in part by enumeration district (ED), so a sort of third kind of search is directly by enumeration district. The ED maps are available in the census; for example, here:

    The handwritten enumeration district number, barely legible (to me), is in the lower right: 46-112.

    www.barkerstats.com/PDFs/Statistics/CENSUS/1950/image-02-m-a3378-00055-00837.jpg

    I find it useful to remember that in 1950 and that era, maps were hand-drawn by cartographers.

    Excerpting from the New York Times article above:
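    ED numbers such as 46-112 and 46-155 combine two numbers, so sorting a list of them as text misorders them (46-99 lands after 46-155). Here is a small Python sketch that parses and sorts them numerically. It assumes the label is always two hyphen-separated integers and that the first part identifies the county (both EDs mentioned in this post begin with 46); labels with letter suffixes, if any exist, would need a broader pattern.

```python
import re
from typing import Iterable, List, NamedTuple

# Assumed ED label format: "<number>-<number>", e.g. "46-112".
ED_PATTERN = re.compile(r"(\d+)-(\d+)")


class EnumerationDistrict(NamedTuple):
    county_code: int  # assumption: the first part identifies the county
    district: int


def parse_ed(text: str) -> EnumerationDistrict:
    """Parse an ED label such as '46-112' (surrounding whitespace allowed)."""
    match = ED_PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not an ED number: {text!r}")
    return EnumerationDistrict(int(match.group(1)), int(match.group(2)))


def sort_eds(labels: Iterable[str]) -> List[EnumerationDistrict]:
    """Sort ED labels numerically rather than as text."""
    return sorted(parse_ed(label) for label in labels)
```

    With the EDs from this post, `sort_eds(["46-155", "46-112"])` puts 46-112 first, and numeric sorting keeps a hypothetical 46-99 ahead of both.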

    Those millions of census forms, painstakingly filled out by hand in ink, were posted online by the National Archives and Records Administration, which by law has kept them private until now. The records, searchable by name and address, offer an intimate look at a nation on the cusp of the modern era - for the merely curious, a glimpse of the life parents or grandparents led, but for historians and genealogists, a once-in-a-decade bonanza of secrets unveiled.

    "This is the Super Bowl and the Olympics combined, and it's only every 10 years - it's awesome stuff," Matt Menashes, the executive director of the National Genealogical Society, said in an interview. "What's so great about these points of data is that it helps you paint a picture - not just relationships, but what society was like."

    More housekeeping and record-keeping on my part.

    I am keeping copious notes on my searches for relatives as I learn to navigate the census. The attached document contains these notes; it will eventually be edited into a more readable format to help friends at the various historical societies. I cannot sufficiently stress the "draft" nature of the attachment.

    www.barkerstats.com/PDFs/Statistics/CENSUS/1950/1950-Census-Barker.pdf

    Thank you in advance



    ------------------------------
    Chris Barker, Ph.D.
    2022 Statistical Consulting Section
    Chair-elect
    Consultant and
    Adjunct Associate Professor of Biostatistics
    www.barkerstats.com


    ---
    "In composition you have all the time you want to decide what to say in 15 seconds, in improvisation you have 15 seconds."
    -Steve Lacy
    ------------------------------


  • 2.  RE: 1950 CENSUS/ genealogy

    Posted 04-11-2022 10:21

    Hi Chris,

    Sounds like a great project! I spoke with one of our cloud engineers here. Here is his response:

    Using the AWS CLI would be one way to go, since it does not require an AWS account.

    The other option would be the aws.s3 package, which we use here at work for downloading data, but that does require a user with credentials to log in.

    https://github.com/cloudyr/aws.s3
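    For comparison, here is a Python sketch of an anonymous download using boto3 rather than aws.s3 (whether aws.s3 itself can make unsigned requests is not checked here). It mirrors each S3 key as a local path and skips files already present with the expected size, so an interrupted multi-gigabyte job can be restarted. The bucket name and region defaults are assumptions to verify on the registry page; `objects` is expected to be (key, size) pairs, for example from a `list_objects_v2` listing.

```python
from pathlib import Path
from typing import Iterable, Tuple


def local_path(dest_dir: str, key: str) -> Path:
    """Mirror an S3 key like 'a/b/page.tif' under dest_dir, refusing '..' parts."""
    parts = [part for part in key.split("/") if part]
    if not parts or ".." in parts:
        raise ValueError(f"unsafe or empty key: {key!r}")
    return Path(dest_dir).joinpath(*parts)


def needs_download(path: Path, remote_size: int) -> bool:
    """True if the file is missing locally or its size differs from the S3 size."""
    return not path.is_file() or path.stat().st_size != remote_size


def download_all(objects: Iterable[Tuple[str, int]], dest_dir: str,
                 bucket: str = "nara-1950-census",  # assumed bucket name
                 region: str = "us-east-2") -> int:   # assumed region
    """Download (key, size) pairs anonymously; return how many files were fetched."""
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    s3 = boto3.client("s3", region_name=region,
                      config=Config(signature_version=UNSIGNED))
    fetched = 0
    for key, size in objects:
        if key.endswith("/"):  # skip "folder" placeholder objects
            continue
        target = local_path(dest_dir, key)
        if not needs_download(target, size):
            continue  # already downloaded with the expected size
        target.parent.mkdir(parents=True, exist_ok=True)
        s3.download_file(bucket, key, str(target))
        fetched += 1
    return fetched
```

    Re-running `download_all` on the same listing only fetches files that are missing or incomplete, which matters when the total runs to hundreds of gigabytes over a home connection.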

    Personally, I would install the AWS CLI, as that may be the quickest way to download the data. Either way, S3 is set up to handle the data properly and download it efficiently. He may just end up using a lot of bandwidth on his own network, depending on the total download size.

    https://aws.amazon.com/cli/
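    To make the CLI suggestion concrete, this small Python helper assembles an `aws s3 sync` command that uses `--no-sign-request` (so no AWS account is needed) and the CLI's `--exclude`/`--include` filters to keep only TIFF files. It defaults to `--dryrun`, which only reports what would be copied. The TIFF patterns are an assumption about file naming, and the source path in the usage below is a placeholder.

```python
from typing import List, Sequence


def build_sync_command(source: str, dest: str,
                       include: Sequence[str] = ("*.tif", "*.tiff"),
                       dry_run: bool = True) -> List[str]:
    """Assemble an `aws s3 sync` command for a public bucket, filtered by pattern."""
    cmd = ["aws", "s3", "sync", source, dest,
           "--no-sign-request",   # anonymous access: no AWS account needed
           "--exclude", "*"]      # first exclude everything...
    for pattern in include:      # ...then re-include the wanted files, since
        cmd += ["--include", pattern]  # later filters take precedence
    if dry_run:
        cmd.append("--dryrun")   # only report what would be copied
    return cmd
```

    For example, `print(shlex.join(build_sync_command("s3://nara-1950-census/<prefix>", "census-tiffs")))` prints a command to paste into a terminal (the bucket name is an assumption to check on the registry page), and `subprocess.run(cmd, check=True)` would run it directly. Once the dry run lists the expected files, pass `dry_run=False`.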

    Hope that helps.

    Best regards,

    Jamie

    Jamis Perrett, Ph.D.

    Head of Computational Life Science US &

    Interim Head of CLS-Technology Functions

    Bayer U.S. – Crop Science

    R&D Small Molecules

    700 Chesterfield Pkwy West

    Chesterfield, MO 63017 US

    Tel:       +1 636-737-6942

    Cell: +1 314-527-9403

    E-mail:    jamis.perrett@bayer.com

    Web:       http://www.bayer.com


  • 3.  RE: 1950 CENSUS/ genealogy

    Posted 04-11-2022 13:41
    Dear Jamie, thank you very much. In email to my ASA inbox, I also received a suggestion from Neal Fultz about an R package with similar capability.
    I have also cut and pasted it below and attached it as a file:
    ------------------------------------------------------------------------------------------------------------------------------------
    Neal Fultz via American Statistical Association <mail@connectedcommunity.org>
    Sat, Apr 9 at 8:23 AM
    The following message has been sent to you in response to your egroup message.


    Message From: Neal Fultz

    Hi Chris,

    I've been doing some genealogy research to pass the pandemic as well.

    For accessing AWS services, I'd recommend you use the `botor` package and follow the examples for accessing S3.

    If you have any questions, happy to field them.

    - Neal


    ------------------------------
    Neal Fultz


    ------------------------------

    Attachment(s)

    Neil AWS download.pdf (83 KB)