I would appreciate it if a section member, ideally (encouraged but not mandatory) interested in genealogy and in the recently released 1950 US CENSUS, could assist or advise me on how to download files, possibly many hundreds of gigabytes in total, from Amazon Web Services (AWS). The files on AWS are the entire digitized version of the 1950 census. In brief, I'm looking for high-resolution image files such as ".tiff". I've never accessed anything on AWS before. I'm assuming there is an R library for that. I have no idea how large (in gigabytes) the files may be. The AWS files are here: https://registry.opendata.aws/nara-1950-census/
For convenience, the full text PDF of my note is here:
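As one possible starting point, here is a minimal sketch in Python (not R), using only the standard library. It assumes the bucket is named `nara-1950-census` (guessed from the registry URL above and to be checked against the registry page) and that it allows anonymous listing over HTTPS, as public open-data buckets often do. The helper names `parse_listing`, `list_objects`, `total_gib`, and `object_url` are made up for illustration. The idea is to list the object keys and sizes first, so the total size is known before anything is downloaded.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# ASSUMPTION: bucket name guessed from the registry URL; verify it on the registry page.
BUCKET = "nara-1950-census"
ENDPOINT = f"https://{BUCKET}.s3.amazonaws.com/"
# XML namespace used by S3 bucket-listing replies.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def parse_listing(xml_text):
    """Return ([(key, size_in_bytes), ...], next_token) from one listing reply."""
    root = ET.fromstring(xml_text)
    objects = [
        (item.findtext("s3:Key", namespaces=S3_NS),
         int(item.findtext("s3:Size", namespaces=S3_NS)))
        for item in root.findall("s3:Contents", S3_NS)
    ]
    # The token is present only when more pages of results remain.
    token = root.findtext("s3:NextContinuationToken", namespaces=S3_NS)
    return objects, token


def list_objects(prefix="", max_pages=1):
    """Fetch up to max_pages pages of the listing (needs network access)."""
    found, token = [], None
    for _ in range(max_pages):
        params = {"list-type": "2", "prefix": prefix}
        if token:
            params["continuation-token"] = token
        url = ENDPOINT + "?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url) as reply:
            objects, token = parse_listing(reply.read())
        found.extend(objects)
        if not token:
            break
    return found


def total_gib(objects, suffixes=(".tif", ".tiff")):
    """Total size in GiB of the keys ending in one of the given suffixes."""
    sizes = [size for key, size in objects if key.lower().endswith(suffixes)]
    return sum(sizes) / 1024**3


def object_url(key):
    """HTTPS URL for one object; pass it to urllib.request.urlretrieve to download."""
    return ENDPOINT + urllib.parse.quote(key)
```

Calling `list_objects(max_pages=1)` and passing the result to `total_gib` would give the size of the TIFF files on the first page of results only; raising `max_pages`, or narrowing `prefix` to one state or county folder once the folder layout is known, extends this. Listing before downloading keeps the first experiment small.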
www.barkerstats.com/PDFs/Statistics/CENSUS/1950/Note-Full-Text.pdf
First, for anyone as yet unfamiliar with the digitized 1950 census and its contents, my brief executive summary is "WOW". The scale of this release is beyond enormous, and I have run out of other superlatives. One administrative housekeeping note: the CENSUS transferred all of the digital CENSUS materials described below to NARA (the National Archives and Records Administration), so the digital version of the census is located at NARA. The NYTIMES article below gives a sense of the extraordinary magnitude of the project.
- Please let me give my background/context:
First, the well-written article about the 1950 CENSUS in the NYTIMES:
https://www.nytimes.com/2022/03/31/us/census-data-1950.html
The fully digitized/scanned 1950 CENSUS was released on April 1, 2022 (not an April Fools' joke!), after the US government-mandated 72-year waiting period. For those who have immediate elders or other extended family who were interviewed by the "field enumerators" (humans) in 1950, you should (not an absolute guarantee) be able to find their names and their answers to the census questions in 1950.
My interest in this is family heritage and family genealogy, and my efforts in this project are entirely voluntary. I have immediate and numerous other extended family who (should have) filled out the census form, or for whom the enumerator filled it out, during the door-to-door visits in 1950. The CENSUS no longer relies mainly on door-to-door visits; the most recent census was largely self-response (mail-in or online).
The metaphorical "icing on the cake" is that the scanned images include the scans of the actual handwritten completed CENSUS forms. Below is an example of one handwritten page, purportedly containing my father's brother (Tom/Thomas Barker). I'll skip over a minor complication: my father's brother was Tom/Thomas, and my father's father's (my grandfather's) brother (my father's cousin) was also Tom. That's sorted out in a different way. The Tom Barker page:
http://www.barkerstats.com/PDFs/Statistics/CENSUS/images/barkerThomas.png
The machine learning/AI "read" of that page is here:
www.barkerstats.com/PDFs/Statistics/CENSUS/images/46-155-Tom-Barker-ED.jpg
- Why "purportedly"? The CENSUS, and now the various genealogy services (ancestry.com and others), used or are now using their own "AI and machine learning" software (as excerpted from other news sources) to 'read' the handwritten forms completed in 1950. (AI is artificial intelligence.) Completely separately (as per the NYTIMES), about 400,000 (four hundred thousand) volunteers associated with the Church of the LDS (Latter-day Saints) are checking the AI and machine learning results. The ".png" above is illegible to me. That ".png" and other formats such as ".jpg" are available to users of the main search tool.
Excerpting the NYTIMES:
"We have about 400,000 volunteers that index records all the time," said David E. Rencher, the chief genealogical officer at Family Search. "For a project like this, where we rally the community, we'll get a bump, probably several hundred thousand, just to do this."
- In browsing the extensive CENSUS documents, I think there are higher-resolution ".tiff" files of the census forms in the AWS files. A separate side point: the only technical statistical example of such "reads" I'm aware of was in the several-day course on Statistical Learning by Hastie and Tibshirani from Stanford, about testing classifiers on single alphabet letter images from the US Post Office. I am skeptical that the claimed 400,000 volunteers are looking at and correcting machine learning reads created from images like the .png example above. This project is massive, with machine learning/AI reads of several million handwritten forms.
Please let me note that the CENSUS materials are extensive and include the maps used to mark the Enumeration Districts.
For example, here is an area within Montgomery County, Pennsylvania:
http://www.barkerstats.com/PDFs/Statistics/CENSUS/1950/image-02-m-a3378-00055-00837.jpg
I have been collaborating with friends living in the area, and we have already discovered that some very familiar roads in the area were renamed at some time after 1950.
As to my genealogical motivation: I grew up near Philadelphia, within an hour's drive of Valley Forge and Washington Crossing, and a further drive from Gettysburg, the Liberty Bell, etc. I have friends currently living in Philadelphia and its suburbs who are involved with or work at the local historical societies. While I now live near Sacramento and have no relatives in the area, this was "gold rush" country, and the 49ers (gold miners) had their original camps here. There are several history and genealogy groups in Pennsylvania and similarly in Sacramento. Certainly, every state in the US has many people interested in local history and genealogy. For this volunteer project I am volunteering, or may eventually volunteer, to help the township, and possibly the county and state (that is, the Lower Merion Historical Society, the Montgomery County Historical Society, and the Pennsylvania Historical Society).
To the matter at hand.
For the purposes of discussion, Pennsylvania (PA) can be described as a sort of hierarchical government structure: at the lowest level "Townships", at the next level "Counties", then the state of PA. Caveat: I am ignoring towns (e.g. Haverford/Villanova), villages (Blue Bell), and cities (Philadelphia) for the moment. Technically, Blue Bell is a "census designated place" (CDP). To locate my elders using the CENSUS-provided search tools, I can look directly for an elder, say my father or his siblings Thomas Barker, Aubrey Barker, or Jean Barker, who were living in my township (and a nearby city) in 1950. As a side point, I also know middle names, birth years, etc.; my father's sister was Jean Ingelow Barker. Alternatively, I can look by village/township/county/state, in my case "Gladwyne/Lower Merion/Montgomery/Pennsylvania". I also have a relative who prepared a Barker family genealogy over a several-year period.
Searches
There are at least two flavors of searches of the CENSUS using search tools from the CENSUS Bureau. One is directly by name and the second is by address. One must also be aware that the census is organized in part by Enumeration District (ED), so a sort of "third" search flavor is directly by enumeration district. The ED maps are available in the census, for example the map linked here.
Barely legible (to me) is the handwritten enumeration district number in the lower right, 46-112:
www.barkerstats.com/PDFs/Statistics/CENSUS/1950/image-02-m-a3378-00055-00837.jpg
I find it useful to remember that in 1950 and that era, maps were hand-drawn by cartographers.
Excerpting from the NYTIMES article above:
Those millions of census forms, painstakingly filled out by hand in ink, were posted online by the National Archives and Records Administration, which by law has kept them private until now. The records, searchable by name and address, offer an intimate look at a nation on the cusp of the modern era - for the merely curious, a glimpse of the life parents or grandparents led, but for historians and genealogists, a once-in-a-decade bonanza of secrets unveiled.
"This is the Super Bowl and the Olympics combined, and it's only every 10 years - it's awesome stuff," Matt Menashes, the executive director of the National Genealogical Society, said in an interview. "What's so great about these points of data is that it helps you paint a picture - not just relationships, but what society was like."
More housekeeping and recordkeeping on my part.
I am keeping copious notes on my searches for relatives as I learn to navigate the census. The attached document contains those notes, which will eventually be edited into a more readable format to help friends at the various historical societies. I cannot sufficiently stress the "draft" nature of the attached.
www.barkerstats.com/PDFs/Statistics/CENSUS/1950/1950-Census-Barker.pdf
Thank you in advance
------------------------------
Chris Barker, Ph.D.
2022 Statistical Consulting Section
Chair-elect
Consultant and
Adjunct Associate Professor of Biostatistics
www.barkerstats.com
"In composition you have all the time you want to decide what to say in 15 seconds, in improvisation you have 15 seconds."
-Steve Lacy
------------------------------