Dear Colleagues,
Data Umbrella is hosting the following webinar, which is free and open to the public.
- Title: The Internet Archive for Data Scientists
- Date: Tues, Jan 20, 2026
- Time: 12pm PT / 3pm ET
- Registration options:
- Cost: Free
With a mission of "Universal Access to All Knowledge", the Internet Archive has been building a digital library of Internet sites and other cultural artifacts in digital form for the last 29 years.
In this talk, core infrastructure engineer Pablo Duboue will walk us through the project and its external APIs, which he helps maintain. As a former data scientist himself, Pablo will discuss how the different APIs can be useful for data scientists.
Speaker: Pablo Duboue
Pablo Duboue is a Core Infrastructure Engineer at the Internet Archive, contributing to site reliability and maintaining the legacy codebase that powers the platform (approximately 330,000 lines of PHP). Before joining the Internet Archive, Pablo had a 25-year career in applied language technologies and natural language generation, including earning a Ph.D. in Computer Science from Columbia University and joining the IBM T.J. Watson Research Center as a Research Staff Member.
LinkedIn: https://www.linkedin.com/in/pabloduboue/
GitHub: https://github.com/DrDub
Mastodon: https://mastodon.archive.org/@drdub
This event will be recorded and posted to our YouTube channel, usually within 24 hours of the event. Subscribe to the channel to receive notifications: https://www.youtube.com/c/DataUmbrella/
------------------------------
Reshama Shaikh
Statistician
------------------------------