2020 Conference: Recent Advances in Natural Language Processing
Statistician of the Year Award: Trevor Hastie
The Chicago Chapter of the American Statistical Association is extremely excited to announce our 2020 Conference and Statistician of the Year (SOY) award. Historically, both events have been held in person, but given the current pandemic, we are shifting this year's event to a virtual format.
November 20th, 2020
8:30 AM - 4:30 PM Central Time (exact times will be updated)
We will be holding the conference remotely via Zoom due to the global pandemic.
The agenda will continue to evolve! Check here for updates.
New talks added and new start time!
8:30 - 8:45 AM: Opening comments and logistics for the day
8:45 - 9:45 AM: NLP 2020: What Works and What's Next
The state of the art in natural language processing is advancing rapidly, powered by deep learning and an accelerating drive to build machines that understand, and are understood by, humans. This talk will review long-established rules-based and statistical methods and introduce distributional models, vector-space embeddings, representations, and transformers. We will discuss general natural language understanding as well as special tasks such as question answering, conversational systems, speech analysis, natural language generation, and emotion AI. Naturally, you'll want to know how to put these methods to work, so we'll survey leading open-source and commercial options, and we'll talk about data, bias, and ethics in NLP. In sum, the talk provides a comprehensive look at what works in NLP today and at what to expect in the year to come.
Seth Grimes consults on product and market strategy for natural language processing (NLP), text analytics, emotion AI, and digital transformation via Alta Plana Corporation (http://altaplana.com). He runs the NY-NLP and DC-NLP meetups and organizes the Emotion AI and CX Emotion conferences.
Follow him on Twitter at @SethGrimes.
9:45 - 10:45 AM: Computational Stylistics and Beyond
How (if at all) can we use computational natural language processing to learn something about a person’s identity from how they use language? Answers to this question have applications in criminal and civil investigations, counterterrorism, cybersecurity, social science, marketing, and more. In this talk I will describe work over the last decade or so that applies computational text classification to collections of texts (corpora) to address this question. The two primary methodological issues are choosing the language features that most effectively represent texts for analysis and setting up a properly controlled experiment, which has many pitfalls. I will discuss several experimental case studies, including identifying author demographics and personality.
Shlomo Engelson Argamon is Professor and Chair of the Computer Science Department and the Director of the Master of Data Science Program at Illinois Institute of Technology. He researches computational methods for style-based analysis of natural language using machine learning and lexical semantic representations, exploring applications in intelligence analysis, forensic linguistics, biomedical informatics, and humanities scholarship. His current work includes cognitive metaphor analysis, extracting structured representations of opinions from text, authorship profiling of anonymous texts, and evidence-based analysis of the medical literature, all from analyzing large amounts of textual data. He is particularly interested in elucidating the relationships that can be divined from corpora among language use, individual reasoning, and social context.
Argamon received his B.Sc. in Applied Mathematics from Carnegie-Mellon University (1988) and his Ph.D. in Computer Science from Yale University (1994), where he was a Fannie and John Hertz Foundation Doctoral Research Fellow (1991-94). He was also a Fulbright Postdoctoral Fellow at Bar-Ilan University (1994-96). Argamon was the 2014 Distinguished Visitor in Forensic Linguistics at the Centre for Forensic Linguistics of Aston University, Birmingham, UK, and is a Fellow of the British Computer Society. He is the editor of The Structure of Style (Springer 2009) and Computational Methods in Counterterrorism (Springer 2010).
10:45 - 11:00 AM: First break
11:00 - 12:00 PM: The Doctor (or Chatbot?) Is In: Towards Automated Monitoring of Cognitive Health Status and Medical Self-Disclosure
Natural language processing is a powerful tool that opens a wide range of opportunities in many domains, including the healthcare sector. In this talk I’ll introduce two exciting recent healthcare tasks: predicting cognitive health status and detecting medical self-disclosure. My colleagues and I explore the former at both a coarse-grained level, classifying individuals into dementia and control groups, and a fine-grained level, predicting cognitive health scores along a continuum. To launch our investigation of medical self-disclosure, we develop a large dataset from publicly available posts in online health communities, and train a predictive model that establishes a strong performance benchmark for the task. Finally, I’ll conclude by introducing some intriguing directions for future work in both tasks.
Natalie Parde is an Assistant Professor in the Department of Computer Science at the University of Illinois at Chicago, where she also co-directs UIC’s Natural Language Processing Laboratory. Her research interests are in natural language processing, with emphases in healthcare applications, interactive systems, multimodality, and creative language. She serves on the program committees of the Conference on Empirical Methods in Natural Language Processing (EMNLP), the Association for Computational Linguistics (ACL), and the North American Chapter of the ACL (NAACL), among other conferences and workshops. In her spare time, Dr. Parde enjoys engaging in mentorship and outreach for underrepresented CS students.
12:10 - 1:00 PM: Networking Lunch
1:00 - 2:00 PM: Making The Band: Leveraging Generative Image and Language Networks toward Creative Ends
With the reality show "Making the Band" returning to the airwaves this year, the question arises: what does it take to make a band? In this talk, data scientists Ryan Cranfill and Chris Kucharczyk describe how they leveraged NVIDIA's StyleGAN and OpenAI's GPT networks to bring a list of fake band names to life, and how they designed an interactive installation that allowed users to create their own fake astronaut profiles.
Ryan is a Senior Data Scientist at IDEO Chicago, where he weaves data and software expertise together to design impactful solutions for clients.
At IDEO, Ryan's work has included using agent-based simulation on real-world roadway data to push the design of next-generation delivery vehicles and services, creating a novel method of running numerical optimizations at scale using serverless computing, and crafting artware that uses touch and musical input to generate album art via a neural network.
Chris is a Senior Data Scientist at IDEO Chicago, where he is passionate about exploring how people interact with data and designing tools that facilitate those interactions.
Prior to joining IDEO, Chris received his Ph.D. in Materials Science from the California Institute of Technology, where he developed experimental and analytical techniques that enabled high-throughput exploration of solid oxide fuel cell materials. His thesis work required visualizing multi-dimensional datasets and inspired his interest in data science. He got his start in data science as an intern at Datascope, a data science consulting firm acquired by IDEO in 2017.
2:00 - 3:00 PM: Scaling Data-Driven Decision Making with Automated Data Storytelling
Today, only 25% to 40% of a company's employees are confident that they are making decisions based on data. Questions remain about how best to equip the other 60% to 75%. Luckily, recent advances in AI and data storytelling technology are enabling computers to communicate more like us, using language and storytelling, with implications that will lead all of us to rethink the role of communication and transparency in our day-to-day work. This talk will highlight the role and value of data storytelling and include a technical deep dive into an automated data storytelling product we've built at Narrative Science. I'll also cover the pros and cons of our structured approach to language generation versus deep learning models such as GPT-3.
As Chief Scientist, Nate is responsible for defining Narrative Science's vision for AI and automated data storytelling, ensuring that our products are aligned with that vision, and articulating that vision outside the company. Over his 10+ years at Narrative Science, Nate and his NS co-inventors have been granted 15 patents, with another 20 filed. Prior to NS, Nate earned his Ph.D. in Artificial Intelligence at Northwestern University with his thesis, "Machine-generated Content."
3:00 - 4:00 PM: Presentation of the Statistician of the Year Award to Trevor Hastie, followed by his acceptance speech
Trevor Hastie was born in South Africa in 1953. He received his university education from Rhodes University, South Africa (BS), the University of Cape Town (MS), and Stanford University (Ph.D. in Statistics, 1984).
He first worked for the South African Medical Research Council, starting in 1977, during which time he earned his MS from UCT. In 1979 he spent a year interning at the London School of Hygiene and Tropical Medicine, the Johnson Space Center in Houston, Texas, and the Biomath department at Oxford University. He joined the Ph.D. program at Stanford University in 1980. After graduating from Stanford in 1984, he returned to South Africa for a year with his earlier employer, the South African Medical Research Council. He returned to the USA in March 1986 and joined the statistics and data analysis research group at what was then AT&T Bell Laboratories in Murray Hill, New Jersey. After eight years at Bell Labs, he returned to Stanford University in 1994 as Professor in Statistics and Biostatistics. In 2013 he was named the John A. Overdeck Professor of Mathematical Sciences, and in 2018 he was elected to the National Academy of Sciences.
His main research contributions have been in applied statistics; he has published over 200 articles and written five books in this area: "Generalized Additive Models" (with R. Tibshirani, Chapman and Hall, 1991), "Elements of Statistical Learning" (with R. Tibshirani and J. Friedman, Springer, 2001; second edition 2009), "An Introduction to Statistical Learning, with Applications in R" (with G. James, D. Witten and R. Tibshirani, Springer, 2013), "Statistical Learning with Sparsity" (with R. Tibshirani and M. Wainwright, Chapman and Hall, 2015), and "Computer Age Statistical Inference" (with B. Efron, Cambridge, 2016). He has also made contributions in statistical computing, co-editing (with J. Chambers) a large software library on modeling tools in the S language ("Statistical Models in S", Wadsworth, 1992), which forms the foundation for much of the statistical modeling in R. His current research focuses on applied statistical modeling and prediction problems in biology and genomics, medicine, and industry.
4:00 - 4:30 PM: Virtual Happy Hour
The Chicago Chapter of the American Statistical Association is a nonprofit organization. All proceeds from this conference help fund future events and statistical education in the Chicago area.
Special early bird pricing before November 1, 2020!
Hardship Exemption: $0. We do not want cost to be a barrier for anyone to attend this conference. If cost is a barrier to you for any reason, please feel free to select this option; no questions asked!
Active CCASA Members: $50 ($35 early bird)
Non-Member: $85 ($60 early bird)
Students: $25 ($10 early bird)
Corporate Sponsorship: $500 ($400 early bird). Sponsorship benefits include having your organization's name attached to the event, a Zoom breakout room where guests can "drop in," and two free registrations to the event.