Dear Colleagues,
The ASA Statistical Learning and Data Science Section is pleased to announce its December webinar featuring Professor Weijie Su from the University of Pennsylvania. Prof. Su will discuss ensuring fairness and combating misinformation in large language models (LLMs). We hope to see you there!
Title: How Statistics Can Advance Large Language Models: Fairness Alignment and Watermarking
Speaker: Prof. Weijie Su, Wharton Statistics and Data Science Department, University of Pennsylvania
Date and Time: December 3, 2024, 1:00 to 2:30 pm Eastern Time
Registration Link: ASA SLDS Webinar Registration Link [eventbrite.com]
Abstract: Large language models (LLMs) have rapidly emerged as a transformative innovation in machine learning. However, their increasing influence on human decision-making processes raises critical societal questions. In this talk, we will demonstrate how statistics can help address two key challenges: ensuring fairness for minority groups through alignment and combating misinformation through watermarking. First, we tackle the challenge of creating fair LLMs that equitably represent and serve diverse populations. We derive a regularization term that is both necessary and sufficient for aligning LLMs with human preferences, ensuring equitable outcomes across different demographics. Second, we introduce a general statistical framework to analyze the efficiency of watermarking schemes for LLMs. We develop optimal detection rules for an important watermarking scheme recently developed at OpenAI and empirically demonstrate its superiority over the existing detection method. Throughout the talk, we will showcase how statistical insights can not only address pressing challenges posed by LLMs but also unlock substantial opportunities for the field of statistics to drive responsible generative AI development. This talk is based on arXiv:2405.16455 and arXiv:2404.01245.
Presenter: Weijie Su is an Associate Professor in the Wharton Statistics and Data Science Department and, by courtesy, in the Departments of Computer and Information Science and Mathematics at the University of Pennsylvania. He is a co-director of the Penn Research in Machine Learning (PRiML) Center. Prior to joining Penn, he received his Ph.D. in Statistics from Stanford University in 2016 and his bachelor's degree in Mathematics from Peking University in 2011. His research interests span the statistical foundations of generative AI, privacy-preserving machine learning, high-dimensional statistics, and optimization. He serves as an associate editor of the Journal of Machine Learning Research, Journal of the American Statistical Association, Foundations and Trends in Statistics, and Operations Research, and he is currently guest editing a special issue on Statistics for Large Language Models and Large Language Models for Statistics in Stat. His work has been recognized with several awards, including the Stanford Anderson Dissertation Award, NSF CAREER Award, Sloan Research Fellowship, IMS Peter Hall Prize, SIAM Early Career Prize in Data Science, ASA Noether Early Career Award, and the ICBS Frontiers of Science Award in Mathematics.
------------------------------
Zhihua Su, PhD
Associate Professor
Department of Statistics
University of Florida
zhihuasu@stat.ufl.edu
------------------------------