Dear Colleagues,
The Section on Statistical Learning and Data Science is happy to present its next webinar on LLM training and coding. It will focus on practical aspects of how to build and refine LLMs. Drs. Jiancong Xiao and Xiang Li will discuss key steps along these lines. Hope to see you there!
Title: LLM Training and Coding for Statistical Learning and Data Science
Speakers: Drs. Jiancong Xiao and Xiang Li, Postdoctoral Researchers at the University of Pennsylvania
Date and Time: November 5, 2025, 2:00 to 3:30 pm Eastern Time
Abstract: Large Language Models (LLMs) have become important tools across research and industry, yet working with them effectively requires both conceptual understanding and practical experience. This talk provides an accessible introduction to the key steps in developing and using LLMs. The first part will focus on alignment and fine-tuning, introducing techniques such as Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO), and discussing practical strategies for fine-tuning personalized models. The second part will turn to token generation and model interaction, demonstrating how to set up LLM interfaces, control sampling procedures, and inject signals to influence token predictions. This part will also touch on applications such as watermarking and model evaluation, and on understanding how randomness affects output diversity. Together, the two parts aim to help participants, especially those new to LLMs, gain foundational insight into how these models are built, refined, and studied in modern statistical learning and data science research.
Presenters:
Jiancong Xiao is a postdoctoral researcher at the University of Pennsylvania, working with Professors Qi Long and Weijie Su. He received his Ph.D. from the Chinese University of Hong Kong, Shenzhen, an M.S. from the Chinese University of Hong Kong, and a B.S. from Sun Yat-sen University. His research interests lie in statistical and deep learning theory, with a focus on developing responsible and trustworthy machine learning models. His recent work explores the statistical foundations of large language models. His research has been featured in top journals such as JASA and at leading machine learning conferences, including NeurIPS, COLT, ICML, and ICLR.
Xiang Li is a postdoctoral researcher at the University of Pennsylvania, collaborating with Professors Qi Long and Weijie Su. He received his Ph.D. in 2023 and B.S. in 2018 from the School of Mathematical Sciences at Peking University. His research lies at the intersection of statistics, stochastic optimization, and machine learning, with a recent focus on large language models. During his Ph.D., he made significant contributions to federated learning, stochastic approximation, online decision-making, and online statistical inference. His work has been featured at leading machine learning conferences, including ICML, ICLR, and NeurIPS, as well as in top journals such as JMLR and AOS.
------------------------------
Zhihua Su, PhD
Quantitative Researcher
nVerses Capital, LLC
12783 Forest Hill Blvd,
Wellington, FL, 33411
------------------------------