ASA Community

2023 Distinguished Achievement Award and Lectureship Winner

Bin Yu
University of California, Berkeley

The 2023 Committee of Presidents of Statistical Societies (COPSS) Distinguished Achievement Award and Lectureship Committee selected Bin Yu, University of California, Berkeley, to deliver the COPSS Lecture at the Joint Statistical Meetings in 2023. The citation for Dr. Yu's plaque reads:

"For fundamental contributions to information theory; statistical and machine learning methodology; interdisciplinary research in fields such as genomics, neuroscience, remote sensing, and document summarization; and for outstanding dedication to professional service, leadership, and mentoring of students and young scholars."

Dr. Yu's talk is titled "Veridical Data Sciences towards Trustworthy AI."

Abstract

"AI is like nuclear energy–both promising and dangerous." Bill Gates, 2019

Data science is at the heart of today's AI and has driven most of the recent advances in biomedicine and beyond. Human judgment calls are ubiquitous at every step of a data science life cycle (DSLC): problem formulation, data cleaning, exploratory data analysis (EDA), modeling, and reporting. Such judgment calls are often responsible for the "dangers" of AI because they create a universe of hidden uncertainties well beyond traditional sample-to-sample uncertainty.

To mitigate these dangers, veridical (truthful) data science is introduced based on three principles: Predictability, Computability, and Stability (PCS). The PCS framework unifies, streamlines, and expands on the ideas and best practices of statistics and machine learning. In every step of a DSLC, PCS emphasizes a reality check through predictability, considers computability up front, and takes into account sources of uncertainty, including those from data curation/cleaning and algorithm choice, to build trust in data results. The PCS framework will be showcased through collaborative research on finding genetic drivers of a heart disease, stress-testing a clinical decision rule, and identifying a microbiome-related metabolite signature for possible early cancer detection. PCS is supported by a Python software package, v-flow, and a documentation template (https://binyu.stat.berkeley.edu).

Biography of Dr. Yu

Yu’s current research focuses on the practice, algorithms, and theory of statistical machine learning, interpretable machine learning, and causal inference. Her group is engaged in interdisciplinary research with scientists in genomics, neuroscience, and precision medicine. She and her group have developed the PCS framework for veridical data science, aimed at responsible, reliable, and transparent data analysis and decision-making. PCS stands for predictability, computability, and stability. It unifies, streamlines, and expands on ideas and best practices of machine learning and statistics to uncover and address a hidden universe of uncertainties well beyond sample-to-sample uncertainty in a data science life cycle.

Earlier, she co-developed a highly cited spatially adaptive wavelet image denoising method and a low-complexity, low-delay, perceptually lossless audio coder that was incorporated into Bose wireless speakers. She also co-developed a fast and well-validated Arctic cloud detection algorithm. Her 2011 collaborative paper with the Gallant Lab at Berkeley on reconstructing movies from fMRI brain signals received extensive coverage from numerous media outlets, including The Economist, Forbes, Der Spiegel, Daily Mail, New Scientist, and MIT Technology Review. This work was named one of Time magazine's 50 best inventions of 2011. She and her collaborators mapped a cell’s destiny in Drosophila via stability-driven non-negative matrix factorization (NMF), and used the PCS framework to stress-test, or internally validate, clinical decision rules used in the emergency room (ER). Previously, she pioneered Vapnik-Chervonenkis (VC)-type theory needed for the asymptotic analysis of time series and spatio-temporal processes. She made fundamental contributions to information theory and statistics through her work on minimum description length (MDL) and entropy estimation. More recently, she and her collaborators developed iterative random forests (iRF), the X-learner for heterogeneous treatment effect estimation in causal inference, hierarchical shrinkage (HS) for decision trees, and fast interpretable greedy-tree sums (FIGS).

She is the Class of 1936 Second Chair in the College of Letters and Science and a Chancellor's Distinguished Professor in the Departments of Statistics and of Electrical Engineering & Computer Sciences and the Center for Computational Biology at the University of California, Berkeley. She obtained her BS degree in mathematics from Peking University and her MS and PhD degrees in statistics from UC Berkeley. She was an Assistant Professor at the University of Wisconsin-Madison, a Visiting Assistant Professor at Yale University, a Member of the Technical Staff at Lucent Bell Labs, and a Miller Research Professor at Berkeley. She was a visiting faculty member at MIT, ETH, the Poincaré Institute, Peking University, INRIA-Paris, the Fields Institute at the University of Toronto, the Newton Institute at Cambridge University, and the Flatiron Institute in New York City. She has also served as Chair of the Department of Statistics at UC Berkeley and, as a member of the faculty advisory committee, played a crucial role in shaping the intellectual and organizational vision for the Division of Computing, Data Science, and Society (CDSS) at UC Berkeley.

Yu is a Member of the U.S. National Academy of Sciences and the American Academy of Arts and Sciences. She was President of the Institute of Mathematical Statistics (IMS) in 2013-2014, a Guggenheim Fellow, the Tukey Memorial Lecturer of the Bernoulli Society, and the Rietz Lecturer of the IMS. In 2018, Yu received the Elizabeth L. Scott Award from COPSS for principled leadership in the international scientific community; for commitment and actions towards diversity, equity, and inclusion; for consistently mentoring and encouraging women students and new researchers in statistics and data science; and for scientific contributions to statistical and machine learning methodology at the highest scholarly level. She holds an honorary doctorate from the University of Lausanne and served on the inaugural scientific advisory board of the Alan Turing Institute, the UK's national institute for data science and AI. She serves on the editorial board of PNAS and as a senior advisor at the Simons Institute for the Theory of Computing at Berkeley. She will give the IMS Wald Memorial Lectures at JSM in 2023.