SDNS Webinar Series

Do you have a suggestion for future webinar topics or speakers? Would you like to speak on a topic? Please complete our short form here.

Subscribe to our SDNS Webinar Series YouTube channel to receive notifications when videos are posted!

Dr. Annie Sauer Booth, Department of Statistics, NC State University
Deep Gaussian Process Surrogates for Computer Experiments
March 19, 2024

Link to recording on the SDNS YouTube Channel: https://youtu.be/DMEWDMhIXCI

Abstract:

This talk provides an overview of Bayesian deep Gaussian processes (DGPs) as surrogate models for computer experiments. Computer experiments are invaluable tools for replacing and/or supplementing direct experimentation, particularly in settings where physical experimentation is restricted by ethical, time, financial, or practicality constraints. Such simulations are necessarily complex and require statistical “surrogate” models, trained on a limited budget of simulator evaluations, which can provide predictions and uncertainty quantification at untried input configurations. Gaussian process (GP) surrogates are the canonical choice, but they are limited by stationarity constraints. DGPs upgrade ordinary GPs through functional composition, in which intermediate GP layers warp the original inputs, providing flexibility to model non-stationary dynamics. In large data settings, we integrate Vecchia approximation for faster computation. In small data settings, we utilize strategic active learning/sequential designs with a variety of objectives including variance reduction, Bayesian optimization, and reliability analysis. We showcase implementation in the “deepgp” package for R on CRAN.

Dr. J. Derek Tucker, Sandia National Laboratories
Elastic Bayesian Model Calibration
December 4, 2023

Link to recording on the SDNS YouTube Channel: https://youtu.be/UNyOfL_zLX8

Abstract: Functional data are ubiquitous in scientific modeling. For instance, quantities of interest are modeled as functions of time, space, energy, density, etc. Uncertainty quantification methods for computer models with functional response have resulted in tools for emulation, sensitivity analysis, and calibration that are widely used. However, many of these tools underperform when the model's parameters control not only the amplitude variation of the functional output, but also its alignment (or phase variation). We present a simple framework for Bayesian model calibration when the model responses are misaligned functional data. We demonstrate the techniques to emulate and calibrate a hydrodynamics model using data from a collection of flyer plate experiments and also from a physical model that correspond to tantalum experiments around Sandia's Z-machine.

Dr. Thomas Mathew, Department of Mathematics & Statistics, UMBC
Tolerance Intervals and Regions: An Introduction and Some Applications
July 21, 2023

Link to recording on the SDNS YouTube Channel: https://youtu.be/FiYdAF-6GHU?si=9U9FiWKX2IiMtBvi

Abstract: A tolerance interval is an interval that is expected to capture a specified proportion or more of a population with a given confidence level. The interval is constructed using a random sample, and the confidence level refers to the sampling variability. A tolerance region is similarly defined for a multivariate population. In the talk, tolerance intervals and regions will be formally defined, their computation will be briefly explained for some univariate and multivariate populations, and illustrated using several real applications. The applications include testing the ballistic resistance of personal body armor, the assessment of whether the distribution of the peak cladding temperature (PCT) of a nuclear power plant is below a regulatory requirement, and the computation of reference intervals and regions in laboratory medicine.
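
As a small illustration of the definition above, the sketch below uses the classical distribution-free construction based on order statistics. This is one standard approach and not necessarily the one presented in the talk. It assumes independent observations from a continuous population, in which case the interval from the sample minimum to the sample maximum covers at least a proportion p of the population with confidence 1 - n p^(n-1) + (n-1) p^n. The function names are hypothetical.

```python
def minmax_tolerance_confidence(n: int, p: float) -> float:
    """Confidence that [sample min, sample max] of n i.i.d. continuous
    observations covers at least a proportion p of the population."""
    if n < 2:
        raise ValueError("need at least two observations")
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return 1.0 - n * p ** (n - 1) + (n - 1) * p ** n


def smallest_sample_size(p: float, confidence: float) -> int:
    """Smallest n for which [min, max] is a (p, confidence) tolerance interval."""
    n = 2
    while minmax_tolerance_confidence(n, p) < confidence:
        n += 1
    return n


if __name__ == "__main__":
    # Sample size needed for a 95% content / 95% confidence interval.
    n = smallest_sample_size(p=0.95, confidence=0.95)
    print(n, minmax_tolerance_confidence(n, 0.95))
```

The confidence statement refers to repeated sampling: across many samples of that size, the min-to-max interval would capture at least the stated proportion of the population in at least the stated fraction of samples.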

Dr. Chris Gotwalt, Chief Data Scientist, JMP
Modeling Spectral Data using JMP Pro 17
January 25, 2023

Link to recording on the SDNS YouTube Channel: https://youtu.be/SvJ6m95Br9o

Abstract: Curves and spectra are fundamental to understanding many scientific and engineering applications. As a result, curves or spectral data are created by many types of test and manufacturing equipment. When these data are used as part of a designed experiment or a machine learning application, most software requires the practitioner to extract features from the data prior to modeling. This leads to models that are more difficult to interpret and are less accurate than models that treat spectral/curve data as first-class citizens.

Chris Gotwalt, JMP Chief Data Scientist, will present an overview of functional data analysis in JMP Pro. His talk will focus on new capabilities in JMP Pro 17, such as wavelet analysis, designed to help statisticians of all levels analyze spectral data from NMR, mass spectroscopy, chromatography, and many other types of analysis common in the chemical, pharmaceutical, and biotech industries. He will explain how and why JMP Pro handles these data. The session also includes time for Q&A.

Dr. Amanda Muyskens, Statistician, Computational Engineering Division at Lawrence Livermore National Laboratory
MuyGPs: Scalable Gaussian Process Model Estimation with Uncertainty Quantification
September 21, 2022

Link to recording on the SDNS YouTube Channel: https://youtu.be/P8KioHUgVI8
Python package MuyGPyS: https://github.com/LLNL/MuyGPyS

Abstract: The utilization of large and complex data by machine learning in support of decision-making is of increasing importance in many scientific and national security domains. However, the need for uncertainty estimates or similar confidence indicators inhibits the integration of many popular machine learning pipelines, such as those that rely upon deep learning. In contrast, Gaussian process (GP) models are popular for their principled uncertainty quantification but require quadratic memory to store the covariance matrix and cubic computation to perform inference or evaluate the likelihood function. We present MuyGPs, a novel, computationally efficient GP hyperparameter estimation method for large data that has recently been released for open-source use in the Python package MuyGPyS. MuyGPs builds upon prior methods that take advantage of nearest-neighbor structure for sparsification and uses leave-one-out cross-validation to optimize covariance (kernel) hyperparameters without realizing the expensive multivariate normal likelihood. We describe our model and approximate methods and compare our implementations against the state-of-the-art competitors in approximate GP regression in space-based applications. Finally, we discuss recent and future advances in MuyGPs, including HPC integration and non-stationary models.

This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. IM release number: LLNL-ABS-832607.

Dr. Christopher Franck, Department of Statistics, Virginia Tech
An Introduction to Model Uncertainty and Averaging for Categorical Data Analysis
June 21, 2022

Link to recording: https://www.youtube.com/watch?v=DsVthmz6tmY
Presentation materials can be found here

Abstract: Categorical data are ubiquitous in the 21st century, and their analysis is vital to advance research in many domains. In an era with ever-expanding availability of data, the choice of which statistical model should be used is as important as ever. While statistical techniques to choose among competing models have been commonplace for a while, it seems that accessible techniques to effectively combine inferences over competitive models are not as widely used in practice. The purpose of this short course is to describe techniques that enable researchers to simultaneously leverage a variety of candidate models to improve prediction and inference. We describe an easy-to-use technique (based on the Bayesian information criterion) to conduct approximate Bayesian model averaging, which weighs inferences proportionally to each candidate model’s posterior probability and can provide improved out-of-sample predictive performance over an individual model. We also discuss stacking, which combines model predictions according to each model’s out-of-sample predictive capability. Basic familiarity with logistic regression is a prerequisite for this course.
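
A minimal sketch of the BIC-based approximation mentioned in the abstract, assuming equal prior probabilities across candidate models. In that case each model's approximate posterior probability is proportional to exp(-BIC/2). The function names, BIC values, and predicted probabilities below are hypothetical and serve only to show the arithmetic; they are not taken from the course materials.

```python
import math


def bic_model_weights(bics):
    """Approximate posterior model probabilities from BIC values,
    assuming every candidate model has the same prior probability."""
    best = min(bics)
    # Subtract the smallest BIC before exponentiating for numerical stability.
    raw = [math.exp(-0.5 * (b - best)) for b in bics]
    total = sum(raw)
    return [r / total for r in raw]


def average_prediction(predictions, weights):
    """Weighted average of per-model predictions for a single case."""
    return sum(w * p for w, p in zip(weights, predictions))


if __name__ == "__main__":
    # Made-up BICs for three candidate logistic regression models.
    bics = [100.0, 102.0, 110.0]
    # Made-up predicted probabilities from each model for one new case.
    preds = [0.30, 0.40, 0.55]
    weights = bic_model_weights(bics)
    print(weights)
    print(average_prediction(preds, weights))
```

Because the weights depend only on BIC differences, a model whose BIC is 2 units larger than the best model receives about exp(-1), or roughly 37%, of the best model's weight.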

Dr. Cetin Savkli, The Johns Hopkins University Applied Physics Laboratory
A Random Walk in the World of Graph Analysis
January 18, 2022
Link to recording: https://youtu.be/zviQM84BY0U

Abstract: A graph is a flexible data structure that facilitates the fusion of disparate data sets and represents relationships. Applications of graphs have shown steady growth with the development of the internet, cyber, and social networks, presenting large graphs for which analysis remains a challenging problem. Graph analysis lives at the intersection of computer science, probability and statistics, and linear algebra, and presents a wealth of interesting problems of practical importance. In this talk we will provide a brief introduction to methods used to analyze graphs to measure centrality of nodes and links, discover community structure, and characterize global properties of graphs.
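
To make the idea of node centrality concrete, here is a minimal sketch of one random-walk-based measure, PageRank, computed by power iteration. PageRank is chosen here only as a familiar example; the abstract does not say which centrality measures the talk covers. The function name, the damping value of 0.85, and the small example graph are assumptions for illustration.

```python
def pagerank(graph, damping=0.85, tol=1e-10, max_iter=200):
    """PageRank scores for a directed graph given as {node: [successors]}."""
    nodes = sorted(set(graph) | {v for nbrs in graph.values() for v in nbrs})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        # Probability mass on nodes without outgoing links is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not graph.get(u))
        new = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            nbrs = graph.get(u, [])
            for v in nbrs:
                # A random walker at u follows each outgoing link equally often.
                new[v] += damping * rank[u] / len(nbrs)
        change = sum(abs(new[u] - rank[u]) for u in nodes)
        rank = new
        if change < tol:
            break
    return rank


if __name__ == "__main__":
    # Hypothetical four-node directed graph.
    example = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    for node, score in sorted(pagerank(example).items()):
        print(node, round(score, 4))
```

The scores form a probability distribution over nodes: a node scores highly when a random walker, who occasionally jumps to a uniformly chosen node, spends a large share of its time there.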

Dr. Jim Wisnowski, Principal Consultant and Cofounder at Adsurgo, LLC
& Dr. Jim Simpson, Principal of JK Analytics
Statistical Methods for Verification & Validation and Adaptive Sampling in Modeling and Simulation
August 25, 2021

Abstract: Leadership has placed a high premium on analytically defensible results for M&S Verification and Validation. This mini-tutorial will provide an overview of relevant standard methods to establish equivalency in mean, variance, and distribution shape such as Two One-Sided Tests (TOST), K-S tests, Fisher’s Exact, and Fisher’s Combined Probability. We will also discuss more advanced methods such as the equality between model parameters in statistical emulators versus live tests, and equivalence of output curves (functional data analysis). Additionally, we introduce a new method for near real-time adaptive sampling that places the next set of M&S runs at boundary regions of high gradient in the responses to more efficiently characterize complex surfaces such as those seen in autonomous systems.
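
As a rough illustration of the Two One-Sided Tests (TOST) idea for equivalence in means, the sketch below compares two samples against a user-chosen equivalence margin. It uses a large-sample normal approximation in place of the t distribution that is usually used in practice, so it should be read as a simplified sketch and not as the procedure taught in the tutorial. The function name and the data are hypothetical.

```python
import math
from statistics import NormalDist, mean, variance


def tost_z(sample_a, sample_b, margin):
    """Large-sample TOST for equivalence of two means.

    Returns the observed mean difference and the TOST p-value, which is
    the larger of the two one-sided p-values. A small p-value supports
    the claim that the true difference lies within (-margin, +margin).
    """
    if margin <= 0:
        raise ValueError("margin must be positive")
    diff = mean(sample_a) - mean(sample_b)
    se = math.sqrt(variance(sample_a) / len(sample_a)
                   + variance(sample_b) / len(sample_b))
    if se == 0:
        raise ValueError("samples have zero variance")
    z = NormalDist()
    p_lower = 1.0 - z.cdf((diff + margin) / se)  # H0: difference <= -margin
    p_upper = z.cdf((diff - margin) / se)        # H0: difference >= +margin
    return diff, max(p_lower, p_upper)


if __name__ == "__main__":
    # Made-up simulation outputs and live-test measurements.
    simulated = [9.8, 9.9, 10.0, 10.1, 10.2] * 6
    live = [9.9, 10.0, 10.0, 10.1, 10.3] * 6
    print(tost_z(simulated, live, margin=0.5))
```

Unlike an ordinary two-sample test, the burden of proof is reversed: the null hypothesis is that the means differ by at least the margin, and equivalence is concluded only when both one-sided tests reject.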

Dr. Robert Gramacy, Department of Statistics, Virginia Tech
A Practical Introduction to Gaussian Process Regression
May 13, 2021
Presentation materials can be found here: Surrogates.

Abstract: Gaussian process regression is ubiquitous in spatial statistics, machine learning, and the surrogate modeling of computer simulation experiments. Fortunately, its prowess as an accurate predictor, along with an appropriate quantification of uncertainty, does not derive from difficult-to-understand methodology and cumbersome implementation. We will cover the basics, and provide a practical toolset ready to be put to work in diverse applications. The presentation will involve accessible slides authored in R Markdown, with reproducible examples spanning bespoke implementation to add-on packages.
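
In the spirit of a bespoke implementation, the following is a minimal sketch of zero-mean Gaussian process prediction in one input dimension, written in plain Python. It is not taken from the presentation materials, which use R. It assumes a squared-exponential kernel with fixed, hand-picked hyperparameters and a small nugget for numerical stability, and all function names are hypothetical.

```python
import math


def sq_exp_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between two scalar inputs."""
    return variance * math.exp(-0.5 * ((x1 - x2) / lengthscale) ** 2)


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def solve_lower(low, b):
    """Solve L y = b by forward substitution."""
    y = []
    for i, row in enumerate(low):
        y.append((b[i] - sum(row[k] * y[k] for k in range(i))) / row[i])
    return y


def solve_lower_transpose(low, y):
    """Solve L^T x = y by back substitution."""
    n = len(low)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(low[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (y[i] - s) / low[i][i]
    return x


def gp_predict(x_train, y_train, x_new, lengthscale=1.0, variance=1.0, nugget=1e-8):
    """Predictive mean and variance of a zero-mean GP at a single new input."""
    k = [[sq_exp_kernel(a, b, lengthscale, variance) + (nugget if i == j else 0.0)
          for j, b in enumerate(x_train)] for i, a in enumerate(x_train)]
    low = cholesky(k)
    alpha = solve_lower_transpose(low, solve_lower(low, y_train))
    k_star = [sq_exp_kernel(x_new, a, lengthscale, variance) for a in x_train]
    pred_mean = sum(ks * al for ks, al in zip(k_star, alpha))
    v = solve_lower(low, k_star)
    pred_var = variance - sum(vi * vi for vi in v)
    return pred_mean, max(pred_var, 0.0)


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [math.sin(x) for x in xs]
    print(gp_predict(xs, ys, 1.5))
```

The predictive variance shrinks toward zero at the training inputs and returns to the prior variance far from the data, which is the uncertainty quantification the abstract refers to. In practice the hyperparameters would be estimated from data instead of fixed by hand.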