Upstat 2024 Tutorial Descriptions

4:00-5:00pm                Tutorials

T1A: Introduction to Causal Inference

Teresa Gibson, PhD
Professor of Practice
School of Mathematics and Statistics
Rochester Institute of Technology

As the availability of real-world data expands, methods to analyze and decipher information gathered beyond controlled experiments continue to evolve. Causal inference is a rapidly developing area that seeks to untangle cause and effect in non-experimental settings. These methodologies are used in a wide variety of industries, including but not limited to marketing, healthcare, banking/finance, and manufacturing. In this tutorial we dive into the world of causal inference, covering the potential outcomes framework, the usefulness of directed acyclic graphs, and two popular approaches to causal effect estimation: propensity scores and difference-in-differences. Using R, we'll conduct a hands-on exploration, applying these tools to real-world datasets. Get ready to learn more about rapidly developing methods applied to real-world data to support data-driven decision-making.
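At its simplest, the difference-in-differences estimator mentioned above is arithmetic on group means. A minimal sketch in Python (the tutorial itself works in R, and the numbers here are made up for illustration):

```python
# Minimal difference-in-differences sketch. The group means below are
# hypothetical, not from any real dataset.
def did_estimate(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Change in the treated group minus change in the control group."""
    return (treat_post - treat_pre) - (ctrl_post - ctrl_pre)

# Hypothetical outcome means before/after an intervention:
effect = did_estimate(treat_pre=10.0, treat_post=14.0,
                      ctrl_pre=9.0, ctrl_post=11.0)
print(effect)  # 2.0
```

The same comparison underlies the regression formulation, where the estimate appears as the coefficient on the treatment-by-period interaction term.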

 

T1B: Websites/webservers and web graphics for beginners

Gregory Babbitt, PhD
Associate Professor
Gosnell School of Life Sciences
Rochester Institute of Technology

This tutorial will include a soft introduction to HTML/CSS/JS with examples from our BIOL 230 course at RIT. I'll lead participants through the basics of website structure and language syntax using modern web templates as a starting point. We'll discuss how client-side and server-side operations differ and demonstrate how to set up a local webserver (an Apache install) for emulating and testing server-side code on your own laptop. We'll also survey modern client-side capabilities as they apply to web-based graphics, data presentation, and compiled code.
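As a lighter-weight taste of the local-webserver idea (not part of the course material), Python's built-in `http.server` module can serve static pages from a directory; unlike the Apache setup covered in the tutorial, it does not run server-side code:

```python
# Serve the current directory over HTTP for local testing.
# Port 0 lets the OS pick a free port; use a fixed port like 8000 in practice.
from http.server import HTTPServer, SimpleHTTPRequestHandler

server = HTTPServer(("localhost", 0), SimpleHTTPRequestHandler)
host, port = server.server_address
print(f"Would serve static files at http://{host}:{port}/")
# server.serve_forever()  # uncomment to actually serve; Ctrl+C to stop
server.server_close()
```

This only serves static files, so it is a stand-in for quick client-side testing, not a replacement for Apache when emulating server-side behavior.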

 

T1C: A gentle introduction to diffusion models in machine learning (Part 1)

Zi-Jia Gong
PhD Student in Mathematical Modelling
School of Mathematics and Statistics
Rochester Institute of Technology

Diffusion models have demonstrated state-of-the-art results in image generation. A diffusion model uses a forward diffusion stochastic process to transform the data distribution into a known prior distribution by gradually adding noise. Additionally, it learns a parametrized reverse process from the data collected during the forward diffusion process. The reverse process can generate a new data point starting from random noise drawn from the prior distribution.

In the first part of this tutorial, we will delve into the mathematical details of diffusion models. The tutorial will cover both the discrete-time (Markov chain) and continuous-time (stochastic differential equation) formulations, as well as the derivation of the reverse processes and training objectives. We will also review some interesting applications of diffusion models, such as conditional generation and image editing, and address their primary limitation, slow sampling, presenting techniques for accelerating this process such as model distillation and improved numerical schemes.
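The discrete-time forward process admits a well-known closed form, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1−ᾱ_t)·ε, which can be sketched in plain Python; this is a toy 1-D version with a standard linear noise schedule, purely for illustration:

```python
import math
import random

# Linear beta schedule from 1e-4 to 0.02 over T steps (a common DDPM choice),
# and the cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s).
T = 1000
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]
alpha_bars = []
prod = 1.0
for b in betas:
    prod *= 1.0 - b
    alpha_bars.append(prod)

def q_sample(x0, t, rng=random):
    """Sample x_t given x_0 in closed form (no need to iterate t steps)."""
    eps = rng.gauss(0.0, 1.0)
    return math.sqrt(alpha_bars[t]) * x0 + math.sqrt(1.0 - alpha_bars[t]) * eps

x0 = 1.0
print(q_sample(x0, 10))   # early step: still close to x0, little noise added
print(q_sample(x0, 999))  # late step: nearly standard normal, signal almost gone
```

Because ᾱ_t shrinks toward zero, the data distribution is gradually transformed into the standard normal prior, exactly the behavior the reverse process must learn to invert.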

This tutorial will provide a fundamental understanding of diffusion models to participants who are interested in this exciting new area.

 

 

5:00-6:00pm                Tutorials

T2A: Community Detection in Complex Networks

Nishant Malik, PhD
Assistant Professor
School of Mathematics and Statistics
Rochester Institute of Technology

Many natural and social systems are organized as networks. A few well-known examples of networks include the animal brain, electrical power grids, the internet, online social networks such as Facebook and Twitter, the relationships between genes and diseases, collaboration and citation among scientists, trade among countries, and interactions between financial markets. Most of these networks exhibit complex structural properties, hence the name complex networks. Mathematical analysis of complex networks has led to many successes, such as improving our understanding of the human brain and developing innovative intervention and vaccination strategies to prevent the spread of diseases.

Numerous biological, social, and technological networks have modular structure: they consist of groups of nodes called communities, within which connectivity is dense. During this tutorial we will learn various algorithms for detecting community structure in networks. We will use Python's NetworkX package and apply these algorithms to several real-world data sets.
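As a taste of the hands-on portion, here is a minimal NetworkX sketch running one such algorithm (greedy modularity maximization) on a classic benchmark graph; the tutorial's actual data sets and algorithm choices may differ:

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Zachary's karate club: a small social network with well-known
# community structure, bundled with NetworkX.
G = nx.karate_club_graph()

# Greedy modularity maximization: merge communities to increase modularity.
communities = greedy_modularity_communities(G)
print(f"Found {len(communities)} communities")
for i, c in enumerate(communities):
    print(f"Community {i}: {sorted(c)}")
```

Every node lands in exactly one community, and the partition found here closely tracks the club's real-world split.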

 

T2B: Introduction to basic bioinformatics concepts and databases for beginners

Gregory Babbitt, PhD
Associate Professor
Gosnell School of Life Sciences
Rochester Institute of Technology

This tutorial will include a soft introduction to the field of bioinformatics with examples from our BIOL 130 course at RIT. We will learn about the types of bioinformatics data that are collected and where they are freely available in public databases. We will learn how the biological processes of genetics/heredity, protein function and interaction, and molecular evolution affect variation in these types of data. We will also demonstrate some popular resources and tools for analyzing bioinformatics data (e.g., the Protein Data Bank, NCBI GenBank, UCSF ChimeraX, the KEGG database, and MEGA).
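To give a flavor of the data involved, here is a tiny, self-contained Python sketch parsing FASTA, the plain-text sequence format served by databases like GenBank; the records shown are made up for illustration:

```python
# Parse FASTA-formatted text: each record is a ">header" line followed by
# one or more sequence lines (which may be wrapped).
def parse_fasta(text):
    """Return a dict mapping each FASTA header to its full sequence."""
    records, header, parts = {}, None, []
    for line in text.strip().splitlines():
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(parts)
            header, parts = line[1:].strip(), []
        else:
            parts.append(line.strip())
    if header is not None:
        records[header] = "".join(parts)
    return records

example = """>seq1 hypothetical protein
MKTAYIAKQR
QISFVKSHFS
>seq2 toy DNA fragment
ATGCGTACGT"""
records = parse_fasta(example)
print(records["seq1 hypothetical protein"])  # MKTAYIAKQRQISFVKSHFS
```

Real analyses would use established parsers, but the format itself is simple enough that hand-rolling one is a common first bioinformatics exercise.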

 

T2C: A gentle introduction to diffusion models in machine learning (Part 2) 

Zi-Jia Gong
PhD Student in Mathematical Modelling
School of Mathematics and Statistics
Rochester Institute of Technology

Diffusion models have demonstrated state-of-the-art results in image generation. A diffusion model uses a forward diffusion stochastic process to transform the data distribution into a known prior distribution by gradually adding noise. Additionally, it learns a parametrized reverse process from the data collected during the forward diffusion process. The reverse process can generate a new data point starting from random noise drawn from the prior distribution.

In the second part of this tutorial, we will focus on Google's implementation of diffusion models in JAX. We will cover:

  • A short introduction to JAX, and a comparison between JAX and PyTorch,
  • Commonly used neural network architectures in diffusion models and their implementations in JAX,
  • The implementation of forward and reverse diffusion processes,
  • Training and checkpointing procedures,
  • Training and inference on multiple GPUs in parallel. 
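The training procedure in the list above can be summarized framework-agnostically. Below is a toy Python sketch of a single training step with a stand-in `model`, not the tutorial's actual JAX code:

```python
import math
import random

# One diffusion training step: sample a timestep t, noise x0 via the
# closed-form forward process, and score the model's noise prediction
# with a squared error (the "simple" loss of Ho et al., 2020).
def training_step(model, x0, alpha_bars, rng=random):
    t = rng.randrange(len(alpha_bars))
    eps = rng.gauss(0.0, 1.0)
    x_t = math.sqrt(alpha_bars[t]) * x0 + math.sqrt(1.0 - alpha_bars[t]) * eps
    eps_hat = model(x_t, t)          # the network predicts the added noise
    return (eps - eps_hat) ** 2      # loss to minimize over many samples

# Toy check with a hypothetical "model" that always predicts zero noise:
alpha_bars = [0.99, 0.9, 0.5, 0.1]
loss = training_step(lambda x, t: 0.0, x0=1.0, alpha_bars=alpha_bars)
print(loss >= 0.0)  # True
```

In the JAX version, this step would be wrapped in `jit` and differentiated with `grad`, with the architectures and multi-GPU parallelism covered in the session.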

This tutorial will provide a fundamental understanding of diffusion models to participants who are interested in this exciting new area.

 

T2D: Demystifying Tiny GPT: Hands-On Training with PyTorch (Part 1) Canceled

Bardh Rushiti
Co-founder @ AI Kosovo
Computer Vision & AI Engineer @ Calvary Robotics

Canceled

In an era where large language models (LLMs) are ubiquitously deployed across myriad applications, there is often little regard for the intricate mechanisms underpinning their operation. This workshop seeks to cut through the hyperbole by offering a straightforward, hands-on treatment of one of the foundational architectures at the heart of this AI revolution: the Transformer. The session aims to demystify the Generative Pre-trained Transformer (GPT) through the practical lens of a PyTorch implementation. Participants will be guided through constructing a tiny version of a GPT model with a focus on language tasks. Beginning with a high-level overview of the Transformer architecture (its emergence, its pivotal role in advancing machine learning, and its derivative models), the session will move into environment setup, data preprocessing, and the nitty-gritty of PyTorch-based model development. The workshop emphasizes a scientific approach, focusing not just on how to implement the Transformer, but on fostering an understanding of its mechanics, design rationale, and analysis. It challenges the trend of using LLMs as catch-all solutions, promoting a deep respect for their complexity and transformative potential.

Keywords: GPT, PyTorch, Natural Language Processing, AI Training, Model Implementation, Machine Learning Workshop

What to bring: Laptop

 

 

6:00-7:00pm                Tutorials

T3A: AntiCopyPaster: An Open-Source Ecosystem for Just-in-time Code Duplicates Extraction 

Mohamed Wiem Mkaouer
Assistant Professor
Software Engineering
Rochester Institute of Technology

Refactoring is a critical task in software maintenance, usually performed to enforce best design practices or to cope with design defects. Extract method refactoring is widely used for merging duplicate code into a single new method. Several studies have attempted to recommend extract method refactoring opportunities through program slicing, program dependency graphs, code modification analysis, structural similarity, and feature extraction. However, all approaches thus far interfere with the developer's workflow and consider refactoring suggestions across the entire project without focusing on the context of development. To increase the adoption and usage of extract method refactoring, in this demo we investigate the effectiveness of machine learning algorithms in recommending extract method refactoring while maintaining the developer's workflow, and then report on a user study that evaluates the proposed technique.
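For readers unfamiliar with the transformation itself, here is a small, hypothetical Python illustration of extract method refactoring (the operation the tool recommends); it is invented for this description, not taken from AntiCopyPaster:

```python
# Before: the same validation logic is copy-pasted into two functions.
def register_before(email):
    if "@" not in email or "." not in email:
        raise ValueError("invalid email")
    return f"registered {email}"

def invite_before(email):
    if "@" not in email or "." not in email:
        raise ValueError("invalid email")
    return f"invited {email}"

# After extract method: the duplicate is merged into a single new method.
def validate_email(email):
    if "@" not in email or "." not in email:
        raise ValueError("invalid email")

def register_after(email):
    validate_email(email)
    return f"registered {email}"

def invite_after(email):
    validate_email(email)
    return f"invited {email}"

print(register_after("ada@example.com"))  # registered ada@example.com
```

Detecting such just-introduced duplicates and suggesting the extraction at the moment of the copy-paste, without breaking the developer's flow, is the problem the demo addresses.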

 

T3B: An Introduction To The Conformal Prediction Approach to Uncertainty Quantification

Ernest Fokoue, PhD
Professor
School of Mathematics and Statistics
Rochester Institute of Technology

🌟📊 Welcome to "An Introduction to the Conformal Prediction Approach to Uncertainty Quantification in Statistical Learning" tutorial, where we embark on an enlightening journey into the realm of predictive analytics! 🌟📊 In this captivating one-hour session, we invite you to delve into the fascinating world of uncertainty quantification through the innovative lens of Conformal Prediction. 🌟 Harnessing the power of Python and R, we'll traverse the intricate landscape of statistical learning with elegance and ease. 🐍💻 Whether you're a seasoned statistician or a budding data enthusiast, this tutorial promises to ignite your curiosity and illuminate your understanding. ✨ No need for advanced degrees or esoteric knowledge – just bring your laptop and a desire to explore. 🚀 Together, we'll unravel the mysteries of uncertainty, untangling its enigmatic threads to reveal actionable insights and confident predictions. 💡 Join us for an engaging hour of discovery, where we'll demystify predictive regression analysis and unveil the principles of basic pattern recognition. 🌺🔍 Let's empower ourselves with the tools to navigate uncertainty with poise and precision.
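As a preview, the core of split conformal prediction for regression fits in a few lines of Python; the model and calibration data below are toy stand-ins, not material from the tutorial:

```python
import math

# Split conformal prediction for regression: calibrate on held-out absolute
# residuals, then form intervals with approximately (1 - alpha) coverage.
def conformal_interval(predict, calib_x, calib_y, x_new, alpha=0.1):
    scores = sorted(abs(y - predict(x)) for x, y in zip(calib_x, calib_y))
    n = len(scores)
    # Conformal quantile: the ceil((n+1)(1-alpha))-th smallest score.
    k = min(n - 1, math.ceil((n + 1) * (1 - alpha)) - 1)
    q = scores[k]
    y_hat = predict(x_new)
    return y_hat - q, y_hat + q

# Toy model y ≈ 2x with hypothetical noisy calibration data:
predict = lambda x: 2.0 * x
calib_x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
calib_y = [2.1, 3.8, 6.2, 8.1, 9.9, 12.3, 13.9, 16.2, 17.8, 20.1]
lo, hi = conformal_interval(predict, calib_x, calib_y, x_new=5.0, alpha=0.2)
print((lo, hi))  # an interval centered at the prediction 10.0
```

The appeal of the approach is exactly what this sketch shows: it wraps any point predictor, makes no distributional assumptions beyond exchangeability, and still delivers finite-sample coverage guarantees.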

Don't miss out on this enriching experience – reserve your seat now and embark on a transformative journey into the heart of statistical learning. See you there! 🌟🎓

 

T3C: Introduction to Research Computing at RIT

Benjamin Meyers, PhD
Research Computing
Rochester Institute of Technology

In this tutorial, you will be introduced to the Research Computing department at RIT and the services it offers, with a focus on Research Computing's High-Performance Compute cluster. You will learn how to access the cluster and how to create and submit jobs, with demonstrations. Bring your laptops.

 

T3D: Demystifying Tiny GPT: Hands-On Training with PyTorch (Part 2) Canceled

Bardh Rushiti
Co-founder @ AI Kosovo
Computer Vision & AI Engineer @ Calvary Robotics

Canceled

In an era where large language models (LLMs) are ubiquitously deployed across myriad applications, there is often little regard for the intricate mechanisms underpinning their operation. This workshop seeks to cut through the hyperbole by offering a straightforward, hands-on treatment of one of the foundational architectures at the heart of this AI revolution: the Transformer. The session aims to demystify the Generative Pre-trained Transformer (GPT) through the practical lens of a PyTorch implementation. Participants will be guided through constructing a tiny version of a GPT model with a focus on language tasks. Beginning with a high-level overview of the Transformer architecture (its emergence, its pivotal role in advancing machine learning, and its derivative models), the session will move into environment setup, data preprocessing, and the nitty-gritty of PyTorch-based model development. The workshop emphasizes a scientific approach, focusing not just on how to implement the Transformer, but on fostering an understanding of its mechanics, design rationale, and analysis. It challenges the trend of using LLMs as catch-all solutions, promoting a deep respect for their complexity and transformative potential.

Keywords: GPT, PyTorch, Natural Language Processing, AI Training, Model Implementation, Machine Learning Workshop

What to bring: Laptop