From Big Data to Better Insights: A Primer on using Machine Learning Methods in a Data-Centric World
Instructor: Trent D. Buskirk, Ph.D., Old Dominion University
Full Day Course (can be adapted to half day as well)
Description:
The amount of data generated as a by-product in society is growing fast, and includes data from satellites, sensors, transactions, social media, smartphones and even thermostats, just to name a few. Such data are often referred to as “big data” and can be leveraged to create value in areas such as health, crime prevention, commerce and fraud detection, among others. An emerging practice in many areas is to append or link big data sources with smaller-scale, more specific sources that often contain more limited, but potentially richer, information. Survey researchers and social and political scientists have used this practice for some time to construct sampling frames, appending auxiliary information that is not directly available on an existing frame but can be obtained from an external source. The use of big data in other industries and sectors is also growing at a fast pace. But having the data alone does not guarantee that insights or understanding will follow. Careful consideration of data structure, type and volume is needed, along with methods that can scale to accommodate scenarios with many cases or many variables.
This course offers participants a broad overview of big data so that they understand the need for alternative methods to analyze and visualize such data, and then introduces the machine learning framework. We will briefly discuss the difference between inference and prediction within the statistical machine learning paradigm, as well as the difference between supervised and unsupervised machine learning methods. The course will close with an intuitive, accessible yet rigorous discussion of four of the most common machine learning methods that every analyst should understand in the era of big data: k-means clustering, k-nearest neighbors, tree-based methods and random forest models. The machine learning methods will be illustrated with examples that can be reproduced in R.
Time permitting, we will also highlight the Rattle package in R, which provides an intuitive and accessible graphical user interface for reproducible specification of a broad assortment of machine learning models within the R environment.
About the instructor:
Trent D. Buskirk, Ph.D., is a Professor and Data Science Fellow in the New School of Data Science at Old Dominion University. Prior to this appointment, Trent was the Novak Family Distinguished Professor of Data Science and Chair of the Applied Statistics and Operations Research Department at Bowling Green State University. Dr. Buskirk is a Fellow of the American Statistical Association. His research interests include big data quality; recruitment methods through social media; the use of big data and machine learning methods for health, social and survey science design and analysis; mobile and smartphone survey designs; methods for calibrating and weighting nonprobability samples; fairness in AI models; and interpretable ML methods. Recently, Trent served as the Conference Chair for AAPOR in 2018 and is currently part of the scientific committee for the BigSurv23 conference. Dr. Buskirk is also an outgoing Associate Editor for Methods for the Journal of Survey Statistics and Methodology. When Trent is not geeking out over data science, big data or survey methodology, you can find him playing a competitive game of Pickleball!