Big Data Sessions at the 2015 Joint Statistical Meetings

By Steve Pierson posted 06-02-2015 14:59


Following up on my 2013 and 2014 blog entries on the same topic, here are the 2015 sessions that mention "Big Data" or "Data Science" in the title or the abstract, or seem strongly related to the topic. 

The first list is based on titles that contain either term (using Basic Search on the JSM 2015 Online Program). The second list covers talks or sessions whose abstract contains either term (using Abstract Keyword Search). Please let me know of any I missed, or add them through the comment section below. 

Also, after this blog entry appeared in 2013, someone pointed out that there were many more sessions at JSM on Causal Inference. Please point out similar omissions this year!

[6/15/15 update: See also

8/25 update: there were also many poster presentations that involved Big Data, like this one

See other ASA Science Policy blog entries. For ASA science policy updates, follow @ASA_SciPol on Twitter. 

Sessions or Continuing Ed Courses with Big Data or Data Science in the Title: 

Scaling up Response Surface Models for Big Geostatistical and Computer Simulation Data (Session 6)

Sun, 8/9 2:00 PM to 3:50 PM
Room: TCC-204


Big Data in Seattle

Sun, 8/9 2:00 PM to 3:50 PM
Room: CC-4C3

Theory and Methods for Massive Spatial Data (Session 17)

Sun, 8/9 2:00 PM to 3:50 PM
Room: CC-212


Computational Methods for Big Data and Visualization Problems

Sun, 8/9 8:30 PM to 9:15 PM

Statistical Methods for Big Genomic Data Analysis (Session 105)

Mon, 8/10 8:30 AM to 10:20 AM
Room: CC-607


Big Data in the Environment

Mon, 8/10 10:30 AM to 12:20 PM
Room: CC-2A

Big Bayes: Scalable Algorithms and Architectures (Session 157)

Mon, 8/10 10:30 AM to 12:20 PM
Room: CC-606


Big Data: Modeling, Tools, Analytics, and Training

Mon, 8/10 10:30 AM to 12:20 PM
Room: CC-606


Big Data and Customer Analytics in the Era of Social Media

Mon, 8/10 2:00 PM to 3:50 PM
Room: CC-203


Crowd-Sourcing Big Data from Smartphone Apps for Transportation Research: Role of Statistics and Challenges

Mon, 8/10 2:00 PM to 3:50 PM
Room: CC-206


Applications in Big Data

Mon, 8/10 2:00 PM to 3:50 PM
Room: TCC-202


Complex Data, Inhomogeneous Data and Big Data

Tue, 8/11 8:30 AM to 10:20 AM
Room: CC-206

The Statistics Identity Crisis: Are We Really Data Scientists? (Session 294)

Tue, 8/11 8:30 AM to 10:20 AM
Room: CC-609


Modern Inferential Methods for Big Data Analysis

Tue, 8/11 10:30 AM to 12:20 PM
Room: CC-606


The Fifth "V" in Big Data Is *Variables*

Wed, 8/12 8:30 AM to 10:20 AM
Room: CC-4C4


Causal Inference Meets Big Data

Wed, 8/12 8:30 AM to 10:20 AM
Room: CC-307


The Current Landscape of Business Analytics and Data Science at Higher Education Institutions: Who Is Teaching What

Wed, 8/12 8:30 AM to 10:20 AM
Room: CC-206


Novel Algorithms for Big Data Analytics

Wed, 8/12 10:30 AM to 12:20 PM
Room: CC-2B


Making Better Decisions with Data Science

Wed, 8/12 10:30 AM to 12:20 PM
Room: CC-4C4


Technology and Big Data in the Classroom

Wed, 8/12 10:30 AM to 12:20 PM
Room: CC-211


Big Data Issues in Biosciences

Wed, 8/12 2:00 PM to 3:50 PM
Room: CC-4C2


Mining Big Data in Computational Neuroscience: Top-Down Methods.

Wed, 8/12 2:00 PM to 3:50 PM
Room: CC-201


Nonparametric Methods for Big Data, Empirical Likelihood and Additive Model

Wed, 8/12 2:00 PM to 3:50 PM
Room: CC-210


Enter a Data Science Competition. You Don't Need to Be an Expert! (ADDED FEE)

Wed, 8/12 8:00 AM to 9:45 AM

Statistical and Graphical Challenges in Analyzing Big and Complex Neuroimaging Data (Session 623)

Thu, 8/13 8:30 AM to 10:20 AM
Room: CC-206


Big Data Techniques for Survey Data Integration

Thu, 8/13 8:30 AM to 10:20 AM
Room: CC-4C1


Sessions with Big Data in the Abstract: There are 79 of these, and they probably include talks from the sessions listed above.

Sunday, 08/09/2015
The Undergraduate Curriculum of the Future 
Johanna Hardin, Pomona College 
2:05 PM 

A Subsampled Double Bootstrap for Massive Data 
Srijan Sengupta, University of Illinois at Urbana-Champaign; Stanislav Volgushev, Ruhr University Bochum; Xiaofeng Shao, University of Illinois at Urbana-Champaign 
2:05 PM 

Exploratory Data Analysis in Observational Data Utilizing Machine Learning Based Approaches 
Andrew Bate, Pfizer Inc. 
2:05 PM 

A Big Data Approach for Integrative Analysis of Two Different High-Throughput Genomic Data Types 
Hongkai Ji, Johns Hopkins Bloomberg School of Public Health; Weiqiang Zhou, Johns Hopkins Bloomberg School of Public Health; Bing He, Johns Hopkins Bloomberg School of Public Health 
3:05 PM 

The Truth Is Out There. But How Do We Dig it Out? 
Mikhail Traskin, Amazon 
3:05 PM 

Applying Pattern-Mixture Models for Estimation from Multiple Data Sources 
Jeffrey Gonzalez, Bureau of Labor Statistics; John L. Eltinge, Bureau of Labor Statistics 
4:35 PM 

Speeding up Neighborhood Searches in Local Gaussian Process Fitting of Large-Scale Computer Experiments 
Ben Haaland, Georgia Tech, ISyE; Chih-Li Sung, Georgia Tech, ISyE 
5:05 PM 

Intro Stats in the 21st Century 
Richard De Veaux, Williams College 
5:05 PM 

Shrinkage Priors for Bayesian Learning from High Dimensional Genetics Data 
Anjishnu Banerjee, Medical College of Wisconsin 
5:05 PM 

Monday, 08/10/2015
Data Science vs. Statistics: What's the Difference? 
Ronald Fricker, Naval Postgraduate School 

Big Data for the Social Sciences 
Frauke Kreuter, Joint Program in Survey Methodology 

Examples of Overselling and Under-Applying Big Data 
Kathryn Hall 

Novel Application of Statistical Tools for Big-Data Analyses of Solar Physics 
Siavoush Mohammadi, Infotrek; Lars K.S. Daldorff, University of Michigan/NASA GSFC 

Innovative Ways the IRS Is Using Administrative Records 
Tamara Rib, IRS; Barry Johnson, IRS 
8:35 AM 

Novel Application of Statistical Tools for Big-Data Analyses of Solar Physics 
Siavoush Mohammadi, Infotrek; Lars K.S. Daldorff, University of Michigan/NASA GSFC 
8:35 AM 

Patient Privacy, Big Data, and PoLoR: Using an Old Tool for New Challenges 
Paramita Saha Chaudhuri, McGill University 
8:55 AM 

Small Area Estimation for High-Dimensional Multivariate Spatio-Temporal Count Data 
Jonathan R. Bradley, University of Missouri; Scott H. Holan, University of Missouri; Christopher K. Wikle, University of Missouri 
9:00 AM 

Data Privacy in Biomedical Research and Practice in the Era of Big Data 
Aleksandra B. Slavkovic, Penn State 
9:15 AM 

Bridging Density Functional Theory and Big Data Analytics with Applications 
Henry Lu, National Chiao Tung University; Chien-Chang Chen, NCTU; Hung-Hui Juan, NCTU; Meng-Yuan Tsai, NCTU 
10:05 AM 

Computational Challenges with Big Environmental Data 
Marc Genton, KAUST 
10:35 AM 

Stroke Localization and Association with Health Outcomes Using Clinical CT Images 
Ciprian Crainiceanu, The Johns Hopkins University 
10:35 AM 

Management, Modeling and Analytic Challenges of Big Biomedical Data 
Ivo D. Dinov, University of Michigan 
10:35 AM 

Important Features PCA for High-Dimensional Clustering 
Wanjie Wang, The Wharton School; Jiashun Jin, Carnegie Mellon University 
10:50 AM 

ESPALIERS: a Visualization Method for Big Data 
Robert Robinson, Institute for Systems Biology; Gustavo Glusman, Institute for Systems Biology; Joseph G. Vockley, Inova Translational Medicine Institute; John E. Niederhuber, Inova Translational Medicine Institute; Greg Eley, Scimentis, LLC 
10:55 AM 

Interactive Graphics for High-Dimensional Genetic Data 
Karl W. Broman, University of Wisconsin - Madison 
11:20 AM 

A General Approach to Variable Selection Using Bayesian Nonparametric Models 
Robert E. McCulloch, The University of Chicago 
11:25 AM 

Big Data Services: Globus Online, Galaxy, GridFTP 
Ravi Madduri, The University of Chicago 
11:35 AM 

On Data Parallelism and Model Parallelism for Large Scale Machine Learning 
Eric Xing, Carnegie Mellon University 
11:50 AM 

Recent Trends in Large Scale Data Intensive Systems 
Barzan Mozafari, University of Michigan 
11:55 AM 

A Hierarchical Nonparametric Bayesian Model That Integrates Multiple Sources of Lifetime Information to Model Large-Scale System Reliability 
Richard Warr; Brandon Greenwell, AFIT 
12:05 PM 

Big Data and Customer Analytics in the Era of Social Media 
David Stodder, TDWI 
2:05 PM 

Crowd-Sourcing Big Data from Smartphone Apps for Transportation Research: Role of Statistics and Challenges 
Feng Guo, Virginia Tech; Arash Mirzaei, North Central Texas Council of Governments; Elaine Murakami, Federal Highway Administration; Bianica Pires, Virginia Tech; Tianjia Tang, Federal Highway Administration 
2:05 PM 

Machine Learning for Machine Data 
Sou-Cheng Choi, NORC at the University of Chicago 
2:50 PM 

Tuesday, 08/11/2015
Managing Analytic Projects: What Works and What Doesn't 
Chuck Kincaid, Experis Business Analytics 

Bayes and Big Data 
Steven Scott, Google 

The Citation Pattern for Business and Statistics Journals: Changes in the 21st Century 
Mary Whiteside, The University of Texas at Arlington; Mark Eakin, The University of Texas at Arlington; Sridhar Nerur, The University of Texas at Arlington 

What Can Be the Extent of Contributions of Statistical Sciences to Cyber-Risk and CLOUD Computing Domain in a Security- and Privacy-Conscious World? 
Mehmet Sahinoglu, Auburn University 

Big Data and the Social Sciences 
Seth Stephens-Davidowitz, The New York Times/Social Science Research Council 

Sparse Generalized PCA for Selectable High-Dimensional Analysis 
Qiaoya Zhang; Yiyuan She, Florida State University; M. Ross Kunz, Idaho National Laboratory 

Vital Collaborations Among Academia, Industry and Government 
Dongseok Choi, Oregon Health & Science University; John E. Kolassa, Rutgers University; Mani Lakshminarayanan, Pfizer Inc.; Barry D. Nussbaum, EPA; A. James O'Malley, Dartmouth College; Wei Shen, Eli Lilly and Company 
8:30 AM 

Heterogeneous Data Analysis, Based on HDLSS Asymptotics 
James Stephen Marron, The University of North Carolina 
8:35 AM 

"Am I a Data Scientist?": The Applied Statistics Student's Identity Crisis 
Alyssa Frazee, Stripe 
8:35 AM 

Phylogenetic Experimental Design in the Era of Big Data 
Jeffrey P. Townsend, Yale University 
8:35 AM 

Sparse Generalized PCA for Selectable High-Dimensional Analysis 
Qiaoya Zhang; Yiyuan She, Florida State University; M. Ross Kunz, Idaho National Laboratory 
8:45 AM 

Large Scale Visual Exploration of Radiological and Nuclear Risk Assessment Methods 
Landon Sego, Pacific Northwest National Laboratory; Daniel Fortin, Pacific Northwest National Laboratory; Robert Brigantic, PNNL 
8:50 AM 

Big Data Approaches for Clinical RNA Sequencing (RNA-Seq) 
Shihao Shen, UCLA 
8:50 AM 

Efficient Penalty Search for Multiple Changepoint Detection in Big Data 
Kaylea Haynes, Lancaster University; Idris Eckley, University of Lancaster; Paul Fearnhead, Lancaster University 
8:55 AM 

Examining Model Fit for Logistic Regression on Large Data Sets 
Todd Connelly 
9:20 AM 

Application of ADMM Method for Large Scale Statistical Models 
Ganesh Subramaniam, AT&T Labs Research; Ravi Varadhan, The Johns Hopkins University; Todd Larchuk, AT&T Labs Research; Huitong Qiu, The Johns Hopkins University 
9:20 AM 

Evaluating Data Science Contributions in Teaching and Research 
Lance Waller, Emory University 
9:25 AM 

Bayesian Melding of the Dead-Reckoned Path and GPS Measurements for an Accurate and High-Resolution Path of Marine Mammals 
Yang Liu, The University of British Columbia; Brian C. Battaile, The University of British Columbia Marine Mammal Research Unit; James Zidek, The University of British Columbia; Andrew W. Trites, The University of British Columbia Marine Mammal Research Unit 
9:35 AM 

The Army Study to Assess Risk and Resilience in Servicemembers (Army STARRS): Statistical Designs and Methods for Predicting Non-Fatal Suicidal Behaviors 
Tzu-Cheg Kao, Uniformed Services University of the Health Sciences; Steven Heeringa, University of Michigan Institute for Social Research; Alan Zaslavsky, Harvard Medical School; James Naifeh, CSTS, USUHS; Pablo Aliaga, CSTS, USUHS; Patti Vegella, CSTS, USUHS; Tsz Ng, CSTS, USUHS; Bailey Zhang, CSTS, USUHS; Christina Buckley, CSTS, USUHS; Carol Fullerton, CSTS, USUHS; Gary Wynn, CSTS, USUHS; James McCarroll, CSTS, USUHS; Nancy Sampson, Harvard Medical School; Lisa Colpe, NIH/NIMH; Michael Schoenbaum, NIH/NIMH; Kenneth Cox, U.S. Army Public Health Command; Ronald Kessler, Harvard Medical School; Murray Stein, UC San Diego/VA San Diego Healthcare System; Robert Ursano, CSTS, USUHS 
9:35 AM 

Tracking Evolution in Text Data Streams via Online Density-Based Clustering 
Avory Bryant, Naval Surface Warfare Center 
9:35 AM 

Variable Selection Methods for Big Data - A Comparative Study 
Jun Liu; Xuejing Mao, AT&T 
9:35 AM 

An Efficient GLS Algorithm for Periodic Regression with Autoregressive Errors 
Jaechoul Lee, Boise State University; Anthony Dini, Boise State University; William Negri, Boise State University 
9:35 AM 

Augmenting Traditional Estimation with Non-Designed Data: Application to the US Unemployment Rate 
Robert Montgomery, NORC at the University of Chicago; Martin Barron, NORC at the University of Chicago; Nicki Dunnavant, NORC at the University of Chicago; Yongheng Lin, NORC at the University of Chicago; Ilana Ventura, NORC at the University of Chicago 
12:05 PM 

Messy Data: Teaching Students Early on About the Realities of Data 
Ann Cannon, Cornell College 
2:25 PM 

Fusion Learning by Individual-To-Clique (FLIC): Efficient Approach to Enhancing Individual Inference Through Adaptive Combination of Confidence Distributions 
Minge Xie, Rutgers University; Regina Y. Liu, Rutgers University; Jieli Shen, Rutgers University 
2:55 PM 

Issues in Methodological Strategies for Marginal Structural Models with Large Data Sets 
Bret Zeldow; Jason Roy, University of Pennsylvania 
3:20 PM 

Wednesday, 08/12/2015
A Statistician's Journey to Big Data 
James Hess 

What Are the Statistical Challenges of Big Data Science? 
Kaiser Fung, New York University 

Designing Undergraduate Programs in Business Analytics and Data Science 
Amy L. Phelps, Duquesne University; Diane Fisher, University of Louisiana at Lafayette 

Total Survey Error: Implications for Big Data 
Paul Biemer, RTI International/The University of North Carolina at Chapel Hill 

Climate Changes and Agricultural Production - a Big Data Analysis Approach 
Hsi-Guang Sung, Microsoft; Elva Chen, Santa Clara University 

The Current Landscape of Business Analytics and Data Science at Higher Education Institutions: Who Is Teaching What 
Amy L. Phelps, Duquesne University; Kathryn Szabat, LaSalle University; Billie Anderson, Ferris State University; Jeffrey Camm, University of Cincinnati; Aric LaBarr, North Carolina State University 
8:35 AM 

Creating Collaboration Around All Data Scientists for Better Business Decisions 
Celeste Fralick, Intel Corporation; Rita R. Chattopadhyay, Intel Corporation; Paula Greve, Intel Corporation; Genetha Gray, Intel Corporation 
8:35 AM 

Experimentation at Scale: Lessons from Production at Etsy 
Hilary Parker, Etsy 
8:35 AM 

Unravelling Bias in Online Experimentation 
Chris Harland, Microsoft 
8:55 AM 

A Computationally Efficient Method for the Analysis of Big Survival Data 
Kevin He, University of Michigan; Yi Li, University of Michigan; Yanming Li, University of Michigan; Ji Zhu, University of Michigan 
9:00 AM 

An Unbiased and Scalable Monte Carlo Method for Bayesian Inference for Big Data 
Murray Pollock, University of Warwick; Paul Fearnhead, Lancaster University; Adam Michael Johansen, University of Warwick; Gareth O. Roberts, University of Warwick 
9:00 AM 

How Credible Are Observational Estimates of Causal Effects from 'Big Data' 
Eytan Bakshy, Facebook; Dean Eckles, Facebook 
9:15 AM 

Forecasting with Big Data: Finding and Using Behavioral "sensors" to Predict the Future 
Sean Taylor, Facebook; Alex Peysakhovich, Facebook 
9:35 AM 

Collaborative Data Science with CoLaboratory 
Kayur Patel, Google 
10:35 AM 

Estimating the Degree of Activity of Jumps of a Discretely Observed Semimartingale 

10:35 AM 

Keeping it Real - Using Big Data and Interactive Visualization Tools in the Classroom 
Mia Stephens, SAS Institute; Rob Carver, Stonehill College 
10:35 AM 

What's in a Name: the Evolution of Statistical Terms Such as Analytics, Big Data, and Data Science 
John McKenzie, Babson College 
10:50 AM 

Case-Specific Random Forests for Big Data Prediction 
Dan Nettleton, Iowa State University 
11:00 AM 

Between Data Cleaning and Inference: Pre-Averaging and Robust Estimators of the Efficient Price 
Per A. Mykland, The University of Chicago; Lan Zhang, University of Illinois at Chicago 
11:05 AM 

Generative Modeling of Convolutional Neural Networks 
Ying Nian Wu, UCLA 
11:25 AM 

Regression Estimation Diagnostics Measures for High-Dimensional Regression 
Yanjia Yu, University of Minnesota, Twin Cities; Yuhong Yang, University of Minnesota, Twin Cities 
11:35 AM 

A Bootstrap Metropolis-Hastings Algorithm for Bayesian Analysis of Big Data 
Faming Liang, University of Florida; Jinsu Kim, Texas A&M University; Qifan Song, Purdue University 
11:50 AM 

Application of Principal Component Analysis of Distribution in Sport Analytic 
Sun Makosso-Kallyth, Degroote-Pain Centre-McMaster University; Brahim Brahim, BigData Visualizations 
11:50 AM 

Teaching Statistical Computing Leveraging the Github Ecosystem 
Colin Rundel 
12:05 PM 

Introduction to Data Science: An Interdisciplinary Course for Undergraduates 
Alyson Wilson, North Carolina State University 
2:05 PM 

Challenges of A/B Testing at Scale 
Ya Xu, LinkedIn 
2:05 PM 

Causal Inference for Marketing Program Evaluation 
Fei Wang 
2:20 PM 

Product Support to Product Innovation: The Role of Analysts at Data Driven Companies 
McCall McIntyre, Simulmedia 
2:45 PM 

Assessing the Use of Google Trends Search Query Data to Forecast Number of Nonresident Hotel Registrations in Puerto Rico 
Roberto Rivera, University of Puerto Rico at Mayaguez 
3:05 PM 

Likelihood Estimation of Large Species Trees Using the Coalescent Process 
Arindam RoyChoudhury, Columbia University 
3:05 PM 

Kaggle as a Course 
Michael Schuckers, St. Lawrence University 
3:20 PM 

Thursday, 08/13/2015
Survey Data, Big Data, State Space Models and Official Statistics 
Siu-Ming Tam, Australian Bureau of Statistics 
8:35 AM 

Feature Selection Using Regularized Trees in Online Fraud Detection 
Nitin Sharma, PayPal, Inc. 
8:35 AM 

Crop Acreage Prediction Combining Several Sources of Information 
Jae-kwang Kim, Iowa State University; Zhonglei Wang, Iowa State University 
9:00 AM 

A New Generalized Heterogeneous Data Model (GHDM) to Jointly Model Mixed Types of Dependent Variables 
Chandra R. Bhat, The University of Texas at Austin 
9:15 AM 

Perils and Solutions for Comparative Effectiveness Research in Massive Observational Databases 
Marc A. Suchard, UCLA 
10:35 AM 

Methodological Developments in Growth Modeling 
Stef van Buuren, TNO 
11:35 AM 

Learning with Social Networks: a Data Mining Perspective 
Umashanger Thayasivam, Rowan University 
11:55 AM