Qi Long, PhD

December 5, 2019 Webinar 

Distributed learning from multiple EHR databases for predicting medical events 

Abstract:  

Electronic health record (EHR) data offer great promise for personalized medicine. However, EHR data also present significant analytical challenges due to their irregularity and complexity. For example, EHRs include data from multiple domains collected over time, in both structured and unstructured form. In addition, analyzing EHR data raises privacy concerns, and sharing such data across multiple institutions/sites may be infeasible. Building on a contextual embedding model, we propose a distributed learning approach that learns from multiple EHR databases, handles both structured and unstructured data, and predicts multiple medical events simultaneously. We further augment the proposed approach with differential privacy to strengthen privacy protection. Our numerical studies demonstrate that the proposed method can build predictive models in a distributed fashion with privacy protection, and that the resulting models achieve reasonable prediction accuracy relative to methods that use data pooled across all sites. If integrated into an EHR system as a decision support tool, our algorithm has the potential to improve early detection and diagnosis of disease, which are known to be associated with better patient outcomes. This is joint work with Ziyi Li, Kirk Roberts, and Xiaoqian Jiang. 

Short Bio: 

Dr. Long is a Professor of Biostatistics and Director of the Biostatistics and Bioinformatics Core in the Abramson Cancer Center at the University of Pennsylvania Perelman School of Medicine. He is an elected Fellow of the American Statistical Association and an elected member of the International Statistical Institute. He is a Senior Editor for Cancer Research and an associate editor for several leading statistical journals. He is a standing member of the NIH Biostatistical Methods and Research Design (BMRD) Study Section. 

The thrust of his research is to develop statistical and machine learning methods for advancing precision medicine and population health. He has developed methods for analyzing big biomedical data, including -omics, electronic health record (EHR), and mobile health (mHealth) data, as well as methods for causal inference, missing data, Bayesian modeling, and clinical trials. His methods research has been supported by NIH, PCORI, and NSF. In addition, he has provided leadership in biomedical research as Director of the Statistical and Data Coordinating Center for national research networks and large-scale multi-site clinical studies. He currently co-leads the Coordinating Center for the Pre-medical Cancer Immunotherapy Network for Canine Trials (PRECINCT), part of the NCI Cancer Moonshot Initiative.