Dear colleagues,
We need a volunteer to serve as discussant for our topic-contributed session in Toronto. Our original discussant has an unexpected, urgent family matter and is unable to attend JSM in Toronto.
The session description from the online program
https://ww2.aievolution.com/JSMAnnual/index.cfm?do=ev.viewEv&ev=2268
is pasted below:
Protected Health Information Privacy, and Statistical Disclosure Considerations for Projects using HIPAA
Discussant: to be filled
Thursday, Aug 10: 10:30 AM - 12:20 PM
1713
Topic-Contributed Paper Session
Presentations
Privacy Enhancing Technologies (PETs) are a broad set of technologies that allow data owners, data scientists, and statisticians to join and disseminate information with theoretical and computational guarantees. They limit who can process personally or entity-identifiable information, and to what extent. Famously in the community, these technologies have enabled computations such as private set intersection of datasets and provable privacy disclosure controls using differential privacy, as seen in the 2020 US Census. However, the range of PETs has never been as broad and accessible as it is today. This year, these technologies have been recommended and advocated by the likes of the IEEE, the Royal Society, and the United Nations. In this presentation, we will give a tour of the PETs landscape, covering input- and output-privacy paradigms, technology maturity, and considerations for those planning to use PETs for statistics. Further, we will demonstrate accessible ways to start leveraging PETs in day-to-day statistical analysis.
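(For colleagues less familiar with output privacy, here is a minimal illustrative Python sketch of the idea behind differential privacy via the Laplace mechanism. It is not taken from the talk; all names and parameter choices are assumptions for illustration only.)

# Minimal sketch, not from the talk: releasing a count with
# epsilon-differential privacy via the Laplace mechanism.
import numpy as np

def dp_count(values, predicate, epsilon=1.0, rng=None):
    # A count query has sensitivity 1 (one record changes it by at most 1),
    # so Laplace noise with scale 1/epsilon yields epsilon-DP.
    rng = np.random.default_rng() if rng is None else rng
    true_count = sum(1 for v in values if predicate(v))
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Toy example: a private count of ages over 65.
ages = [34, 71, 68, 45, 80, 59]
print(dp_count(ages, lambda a: a > 65, epsilon=0.5))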
Statistical methods for probabilistic fuzzy matching enable the de-identification of private healthcare information with relative accuracy. This presentation discusses the various algorithms for de-identification and compares the minimum data elements and data types needed to achieve de-identification. Tools for circumvention are examined with respect to efficiency and counter-activity. Machine learning versus probabilistic algorithms is also debated.
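(Again for orientation only, a minimal Python sketch of probabilistic fuzzy matching as a weighted field-similarity score. This is not the speaker's algorithm; the field names, weights, and threshold idea are assumptions for illustration.)

# Minimal sketch, not the speaker's method: score a candidate record pair
# by a weighted average of per-field string similarities.
from difflib import SequenceMatcher

def field_similarity(a, b):
    # String similarity in [0, 1] via difflib's ratio.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_score(rec1, rec2, weights):
    # Weighted average of per-field similarities for two records.
    total = sum(weights.values())
    return sum(w * field_similarity(rec1[f], rec2[f]) for f, w in weights.items()) / total

weights = {"last_name": 0.4, "first_name": 0.3, "birth_year": 0.3}
a = {"last_name": "Smith", "first_name": "Jon", "birth_year": "1970"}
b = {"last_name": "Smyth", "first_name": "John", "birth_year": "1970"}
print(match_score(a, b, weights))  # pairs above a chosen threshold are treated as matches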
Speaker
Jimmy Efird, Boston VA Cooperative Studies Program Coordinating Center
Accurately estimating personalized treatment effects within a study site (e.g., a hospital) has been challenging due to limited sample size. Furthermore, privacy considerations and lack of resources prevent a site from leveraging subject-level data from other sites. We propose a tree-based model averaging approach to improve the estimation accuracy of conditional average treatment effects (CATE) at a target site by leveraging models derived from other, potentially heterogeneous sites, without those sites sharing subject-level data. To the best of our knowledge, there is no established model averaging approach for distributed data that focuses on improving the estimation of treatment effects. Specifically, under distributed data networks, our framework provides an interpretable tree-based ensemble of CATE estimators that joins models across study sites while actively modeling the heterogeneity in data sources through site partitioning. The performance of this approach is demonstrated by a real-world study of the causal effects of oxygen therapy on hospital survival rate and backed up by comprehensive simulation results.
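(A minimal Python sketch of the general idea of averaging site-specific CATE models without sharing subject-level data. This is not the authors' method; their framework additionally learns tree-based, covariate-dependent weights. The simulation, function names, and weighting rule below are assumptions for illustration.)

# Minimal sketch: combine site-level CATE models at a target site using
# only model predictions, never subject-level data from other sites.
import numpy as np

def fit_local_cate(X, treat, y):
    # Toy T-learner: separate linear outcome models for treated and control.
    def fit_ols(Xa, ya):
        Xa1 = np.column_stack([np.ones(len(Xa)), Xa])
        beta, *_ = np.linalg.lstsq(Xa1, ya, rcond=None)
        return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ beta
    mu1 = fit_ols(X[treat == 1], y[treat == 1])
    mu0 = fit_ols(X[treat == 0], y[treat == 0])
    return lambda Xn: mu1(Xn) - mu0(Xn)

def average_models(models, X_target, local_cate_target):
    # Weight each site model by how well it tracks the target site's own
    # (noisy) local CATE estimates; only predictions cross site boundaries.
    preds = np.column_stack([m(X_target) for m in models])
    errors = ((preds - local_cate_target[:, None]) ** 2).mean(axis=0)
    weights = 1.0 / (errors + 1e-8)
    weights /= weights.sum()
    return weights, preds @ weights

# Toy example with three simulated, heterogeneous sites.
rng = np.random.default_rng(0)
models = []
for shift in (0.0, 0.5, 1.0):
    X = rng.normal(size=(200, 2))
    treat = rng.integers(0, 2, size=200)
    y = X @ np.array([1.0, -1.0]) + treat * (1.0 + shift + X[:, 0]) + rng.normal(size=200)
    models.append(fit_local_cate(X, treat, y))

X_t = rng.normal(size=(100, 2))
true_cate = 1.0 + X_t[:, 0]
w, combined = average_models(models, X_t, true_cate + rng.normal(scale=0.5, size=100))
print("site weights:", np.round(w, 2))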
Speaker
Lu Tang, University of Pittsburgh
I report my key finding: the discovery of a misuse of the Terms of Use (TOU) of at least two "crown jewel" federal databases to produce public-domain, non-password-protected companion datasets, resulting in an unprecedented, first-ever publication of a "proof of concept" (POC) and algorithm for de-anonymizing individual data in a prestigious peer-reviewed economics journal. The algorithm and POC use characteristics of unique oncology clinical trials recorded in clinicaltrials.gov that create an unprecedented risk of de-anonymizing forty (40) unique patients using "big data" SEER (Surveillance, Epidemiology, and End Results) patient-level tumor data, and, vice versa, use SEER to de-anonymize unique patients in "big data" clinicaltrials.gov. SEER is anonymized data. My finding of instances of patient-level data in clinicaltrials.gov appears to be completely unexpected and may be the first ever reported to privacy experts at both SEER and clinicaltrials.gov. My original report, "Finding the De-Anonymization Needle in the SEER Haystack," appears in the September 2022 issue of Amstat News, the American Statistical Association newsletter.
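(As background on why such linkage is possible, a generic Python sketch of a quasi-identifier uniqueness check. It is not the published POC or algorithm, and the field names and toy records are assumptions for illustration only.)

# Minimal, generic sketch: records that are unique on a set of
# quasi-identifiers are the ones most exposed to linkage across datasets.
from collections import Counter

def unique_on(records, quasi_identifiers):
    # Return the records whose quasi-identifier combination appears exactly once.
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    return [r for r, k in zip(records, keys) if counts[k] == 1]

records = [
    {"diagnosis_year": 2015, "cancer_site": "lung", "age_group": "70-74", "region": "NE"},
    {"diagnosis_year": 2015, "cancer_site": "lung", "age_group": "70-74", "region": "NE"},
    {"diagnosis_year": 2016, "cancer_site": "rare_sarcoma", "age_group": "30-34", "region": "NW"},
]
at_risk = unique_on(records, ["diagnosis_year", "cancer_site", "age_group", "region"])
print(len(at_risk), "record(s) unique on these quasi-identifiers")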
------------------------------
Chris Barker, Ph.D.
2023 Chair, Statistical Consulting Section
Consultant and
Adjunct Associate Professor of Biostatistics
www.barkerstats.com
---
"In composition you have all the time you want to decide what to say in 15 seconds, in improvisation you have 15 seconds."
-Steve Lacy
------------------------------