ASA Connect


"Teaching reproducibility and responsible workflow" papers now available in the November, 2022 issue of JSDSE

  • 1.  "Teaching reproducibility and responsible workflow" papers now available in the November, 2022 issue of JSDSE

    Posted 11-18-2022 13:11
    Many new principles and standards have been developed to foster a culture of reproducible research, but less has been done in teaching. Articles in the November 2022 issue of the American Statistical Association's open-access Journal of Statistics and Data Science Education (https://www.tandfonline.com/toc/ujse21/current) describe how to integrate practices for achieving reproducibility into the teaching of data science and statistics. The 11 papers and accompanying editorial discuss how to teach reproducibility and responsible workflow from different perspectives: refreshing and organizing teaching materials, providing guidelines for student work, engaging students in editorial work, and revising curriculum at the programme level.
     
     "The growing importance of reproducibility and responsible workflow in the data science and statistics curriculum" (Nicholas J. Horton, Rohan Alexander, Aneta Piekut, and Colin Rundel, https://doi.org/10.1080/26939169.2022.2141001) motivates the issue. It describes how teaching the data analysis cycle requires knowledge of reproducibility and workflow, topics that have not historically been at the center of statistics and data science education, and advocates for including them in the curriculum.

     "An invitation to teaching reproducible research: lessons from a symposium" (Richard Ball, Norm Medeiros, Nicholas W. Bussberg, and Aneta Piekut, https://www.tandfonline.com/doi/full/10.1080/26939169.2022.2099489) summarizes key messages from a symposium held by Project TIER in 2021. The 10 talks showcased examples of students benefiting in multiple ways when reproducible methods are taught alongside statistical training: they improve their skills in computation, data management, and documentation, which transfer to research jobs and beyond; they gain confidence in their analytical and interpretive skills; and they broaden their intellectual development.

     "Interdisciplinary approaches and strategies from research reproducibility 2020: educating for reproducibility" (Melissa L. Rethlefsen, Hannah F. Norton, Sarah L. Meyer, Katherine A. MacWilkinson, Plato L. Smith II, and Hao Ye, https://www.tandfonline.com/doi/full/10.1080/26939169.2022.2104767) reports on a virtual conference dedicated to teaching reproducible research. The authors thematically analyzed the conference content and identified trans/interdisciplinary themes, including lifelong learning, cultivating bottom-up change, "sneaking in" learning, just-in-time learning, targeting learners by career stage, learning by doing, learning how to learn, establishing communities of practice, librarians as interdisciplinary leaders, teamwork skills, rewards and incentives, and implementing top-down change, and they summarize key lessons for each theme.

     "Data science ethos lifecycle: interplay of ethical thinking and data science practice" (Margarita Boenig-Liptsin, Anissa Tanweer, and Ari Edmundson, https://www.tandfonline.com/doi/full/10.1080/26939169.2022.2089411) notes that data science is part of the social world and can significantly affect individuals and communities, for better or worse. Instructors, learners, and researchers are encouraged to consider the ethical dimensions of their practice. The Data Science Ethos Lifecycle tool was created to support reflection on how social context interacts with data science work and on what the social consequences of the final products might be. The authors conclude that a workflow is responsible only if ethical reflection is present at every stage of research.

     "Opinionated practices for teaching reproducibility: motivation, guided Instruction and practice" (Joel Ostblom and Tiffany Timbers, https://www.tandfonline.com/doi/full/10.1080/26939169.2022.2074922) observes that while it is relatively easy to engage students in statistics and data science courses with data analysis and project tasks, because curiosity about discovering new patterns drives them, it is harder to engage them when teaching a reproducible workflow. The paper's proposed solution is to focus first on student motivation.
     
     "Tools and Recommendations for Reproducible Teaching" (Mine Dogucu and Mine Çetinkaya-Rundel, https://www.tandfonline.com/doi/full/10.1080/26939169.2022.2138645) builds on the premise that when our teaching materials (raw data, lecture slides, videos, exercises, etc.) are clearly organized, when the workflow and the links between various management systems are easy to follow, and when all materials are kept under version control and built with markdown notebooks, we set an example for students of how to document, share, and professionally report their own work.

     "Third Time's a Charm: A Tripartite Approach for Teaching Project Organization to Students" (Christina Mehta and Renee' Moore, https://www.tandfonline.com/doi/full/10.1080/26939169.2022.2118644) reflects on three iterations of a statistics course and how students were guided to collaborate. The foundation of successful collaboration is transparent, neatly organized data documentation, a transferable skill highlighted by many contributions in this issue.
     
     "LUSTRE: An online data management and student project resource" (John Towse, Rob Davies, Ellie Ball, Rebecca James, Ben Gooding, and Matthew Ivory, https://www.tandfonline.com/doi/full/10.1080/26939169.2022.2118645) describes a system that engages students with best practices for open research by letting them experience different phases of reproducible research. The authors describe how the LUSTRE resource promotes good data management practices, supports the delivery of key concepts in open research, and organizes and showcases project work.

     "Teaching for Large-Scale Reproducibility Verification" (Lars Vilhuber, Hyuk Harry Son, Meredith Welch, David N. Wasser, and Michael Darisse, https://www.tandfonline.com/doi/full/10.1080/26939169.2022.2074582) describes an innovative, research-led pedagogical approach in which students take part in the editorial work of the journals published by the American Economic Association (AEA). Students check the completeness of replication materials and the computational reproducibility of the code, and in doing so they work across many programming languages and learn how research workflows fit together.

     "Collaborative Writing Workflows in the Data-Driven Classroom: A Conversation Starter" (Sara Stoudt, https://www.tandfonline.com/doi/full/10.1080/26939169.2022.2082602) reviews how reproducible tools (such as R Markdown and computational notebooks) let individual students create reproducible research outputs, while noting that these tools are less often used collaboratively. This is in stark contrast to how data science projects are carried out in practice. The paper discusses two workflow strategies for teaching reproducible research that require students to delegate tasks (e.g., chunks of code), communicate about changes, and integrate data.
     
     "A Journey from Wild to Textbook Data to Reproducibly Refresh the Wages Data from the National Longitudinal Survey of Youth Database" (Dewi Amaliah, Dianne Cook, Emi Tanaka, Kate Hyde, and Nicholas Tierney, https://www.tandfonline.com/doi/full/10.1080/26939169.2022.2094300) makes the case for preparing teaching materials reproducibly, so that materials based on frequently updated datasets can be refreshed. The approach is attractive in part because it can also serve as an example of the reproducibility standards expected in student work.
     
     "Approachable case studies support learning and reproducibility in data science: An example from evolutionary biology" (Luna L. Sanchez Reyes and Emily Jane McTavish, https://www.tandfonline.com/doi/full/10.1080/26939169.2022.2099487) explores how open-access materials are communicated and how they connect to the world outside narrow data science silos. The authors find that even when code and data are published online, the language used in replication materials may be too complex for readers to understand. They identify barriers to the accessibility of research workflows and discuss how to make those workflows more accessible to a general audience.
     
    Thanks to Aneta Piekut, Rohan Alexander, Colin Rundel, Micaela Parker, and Nicholas J. Horton for their work as guest editors.

    ------------------------------
    Nicholas Horton
    Beitzel Professor of Technology and Society (Statistics and Data Science)
    Northampton, MA United States
    ------------------------------


  • 2.  RE: "Teaching reproducibility and responsible workflow" papers now available in the November, 2022 issue of JSDSE

    Posted 11-19-2022 09:18
    Sorry for any complications accessing the papers. The closing parentheses should not be included in the links that were automatically generated by the ASA Community.

    Here's a clean link to the issue: https://www.tandfonline.com/toc/ujse21/current
