First year of the Software Engineering working group

Regulatory-Industry Statistics Workshop 2023

Daniel Sabanés Bové (Roche) and Ya Wang (Gilead) on behalf of the working group

2023-09-29

Introducing the WG

Software Engineering Working Group

Founded last year:

  • When: 19 August 2022 - just celebrated our first birthday!
  • Where: American Statistical Association (ASA) Biopharmaceutical Section (BIOP)
  • Who: 11 statisticians from 7 pharma companies developing statistical software
  • New name: openstatsware

Why a new WG?

  • Started with a specific R package project (more below)
  • Makes sense to stay together as a group for other package projects as well
  • New focus on good engineering practices and collaborative work
  • Importance of reliable software for statistical analysis cannot be overestimated
  • Be rooted in the biostatistics community (rather than statistical programming)

But there are other WGs?

  • Comparing Analysis Method Implementations in Software (CAMIS) of PhUSE 🌎
  • Application and Implementation of Methodologies in Statistics (AIMS) Special Interest Group (SIG) of PSI 🌎
  • R Submission Working Group of R Consortium 🌎
  • R Tables for Regulatory Submissions Working Group of R Consortium 🌎
  • R Validation Hub 🌎
  • R Repositories WG of R Consortium 🌎

WG Objectives

  • Primary:
    • Engineer R packages that implement important statistical methods
    • … to fill in gaps in the open-source statistical software landscape
    • … focusing on what is needed for biopharmaceutical applications
  • Secondary:
    • Develop and disseminate best practices for engineering high-quality open-source statistical software
  • By actively doing the statistical engineering work together, we align on best practices and can communicate these to others

Relation to the validation topic

  • Validation is on everyone’s mind when talking about open source in pharma:
    • “Establishing documented evidence which provides a high degree of assurance (accuracy) that a specific process consistently (reproducibility) produces a product meeting its predetermined specifications (traceability) and quality attributes.”
  • The R Validation Hub is a collaboration to support the adoption of R within a biopharmaceutical regulatory setting
    • Develops tools that support risk assessment of R packages

Relation to the validation topic (cont’d)

  • Base R and recommended packages are developed by the R Foundation following best practices.
  • Our working group is spreading the word about best development practices so that contributed packages can achieve the same level of quality, which aids their subsequent validation.
  • We also develop specific R packages following these best practices, so that they can be validated more easily.

Members and meetings

  • Currently 42 members
    • new members are welcome! (incl. academia/regulators/etc.)
  • Currently 36 organizations
    • Alpha Stats Ltd., argenx, AstraZeneca, BASF, Bayer, Berry Consultants, BMS, Boehringer Ingelheim, CIMS Global, Daiichi Sankyo, Eli Lilly, Finc Research, Genentech, Gilead Sciences, GSK, ICON, Independent, Jumping Rivers, MaxisIT, Merck, Merck KGaA, Mirum, MSD, Novartis, Novo Nordisk, Pfizer, Priothera, RCONIS, Red Door Analytics, Regeneron, Roche, RPACT, Sanofi, Stealth Bio, The George Institute for Global Health, UCB
  • Meet every 2 weeks

Workstreams

  • Mixed Models for Repeated Measures (MMRM) 🌎
    • Develop mmrm (see below) to use frequentist inference in MMRM
  • Bayesian MMRM 🌎
    • Develop brms.mmrm (see below)
  • Health Technology Assessment 🌎
    • Develop open-source R tools to support HTA dossier submission across various countries, particularly the topics with unmet needs in R implementation and/or related to EUnetHTA
  • Note: Also “just” contributing to workstreams is great!

Achievements in the first year

New R packages released to CRAN

  • mmrm
    • R package for frequentist inference in MMRM, based on TMB (which provides automatic differentiation in C++ and an R frontend)
    • See documentation 🌎
    • Easiest to install from CRAN 🌎
  • brms.mmrm
    • R package for Bayesian inference in MMRM, based on brms (a Stan frontend for HMC sampling)
    • See documentation 🌎
    • Easiest to install from CRAN 🌎
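
As a minimal sketch of what using mmrm looks like, the following fits an MMRM with an unstructured covariance matrix to the fev_data example data set that ships with the package. The formula and the choice of covariance structure are illustrative assumptions, not something the slides prescribe.

```r
# install.packages("mmrm")  # one-time installation from CRAN
library(mmrm)

# fev_data is an example data set included in the mmrm package.
# us(AVISIT | USUBJID) requests an unstructured covariance matrix
# across visits within each subject.
fit <- mmrm(
  formula = FEV1 ~ RACE + SEX + ARMCD * AVISIT + us(AVISIT | USUBJID),
  data = fev_data
)

# Print the model summary, including the fixed-effect coefficient table.
summary(fit)
```

Other covariance structures can be requested by changing the covariance term in the formula; the package documentation lists the supported options.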

Why was the MMRM topic important?

  • MMRM is a popular analysis method for longitudinal continuous outcomes in randomized clinical trials
  • No tailored R package with sufficient capabilities/reliability
  • Also used as backbone for more recent methods such as multiple imputation
  • We have run comparison analyses with other R packages (nlme, glmmTMB and lme4) as well as SAS PROC GLIMMIX, and found that mmrm is the fastest of them and gives the results closest to PROC GLIMMIX

Best practices

  • Includes version control, git workflow, code review, unit/integration testing, continuous integration/delivery (CI/CD), reproducibility, change log, documentation, package design, user experience, maintainability, publication, etc.
  • Workshop “Good Software Engineering Practice for R Packages” on world tour
    • Basel, Shanghai, San José, Rockville (last Tuesday), Montreal 🌎
  • Start of video series “Statistical Software Engineering 101” 🌎
    • currently 2 videos, hopefully we can still produce more content

Conference contributions and Publications

  • Dedicated sessions with discussions at ISCB, CEN, ASA/FDA workshop (now)
  • Presentations at PSI, JSM, Pharma RUG, BBS, etc. 🌎
  • BIOP Report 🌎
  • Blog 🌎

Ingredients for successful and sustainable collaboration

Human factors

  • Mutual interest and mutual trust
  • Prerequisite is getting to know each other
    • Although mostly just online, biweekly calls help a lot with this
  • Reciprocity mindset
    • “Reciprocity means that in response to friendly actions, people are frequently much nicer and much more cooperative than predicted by the self-interest model”
    • Personal experience: If you first give away something, more will come back to you.

Development process

  • Important to go public as soon as possible
    • don’t wait for the product to be finished
    • you never know who else might be interested/could help
  • Version control with git
    • cornerstone of effective collaboration
  • Building software together works better than alone
    • Different perspectives in discussions and code review help to optimize the user interface and thus experience

Coding standards

  • Consistent and readable code style simplifies joint work
  • Written (!) contribution guidelines help
  • Lowering the entry hurdle using developer calls is important

Robust test suite

  • Unit and integration tests are essential for preventing regression and assuring quality
  • Especially critical with compiled code, to verify that the package works correctly
  • Use continuous integration during development to make sure nothing breaks along the way
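
To make the idea of a unit test concrete, here is a small sketch using the testthat package. Both the choice of testthat and the helper function visit_means() are assumptions made for illustration; the slides do not name a testing framework or this function.

```r
library(testthat)

# Hypothetical helper (not part of any workstream package):
# mean response per visit, ignoring missing values.
visit_means <- function(response, visit) {
  tapply(response, visit, mean, na.rm = TRUE)
}

test_that("visit_means averages within each visit", {
  result <- visit_means(c(1, 3, 10, NA), c("V1", "V1", "V2", "V2"))
  expect_equal(unname(result["V1"]), 2)
  expect_equal(unname(result["V2"]), 10)
})

test_that("visit_means returns one value per visit", {
  result <- visit_means(c(1, 2, 3), c("V1", "V2", "V3"))
  expect_length(result, 3)
})
```

Tests like these can be run on every commit by a continuous integration service, so that a change which breaks existing behavior is caught before it is merged.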

Documentation

  • Lots of work but extremely important
    • start with writing up the methods details
    • think about the code structure first in a “design doc”
    • only then put the code in the package
  • Needs to be kept up-to-date
  • Need to have examples & vignettes
    • Testing alone is not sufficient
    • Builds trust with users
    • Reference for developers over time

Long term vision

Vision: Statisticians have software engineering skills

  • These skills are taught at university
  • Statisticians can use basic practices in their daily work …
  • … to ensure reproducibility of statistical analyses and research results

Vision: Innovation does not stop with publication

  • Methods research does not end when the first methods paper is published
  • Initial prototype code as paper supplement is not sufficient
  • Continue developing open-source, reliably tested software packages …
  • … to enable users to easily use the new methodology in their own applications

Vision: Industry develops common code base

  • Companies increasingly work in open source as much as possible
  • Rather than repeating similar developments internally …
  • … to become more cost-effective and transparent towards society

Next steps

R Packages

  • New workstream on covariate adjustment is starting up
  • Think more strategically about identifying gaps in the statistical software landscape
  • Help maintain the CRAN Task View on Clinical Trials

Branding and Collaboration

  • Publicize our new short name openstatsware
  • Proposed to also associate with EFSPI to emphasize the global nature of the group
  • Ensure a strong connection to the new pan-pharma methodology group

Communication and Outreach

  • Add more content to our video series
  • Start a chat channel for informal discussions within the larger community
  • Organize hackathons, e.g. to work together on workstream packages

Q&A