Introducing the Software Engineering working group

Who we are and what we build together

Daniel Sabanes Bove, Ya Wang


What is the SWE WG?

ASA BIOP Software Engineering working group (SWE WG)

  • An official working group of the ASA Biopharmaceutical Section
  • Formed in August 2022
  • Cross-industry collaboration with more than 30 members from over 20 organizations
  • Home page at
  • Open for new members!


  • Importance of reliable software for statistical analysis cannot be underestimated
  • In the past a lot of statistical analyses have been performed with proprietary software
  • Open-source software has gained increasing popularity in Biostatistics over the last two decades
    • Pros: rapid uptake of novel methods and great opportunities for collaboration and innovation
    • Cons: users face huge variability in software quality (reliability, efficiency and maintainability)

Motivation (con’t)

  • Current repositories (GitHub, CRAN):
    • Do not require any statistical quality assurance
    • It’s harder to adopt packages without good documentation and maintenance
  • Developing high-quality software is critical to inform clinical and regulatory decisions:
    • Good coding practices
    • Reproducible outputs
    • Self-sufficient documentation


  • Primary Goal: Collaborate to engineer R packages that implement important statistical methods to fill in critical gaps

  • Secondary Goal: Develop and disseminate best practices for engineering high-quality open-source statistical software

SWE WG Activities

  • First R package mmrm was published on CRAN in October 2022 and updated in December
    • We aim to establish this package as a new standard for fitting mixed models for repeated measures (MMRM)
    • We have been developing and adopting best practices for software in the mmrm package, and open sourced it at
    • Currently under active development to add more features

SWE WG Activities (con’t)

  • Prepare public training materials to disseminate best practice for software engineering in the Biostatistics community
    • At the beginning of February, a face-to-face workshop will take place in Basel, Switzerland with a focus on open-source software for clinical trials
    • Organize conference sessions with a focus on statistical software engineering at CEN, JSM and ASA/FDA Workshop
    • Youtube series with a focus on best practices for software engineering

Best Practices for Software Engineering

  • User interface design
  • Version control with git
  • Unit and integration tests
  • Consistent and readable code style
  • Documentation
  • Continuous integration setup
  • Publication on website and repositories
  • etc.

mmrm package example


  • Mixed Models for Repeated Measures (MMRM) is a popular choice for analyzing longitudinal continuous outcomes in randomized clinical trials
  • No great R Package - initially thought that it was solved by using lme4 with lmerTest
  • But this approach failed on large data (slow, did not converge)
  • nlme does not give Satterthwaite adjusted degrees of freedom, has convergence issues, and with emmeans it is only approximate

Before creating a new package

  • First try to improve existing package
    • Here we tried to extend glmmTMB to calculate Satterthwaite adjusted degrees of freedom
    • But it did not work
  • Think about long term maintenance and responsibility

mmrm Package overview

  • Linear model for dependent observations within independent subjects
  • Multiple covariance structures available for the dependent observations
  • REML or ML estimation, using multiple optimizers if needed
  • emmeans interface for least square means, tidymodels for easy model fitting
  • Satterthwaite and Kenward-Roger adjustments

Why it’s not just another package

  • Ongoing maintenance and support from the pharmaceutical industry
    • 5 companies being involved in the development, on track to become standard package
  • Development using best practices as show case for high quality package
    • Thorough unit and integration tests (also comparing with SAS results) to ensure accurate results

Covariance structures

Covariance structures {mmrm} PROC MIXED
Unstructured (Unweighted/Weighted) X/X X/X
Toeplitz (hetero/homo) X/X X/X
Compound symmetry (hetero/homo) X/X X/X
Auto-regressive (hetero/homo) X/X X/X
Ante-dependence (hetero/homo) X/X X
Spatial exponential X X

Degrees of Freedom Methods

Method {mmrm} PROC MIXED
Contain X* X
Between/Within X* X
Residual X* X
Satterthwaite X X
Kenward-Roger X X
Kenward-Roger (Linear)** X X

*Available through the emmeans package.

**This is not equivalent to the KR2 setting in PROC MIXED


Contrasts and least square means estimates are available in mmrm using:

  • df_1d, df_md
  • S3 method that is compatible with emmeans
  • LS means difference can be produced through emmeans (pairs method)
  • Degrees of freedom method is passed from mmrm to emmeans
  • By default PROC MIXED and mmrm do not adjust for multiplicity, whereas emmeans does.


Closing and Next Steps

Long term perspective

  • Software engineering is a critical competence in producing high-quality statistical software
  • A lot of work needs to be done regarding the establishment, dissemination and adoption of best practices for engineering open-source software
  • Improving the way software engineering is done will help improve the efficiency, reliability and innovation within Biostatistics

New packages coming up

  • sasr
  • HTA packages
  • Bayesian MMRM

Thank you! Questions?