Introducting the Software Engineering Working Group and {mmrm}

Ben Arancibia and Yoni Sidi


  • Introduction and Overview of SWE WG

  • Mixed Models for Repeated Measures - Why is it a Problem?

  • MMRM Package

    • Why this is not “yet another package”

    • Long Term Perspective

    • Comparing MMRM R Package to SAS - Demo

  • Closing and Next Steps

Who we are

  • Ben Arancibia: Director of Data Science at GSK sitting within Statistical Data Sciences Innovation Hub
  • Yoni Sidi: Director of Modeling and Simulation at Sage Therapeutics

Software Engineering in Biostatistics

  • Open-source software has gained increasing popularity in Biostatistics over the last two decades
    • Pros: rapid uptake of novel statistical methods and unprecedented opportunities for collaboration and innovation
    • Cons: users face huge variability in software quality, in particular reliability, efficiency and maintainability
  • Developing high-quality software with good coding practices, reproducible outputs, and self-sufficient documentation is critical to inform clinical and regulatory decisions

How to deal with these issues?

  • To deal with the issues of statistical quality assurances for R packages and creating high quality statistical software a group came together to create through the “Software Engineering Working Group” (SWE WG) to look R packages from a statistical point of view

Software Engineering working group (SWE WG)

  • An official working group of the ASA Biopharmaceutical Section
  • Formed in August 2022
  • Cross-industry collaboration with more than 30 members from over 20 organizations
  • Home page at


  • Primary Goal: Collaborate to engineer R packages that implement important statistical methods to fill in critical gaps

  • Secondary Goal: Develop and disseminate best practices for engineering high-quality open-source statistical software

SWE WG Activities

  • First R package mmrm was published on CRAN in October 2022 and updated in December
    • We aim to estabilish this package as a new standard for fitting mixed models for repeated measures (MMRM)
    • We have been developing and adopting best practices for software in the mmrm package, and open sourced it at
    • Currently under active development to add more features

Why do we need a package for MMRM?

  • Mixed Models for Repeated Measures (MMRM) is a popular choice for analyzing longitudinal continuous outcomes in randomized clinical trials
  • No great R Package - initially thought that the MMRM problem was solved by using a combination of lme4 and lmerTest
  • Learned that this approach failed on large data sets (slow, did not converge)
  • nlme does not give Satterthwaite adjusted degrees of freedom, has convergence issues, and with emmeans it is only approximate
  • tried to extend glmmTMB to calculate Satterthwaite adjusted degrees of freedom

Before creating a new package

  • First try to improve existing package
    • Here we tried to extend glmmTMB to calculate Satterthwaite adjusted degrees of freedom
    • But it did not work
  • Think about long term maintenance and responsibility

Idea with some Details

  • Because glmmTMB is always using a random effects representation, we cannot have a real unstructured model (uses \(\sigma = \varepsilon > 0\) trick)
  • We only want to fit a fixed effects model with a structured covariance matrix for each subject
  • The idea is then to use the Template Model Builder (TMB) directly - as it is also underlying glmmTMB - but code the exact model we want
  • We do this by implementing the log-likelihood in C++ using the TMB provided libraries

Advantages of TMB

  • Fast C++ framework for defining objective functions (Rcpp would have been alternative interface)
  • Automatic differentiation of the log-likelihood as a function of the variance parameters
  • We get the gradient and Hessian exactly and without additional coding
  • This can be used from the R side with the TMB interface and plugged into optimizers

Why it’s not just another package

  • Ongoing maintenance and support from the pharmaceutical industry that is supported by American Statistical Association
  • Package is part of the mission, but to emphasize our goal is to push out information on practices for engineering high-quality open-source statistical software

Comparing SAS and R

To run an MMRM model in SAS it is recommended to use either the PROC MIXED or PROC GLM procedures.

  • Less model assumptions are applied in PROC MIXED, primarily how one treats missingness.

  • We will compares the PROC MIXED procedure to the {mmrm} package in the following attributes:

  • Documentation
  • Unit Testing
  • Estimation Methods
  • Covariance structures
  • Degrees of Freedom
  • Contrasts


Both languages have online documentation of the technical details of the estimation and degrees of freedom methods and the different covariance structures available.



Unit Testing

One major advantage of the {mmrm} over PROC MIXED is that the unit testing in {mmrm} is transparent. It uses the {testthat} framework with {covr} to communicate the testing coverage. Unit tests can be found in the GitHub repository under ./tests.


The integration tests in {mmrm} are set to a tolerance of 10e^-3 when compared to SAS outputs.

Estimation Methods

Method {mmrm} PROC MIXED

Covariance structures

  • SAS has 23 non-spatial covariance structures, while mmrm has 10.
    • 9 structures intersect with SAS
    • Ante-dependence (homogeneous) is only in the mmrm package.
  • SAS has 14 spatial covariance structures compared to the spatial exponential one available in mmrm.


For users that need more structure {mmrm} is easily extensible via feature requests in the GitHub repository.

Covariance structures details

Covariance structures {mmrm} PROC MIXED
Unstructured (Unweighted/Weighted) X/X X/X
Toeplitz (hetero/homo) X/X X/X
Compound symmetry (hetero/homo) X/X X/X
Auto-regressive (hetero/homo) X/X X/X
Ante-dependence (hetero/homo) X/X X
Spatial exponential X X

Degrees of Freedom Methods

Method {mmrm} PROC MIXED
Contain X* X
Between/Within X* X
Residual X* X
Satterthwaite X X
Kenward-Roger X X
Kenward-Roger (Linear)** X X

*Available through the emmeans package.

**This is not equivalent to the KR2 setting in PROC MIXED


Contrasts and LSMeans estimates are available in mmrm using

  • mmrm::df_1d, mmrm::df_md
  • S3 method that is compatible with emmeans
  • LS means difference can be produced through emmeans (pairs method)
  • Degrees of freedom method is passed from mmrm to emmeans
  • By default PROC MIXED and mmrm do not adjust for multiplicity, whereas emmeans does.


These are comparable to the LSMEANS statement in PROC MIXED.

SWE WG Long Term Perspective

  • Software engineering is a critical competence in producing high-quality statistical software
  • A lot of work needs to be done regarding the establishment, dissemination and adoption of best practices for engineering open-source software
  • Improving the way software engineering is done will help improve the efficiency, reliability and innovation within Biostatistics

What’s next with {mmrm} & SWEWG

  • Prepare public training materials to disseminate best practice for software engineering in the Biostatistics community
    • At the beginning of February, a face-to-face workshop will take place in Basel, Switzerland with a focus on open-source software for clinical trials
    • Organize conference sessions with a focus on statistical software engineering at CEN, JSM and ASA/FDA Workshop
    • Video series on best practices for software engineering (link)

New packages SWE WG is Working On

  • sasr
  • HTA
  • Bayesian MMRM


Have an interest in working on these topics? Come work with us, information on the SWE WG can be found here: ASA BIOP SWE WG

Thank you! Questions?