R Package Development

Standards for reusable R packages

Paper Repos vs Package Repos

In our lab, we maintain a clear separation between:

Paper repositories - Analysis code for specific manuscripts
Package repositories - Reusable R packages with exported functions

When to Create a Package

Create a separate R package when:

Functions will be reused across multiple projects
Code implements novel methodology
Other researchers may want to use your methods
Functions need formal documentation and testing

When to Keep Code in Paper Repo

Keep code in the paper repository when:

Analysis is project-specific
Code won’t be reused elsewhere
You’re doing exploratory analysis

Package Structure

Following the BATON package as a reference:

MyPackage/
├── DESCRIPTION          # Package metadata and dependencies
├── NAMESPACE            # Exports (auto-generated by roxygen2)
├── LICENSE              # License file
├── R/                   # R source files
│   ├── core_functions.R
│   ├── helpers.R
│   └── zzz.R            # Package hooks
├── man/                 # Documentation (auto-generated)
├── tests/
│   └── testthat/
│       ├── testthat.R
│       └── test-functions.R
├── vignettes/           # Long-form documentation
├── inst/                # Additional files
├── README.md
├── CLAUDE.md            # AI assistant guidance
├── .Rbuildignore
└── .gitignore

DESCRIPTION File

Package: MyPackage
Type: Package
Title: Short Title for the Package
Version: 0.1.0
Authors@R:
    person(given = "First", family = "Last",
           role = c("aut", "cre"),
           email = "email@unc.edu",
           comment = c(affiliation = "UNC Chapel Hill"))
Description: One paragraph describing what the package does.
License: MIT + file LICENSE
Depends: R (>= 4.2)
Imports:
    data.table,
    stats,
    utils
Suggests:
    testthat,
    knitr,
    rmarkdown
VignetteBuilder: knitr
URL: https://github.com/rashidlab/MyPackage
BugReports: https://github.com/rashidlab/MyPackage/issues
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.3.0

Function Documentation

Use roxygen2 comments above every exported function:

#' Calculate constrained optimization result
#'
#' @description
#' Performs constrained Bayesian optimization over the specified
#' parameter bounds.
#'
#' @param sim_fun Simulation function that takes parameter vector
#' @param bounds Named list of parameter bounds (min, max)
#' @param constraints List of constraint specifications
#' @param n_iter Number of optimization iterations
#' @param seed Random seed for reproducibility
#'
#' @return A list with components:
#' \describe{
#'   \item{best_params}{Optimal parameter values}
#'   \item{best_value}{Objective value at optimum}
#'   \item{history}{Data frame of all evaluations}
#' }
#'
#' @examples
#' \dontrun{
#' result <- bo_calibrate(
#'   sim_fun = my_simulator,
#'   bounds = list(theta = c(0, 1)),
#'   constraints = list(power = 0.80)
#' )
#' }
#'
#' @export
bo_calibrate <- function(sim_fun, bounds, constraints,
                         n_iter = 100, seed = NULL) {
  # Implementation
}

Testing

Create tests in tests/testthat/:

# tests/testthat/test-core.R

test_that("bo_calibrate returns expected structure", {
  result <- bo_calibrate(
    sim_fun = simple_sim,
    bounds = list(x = c(0, 1)),
    n_iter = 5
  )

  expect_type(result, "list")
  expect_named(result, c("best_params", "best_value", "history"))
  expect_s3_class(result$history, "data.frame")
})

test_that("bo_calibrate respects constraints", {
  result <- bo_calibrate(
    sim_fun = constrained_sim,
    bounds = list(x = c(0, 1)),
    constraints = list(g1 = 0),
    n_iter = 10
  )

  expect_true(result$best_value >= 0)
})

Run tests:

devtools::test()

# Or specific test file
testthat::test_file("tests/testthat/test-core.R")

Development Workflow

# Initial setup
usethis::create_package("MyPackage")
usethis::use_testthat()
usethis::use_mit_license()

# During development
devtools::load_all()      # Load package for testing
devtools::document()      # Generate documentation
devtools::test()          # Run tests
devtools::check()         # Full R CMD check

# Building
devtools::build()         # Build tarball
devtools::install()       # Install locally

Linking Paper to Package

In your paper repository, install the package from GitHub:

# In paper repo's setup
devtools::install_github("naimurashid/BATON")

# Or from local source during development
devtools::install("~/Downloads/BATON")

Document the package version in your paper repo’s README:

## Dependencies

This analysis requires:
- BATON >= 0.3.0 (`devtools::install_github("naimurashid/BATON@v0.3.0")`)

README Structure for Packages

A good package README should help users install and use your package quickly:

# PackageName

<!-- badges: start -->
[![R-CMD-check](https://github.com/rashidlab/PackageName/workflows/R-CMD-check/badge.svg)](https://github.com/rashidlab/PackageName/actions)
<!-- badges: end -->

Brief one-line description of what the package does.

## Installation

```r
# From GitHub
devtools::install_github("rashidlab/PackageName")

# From local source (during development)
devtools::install("path/to/PackageName")
```

## Quick Start

```r
library(PackageName)

# Basic usage example
result <- main_function(data, param1 = value)
```

## Main Functions

| Function | Purpose |
|----------|---------|
| `main_function()` | Primary analysis function |
| `helper_function()` | Supporting utility |
| `plot_results()` | Visualization |

## Example

```r
# Load example data
data <- example_dataset

# Run analysis
result <- main_function(data)

# Visualize
plot_results(result)
```

## Documentation

See the package vignettes for detailed tutorials:

```r
vignette("getting-started", package = "PackageName")
```

## Citation

If you use this package, please cite:

```
Author et al. (2024). "Paper Title." Journal Name.
```

## License

MIT

Version Control

Git Tags for Releases

# Tag a release
git tag -a v0.3.0 -m "Release version 0.3.0"
git push origin v0.3.0

CRAN Submission Checklist

Before CRAN submission:

devtools::check() passes with 0 errors, 0 warnings, 0 notes
All examples run without error
Vignettes build successfully
R CMD check --as-cran passes
Test coverage is adequate

Best Practices for R Package Development

Code Organization

R/
├── core.R            # Main exported functions
├── internal.R        # Internal helper functions (not exported)
├── utils.R           # Small utility functions
├── validators.R      # Input validation functions
└── zzz.R             # .onLoad, .onAttach hooks

Guidelines:

One main function per file for large functions
Group related small functions together
Prefix internal functions with . or don’t export them
Keep files under ~500 lines

Naming Conventions

# Exported functions: verb_noun
calculate_power <- function(...) { }
fit_model <- function(...) { }
plot_results <- function(...) { }

# Internal functions: .prefix or no export
.validate_inputs <- function(...) { }
check_bounds <- function(...) { }  # Not in NAMESPACE

# S3 methods follow conventions
print.my_class <- function(x, ...) { }
summary.my_class <- function(object, ...) { }

Error Handling

#' Fit model with proper error handling
fit_model <- function(data, formula, ...) {
  # Validate inputs early
 if (!is.data.frame(data)) {
    stop("'data' must be a data frame", call. = FALSE)
  }

  if (nrow(data) == 0) {
    stop("'data' cannot be empty", call. = FALSE)
  }

  # Warnings for non-fatal issues
  if (any(is.na(data$outcome))) {
    warning("NA values in outcome will be excluded", call. = FALSE)
  }

  # Use tryCatch for operations that might fail
  result <- tryCatch(
    {
      # Model fitting code
    },
    error = function(e) {
      stop(sprintf("Model fitting failed: %s", e$message), call. = FALSE)
    }
  )

  result
}

Dependencies

DESCRIPTION best practices:

Imports:
    data.table,    # Essential functionality
    stats,         # Base R packages are fine
    utils
Suggests:
    testthat,      # Testing only
    knitr,         # Vignettes only
    ggplot2        # Optional visualization

Minimize dependencies
Use Imports: for required packages
Use Suggests: for optional/testing packages
Avoid depending on packages that depend on many others
Consider using base R where possible

Documentation Standards

Every exported function needs:

Title - One line description
Description - Detailed explanation
Parameters - All arguments documented
Return value - What the function returns
Examples - Runnable code (use \dontrun{} for slow examples)

#' Calculate constrained optimization result
#'
#' Performs constrained Bayesian optimization over the specified
#' parameter bounds to find designs satisfying power and type I
#' error constraints.
#'
#' @param sim_fun Function that simulates trial outcomes. Must accept
#'   a named numeric vector of parameters and return a list with
#'   components `power` and `type1`.
#' @param bounds Named list where each element is a length-2 numeric
#'   vector giving (min, max) bounds for that parameter.
#' @param constraints Named list of constraint values. Currently
#'   supports `power` (minimum) and `type1` (maximum).
#' @param n_iter Integer. Number of optimization iterations.
#'   Default is 100.
#' @param seed Integer or NULL. Random seed for reproducibility.
#'
#' @return A list with class "bo_result" containing:
#' \describe{
#'   \item{best_params}{Named numeric vector of optimal parameters}
#'   \item{best_value}{Numeric. Objective value at optimum}
#'   \item{constraint_values}{Named numeric. Constraint values at optimum}
#'   \item{history}{Data frame of all evaluations}
#' }
#'
#' @examples
#' # Define a simple simulator
#' my_sim <- function(params) {
#'   list(power = 0.8 + params["x"] * 0.1,
#'        type1 = 0.05 - params["x"] * 0.02)
#' }
#'
#' # Run optimization
#' result <- bo_calibrate(
#'   sim_fun = my_sim,
#'   bounds = list(x = c(0, 1)),
#'   constraints = list(power = 0.80, type1 = 0.05),
#'   n_iter = 10
#' )
#'
#' @seealso [extract_best()] for extracting optimal parameters
#' @export

Testing Best Practices

# tests/testthat/test-core.R

# Group related tests
test_that("bo_calibrate validates inputs", {
  expect_error(bo_calibrate(NULL), "'sim_fun' must be a function")
  expect_error(bo_calibrate(identity, bounds = "x"),
               "'bounds' must be a list")
})

test_that("bo_calibrate returns expected structure", {
  result <- bo_calibrate(simple_sim, bounds = list(x = c(0, 1)))

  expect_s3_class(result, "bo_result")
  expect_named(result, c("best_params", "best_value", "history"))
})

test_that("bo_calibrate respects constraints", {
  result <- bo_calibrate(
    sim_fun = constrained_sim,
    bounds = list(x = c(0, 1)),
    constraints = list(power = 0.80),
    n_iter = 20
  )

  expect_gte(result$constraint_values["power"], 0.80)
})

# Test edge cases
test_that("bo_calibrate handles edge cases", {
  # Empty result
  expect_error(bo_calibrate(failing_sim, bounds = list(x = c(0, 1))))

  # Single parameter
  result <- bo_calibrate(simple_sim, bounds = list(x = c(0.5, 0.5)))
  expect_equal(result$best_params["x"], 0.5)
})

Performance Considerations

# Vectorize when possible
# BAD:
compute_values <- function(x) {
  result <- numeric(length(x))
  for (i in seq_along(x)) {
    result[i] <- expensive_operation(x[i])
  }
  result
}

# GOOD:
compute_values <- function(x) {
  expensive_operation_vectorized(x)
}

# Use data.table for large data operations
# GOOD:
process_large_data <- function(dt) {
  dt[, .(mean_val = mean(value)), by = group]
}

# Pre-allocate in loops
# GOOD:
results <- vector("list", n_iterations)
for (i in seq_len(n_iterations)) {
  results[[i]] <- compute(i)
}

For computationally intensive code, consider Rcpp integration below.

Rcpp Integration

When R code is too slow even after vectorization, Rcpp lets you write performance-critical sections in C++.

When to Use Rcpp

Scenario	Use Rcpp?
Loops that can’t be vectorized	Yes
Recursive algorithms	Yes
Operations on millions of elements	Consider
Code already fast enough	No
One-time analysis script	Probably no

Rule of thumb: Profile first. Only optimize bottlenecks that matter.

Setup

# Install Rcpp
install.packages("Rcpp")

# Add to your package
usethis::use_rcpp()

This creates: - src/ directory for C++ files - Updates DESCRIPTION with LinkingTo: Rcpp - Adds @useDynLib and @importFrom Rcpp sourceCpp to package documentation

Package Structure with Rcpp

MyPackage/
├── DESCRIPTION
├── NAMESPACE
├── R/
│   ├── RcppExports.R    # Auto-generated
│   └── wrapper.R        # R wrappers for C++ functions
├── src/
│   ├── RcppExports.cpp  # Auto-generated
│   ├── fast_functions.cpp
│   └── Makevars         # Compiler flags (optional)
└── ...

Basic Rcpp Example

src/fast_functions.cpp:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector cumsum_cpp(NumericVector x) {


  int n = x.size();
  NumericVector result(n);

  result[0] = x[0];
  for (int i = 1; i < n; i++) {
    result[i] = result[i-1] + x[i];
  }

  return result;
}

// [[Rcpp::export]]
double weighted_mean_cpp(NumericVector x, NumericVector w) {

  int n = x.size();
  double sum_xw = 0.0;
  double sum_w = 0.0;

  for (int i = 0; i < n; i++) {
    sum_xw += x[i] * w[i];
    sum_w += w[i];
  }

  return sum_xw / sum_w;
}

After running devtools::document(), these functions become available in R:

library(MyPackage)
cumsum_cpp(1:10)
weighted_mean_cpp(c(1, 2, 3), c(0.5, 0.3, 0.2))

Simulation-Specific Example

For adaptive trial simulations with many iterations:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
List simulate_trial_cpp(int n_patients,
                        double true_rate,
                        double threshold,
                        int n_sims) {

  NumericVector successes(n_sims);
  NumericVector stopped_early(n_sims);

  for (int sim = 0; sim < n_sims; sim++) {
    int n_success = 0;
    bool stopped = false;

    for (int i = 0; i < n_patients; i++) {
      // Simulate patient outcome
      if (R::runif(0, 1) < true_rate) {
        n_success++;
      }

      // Interim analysis (example: futility check)
      if (i == n_patients / 2) {
        double current_rate = (double)n_success / (i + 1);
        if (current_rate < threshold * 0.5) {
          stopped = true;
          break;
        }
      }
    }

    successes[sim] = n_success;
    stopped_early[sim] = stopped ? 1.0 : 0.0;
  }

  return List::create(
    Named("successes") = successes,
    Named("stopped_early") = stopped_early
  );
}

R Wrapper Functions

Create user-friendly R wrappers with input validation:

R/simulate.R:

#' Simulate adaptive trial
#'
#' @param n_patients Total patients to enroll
#' @param true_rate True response rate
#' @param threshold Efficacy threshold
#' @param n_sims Number of simulations
#' @return List with simulation results
#' @export
simulate_trial <- function(n_patients, true_rate, threshold, n_sims = 10000) {
  # Validate inputs in R (easier error messages)
  stopifnot(
    is.numeric(n_patients), n_patients > 0,
    is.numeric(true_rate), true_rate >= 0, true_rate <= 1,
    is.numeric(threshold), threshold >= 0, threshold <= 1,
    is.numeric(n_sims), n_sims > 0
  )


  # Call C++ implementation
  result <- simulate_trial_cpp(
    as.integer(n_patients),
    as.double(true_rate),
    as.double(threshold),
    as.integer(n_sims)
  )

  # Post-process in R
  result$power <- mean(result$successes >= threshold * n_patients)
  result$early_stop_rate <- mean(result$stopped_early)

  class(result) <- "trial_sim"
  result
}

RcppArmadillo for Linear Algebra

For matrix operations, use RcppArmadillo:

usethis::use_rcpp_armadillo()

// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
using namespace Rcpp;
using namespace arma;

// [[Rcpp::export]]
arma::mat fast_crossprod(arma::mat X) {
  return X.t() * X;
}

// [[Rcpp::export]]
arma::vec solve_system(arma::mat A, arma::vec b) {
  return arma::solve(A, b);
}

DESCRIPTION for Rcpp Packages

Package: MyPackage
Type: Package
Title: Fast Simulation Tools
Version: 0.1.0
Authors@R: person("First", "Last", email = "email@unc.edu", role = c("aut", "cre"))
Description: Efficient simulation tools using Rcpp.
License: MIT + file LICENSE
Imports:
    Rcpp (>= 1.0.0)
LinkingTo:
    Rcpp,
    RcppArmadillo
SystemRequirements: C++11
Encoding: UTF-8
RoxygenNote: 7.3.0

Compilation on Longleaf

On Longleaf, load the appropriate compiler:

module load gcc/11.2.0
module load r/4.3.0

# Then in R
devtools::install()

Debugging Rcpp Code

# Compile with debug symbols
Sys.setenv("PKG_CXXFLAGS" = "-g -O0")
devtools::clean_dll()
devtools::load_all()

# Print debugging (appears in R console)
# In C++: Rcpp::Rcout << "value: " << x << std::endl;

Common Pitfalls

Issue	Solution
Segfault	Check array bounds; R uses 1-indexing, C++ uses 0
Wrong results	Ensure integer vs double types match
Slow compilation	Split large files; use `// [[Rcpp::export]]` sparingly
Memory leaks	Rcpp handles most memory; avoid raw `new`/`delete`
RNG issues	Use `R::runif()`, `R::rnorm()` for reproducibility with `set.seed()`

Learning Resources

Rcpp for Everyone - Comprehensive guide
Rcpp Gallery - Code examples
Advanced R - Rcpp Chapter - Hadley Wickham’s introduction

Reference Package

See BATON for a complete example of:

Package structure and DESCRIPTION
Roxygen2 documentation style
Test organization
Vignette setup
GitHub Actions for CI (optional)