R Package Development
Standards for reusable R packages
Paper Repos vs Package Repos
In our lab, we maintain a clear separation between:
- Paper repositories - Analysis code for specific manuscripts
- Package repositories - Reusable R packages with exported functions
When to Create a Package
Create a separate R package when:
- Functions will be reused across multiple projects
- Code implements novel methodology
- Other researchers may want to use your methods
- Functions need formal documentation and testing
When to Keep Code in Paper Repo
Keep code in the paper repository when:
- Analysis is project-specific
- Code won’t be reused elsewhere
- You’re doing exploratory analysis
Package Structure
Following the BATON package as a reference:
MyPackage/
├── DESCRIPTION # Package metadata and dependencies
├── NAMESPACE # Exports (auto-generated by roxygen2)
├── LICENSE # License file
├── R/ # R source files
│ ├── core_functions.R
│ ├── helpers.R
│ └── zzz.R # Package hooks
├── man/ # Documentation (auto-generated)
├── tests/
│ └── testthat/
│ ├── testthat.R
│ └── test-functions.R
├── vignettes/ # Long-form documentation
├── inst/ # Additional files
├── README.md
├── CLAUDE.md # AI assistant guidance
├── .Rbuildignore
└── .gitignore
DESCRIPTION File
Package: MyPackage
Type: Package
Title: Short Title for the Package
Version: 0.1.0
Authors@R:
person(given = "First", family = "Last",
role = c("aut", "cre"),
email = "email@unc.edu",
comment = c(affiliation = "UNC Chapel Hill"))
Description: One paragraph describing what the package does.
License: MIT + file LICENSE
Depends: R (>= 4.2)
Imports:
data.table,
stats,
utils
Suggests:
testthat,
knitr,
rmarkdown
VignetteBuilder: knitr
URL: https://github.com/rashidlab/MyPackage
BugReports: https://github.com/rashidlab/MyPackage/issues
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.3.0
Function Documentation
Use roxygen2 comments above every exported function:
#' Calculate constrained optimization result
#'
#' @description
#' Performs constrained Bayesian optimization over the specified
#' parameter bounds.
#'
#' @param sim_fun Simulation function that takes parameter vector
#' @param bounds Named list of parameter bounds (min, max)
#' @param constraints List of constraint specifications
#' @param n_iter Number of optimization iterations
#' @param seed Random seed for reproducibility
#'
#' @return A list with components:
#' \describe{
#' \item{best_params}{Optimal parameter values}
#' \item{best_value}{Objective value at optimum}
#' \item{history}{Data frame of all evaluations}
#' }
#'
#' @examples
#' \dontrun{
#' result <- bo_calibrate(
#' sim_fun = my_simulator,
#' bounds = list(theta = c(0, 1)),
#' constraints = list(power = 0.80)
#' )
#' }
#'
#' @export
bo_calibrate <- function(sim_fun, bounds, constraints,
n_iter = 100, seed = NULL) {
# Implementation
}Testing
Create tests in tests/testthat/:
# tests/testthat/test-core.R
test_that("bo_calibrate returns expected structure", {
result <- bo_calibrate(
sim_fun = simple_sim,
bounds = list(x = c(0, 1)),
n_iter = 5
)
expect_type(result, "list")
expect_named(result, c("best_params", "best_value", "history"))
expect_s3_class(result$history, "data.frame")
})
test_that("bo_calibrate respects constraints", {
result <- bo_calibrate(
sim_fun = constrained_sim,
bounds = list(x = c(0, 1)),
constraints = list(g1 = 0),
n_iter = 10
)
expect_true(result$best_value >= 0)
})Run tests:
devtools::test()
# Or specific test file
testthat::test_file("tests/testthat/test-core.R")Development Workflow
# Initial setup
usethis::create_package("MyPackage")
usethis::use_testthat()
usethis::use_mit_license()
# During development
devtools::load_all() # Load package for testing
devtools::document() # Generate documentation
devtools::test() # Run tests
devtools::check() # Full R CMD check
# Building
devtools::build() # Build tarball
devtools::install() # Install locallyLinking Paper to Package
In your paper repository, install the package from GitHub:
# In paper repo's setup
devtools::install_github("naimurashid/BATON")
# Or from local source during development
devtools::install("~/Downloads/BATON")Document the package version in your paper repo’s README:
## Dependencies
This analysis requires:
- BATON >= 0.3.0 (`devtools::install_github("naimurashid/BATON@v0.3.0")`)README Structure for Packages
A good package README should help users install and use your package quickly:
# PackageName
<!-- badges: start -->
[](https://github.com/rashidlab/PackageName/actions)
<!-- badges: end -->
Brief one-line description of what the package does.
## Installation
```r
# From GitHub
devtools::install_github("rashidlab/PackageName")
# From local source (during development)
devtools::install("path/to/PackageName")
```
## Quick Start
```r
library(PackageName)
# Basic usage example
result <- main_function(data, param1 = value)
```
## Main Functions
| Function | Purpose |
|----------|---------|
| `main_function()` | Primary analysis function |
| `helper_function()` | Supporting utility |
| `plot_results()` | Visualization |
## Example
```r
# Load example data
data <- example_dataset
# Run analysis
result <- main_function(data)
# Visualize
plot_results(result)
```
## Documentation
See the package vignettes for detailed tutorials:
```r
vignette("getting-started", package = "PackageName")
```
## Citation
If you use this package, please cite:
```
Author et al. (2024). "Paper Title." Journal Name.
```
## License
MITVersion Control
CRAN Submission Checklist
Before CRAN submission:
devtools::check()passes with 0 errors, 0 warnings, 0 notes- All examples run without error
- Vignettes build successfully
R CMD check --as-cranpasses- Test coverage is adequate
Best Practices for R Package Development
Code Organization
R/
├── core.R # Main exported functions
├── internal.R # Internal helper functions (not exported)
├── utils.R # Small utility functions
├── validators.R # Input validation functions
└── zzz.R # .onLoad, .onAttach hooks
Guidelines:
- One main function per file for large functions
- Group related small functions together
- Prefix internal functions with
.or don’t export them - Keep files under ~500 lines
Naming Conventions
# Exported functions: verb_noun
calculate_power <- function(...) { }
fit_model <- function(...) { }
plot_results <- function(...) { }
# Internal functions: .prefix or no export
.validate_inputs <- function(...) { }
check_bounds <- function(...) { } # Not in NAMESPACE
# S3 methods follow conventions
print.my_class <- function(x, ...) { }
summary.my_class <- function(object, ...) { }Error Handling
#' Fit model with proper error handling
fit_model <- function(data, formula, ...) {
# Validate inputs early
if (!is.data.frame(data)) {
stop("'data' must be a data frame", call. = FALSE)
}
if (nrow(data) == 0) {
stop("'data' cannot be empty", call. = FALSE)
}
# Warnings for non-fatal issues
if (any(is.na(data$outcome))) {
warning("NA values in outcome will be excluded", call. = FALSE)
}
# Use tryCatch for operations that might fail
result <- tryCatch(
{
# Model fitting code
},
error = function(e) {
stop(sprintf("Model fitting failed: %s", e$message), call. = FALSE)
}
)
result
}Dependencies
DESCRIPTION best practices:
Imports:
data.table, # Essential functionality
stats, # Base R packages are fine
utils
Suggests:
testthat, # Testing only
knitr, # Vignettes only
ggplot2 # Optional visualization
- Minimize dependencies
- Use
Imports:for required packages - Use
Suggests:for optional/testing packages - Avoid depending on packages that depend on many others
- Consider using base R where possible
Documentation Standards
Every exported function needs:
- Title - One line description
- Description - Detailed explanation
- Parameters - All arguments documented
- Return value - What the function returns
- Examples - Runnable code (use
\dontrun{}for slow examples)
#' Calculate constrained optimization result
#'
#' Performs constrained Bayesian optimization over the specified
#' parameter bounds to find designs satisfying power and type I
#' error constraints.
#'
#' @param sim_fun Function that simulates trial outcomes. Must accept
#' a named numeric vector of parameters and return a list with
#' components `power` and `type1`.
#' @param bounds Named list where each element is a length-2 numeric
#' vector giving (min, max) bounds for that parameter.
#' @param constraints Named list of constraint values. Currently
#' supports `power` (minimum) and `type1` (maximum).
#' @param n_iter Integer. Number of optimization iterations.
#' Default is 100.
#' @param seed Integer or NULL. Random seed for reproducibility.
#'
#' @return A list with class "bo_result" containing:
#' \describe{
#' \item{best_params}{Named numeric vector of optimal parameters}
#' \item{best_value}{Numeric. Objective value at optimum}
#' \item{constraint_values}{Named numeric. Constraint values at optimum}
#' \item{history}{Data frame of all evaluations}
#' }
#'
#' @examples
#' # Define a simple simulator
#' my_sim <- function(params) {
#' list(power = 0.8 + params["x"] * 0.1,
#' type1 = 0.05 - params["x"] * 0.02)
#' }
#'
#' # Run optimization
#' result <- bo_calibrate(
#' sim_fun = my_sim,
#' bounds = list(x = c(0, 1)),
#' constraints = list(power = 0.80, type1 = 0.05),
#' n_iter = 10
#' )
#'
#' @seealso [extract_best()] for extracting optimal parameters
#' @exportTesting Best Practices
# tests/testthat/test-core.R
# Group related tests
test_that("bo_calibrate validates inputs", {
expect_error(bo_calibrate(NULL), "'sim_fun' must be a function")
expect_error(bo_calibrate(identity, bounds = "x"),
"'bounds' must be a list")
})
test_that("bo_calibrate returns expected structure", {
result <- bo_calibrate(simple_sim, bounds = list(x = c(0, 1)))
expect_s3_class(result, "bo_result")
expect_named(result, c("best_params", "best_value", "history"))
})
test_that("bo_calibrate respects constraints", {
result <- bo_calibrate(
sim_fun = constrained_sim,
bounds = list(x = c(0, 1)),
constraints = list(power = 0.80),
n_iter = 20
)
expect_gte(result$constraint_values["power"], 0.80)
})
# Test edge cases
test_that("bo_calibrate handles edge cases", {
# Empty result
expect_error(bo_calibrate(failing_sim, bounds = list(x = c(0, 1))))
# Single parameter
result <- bo_calibrate(simple_sim, bounds = list(x = c(0.5, 0.5)))
expect_equal(result$best_params["x"], 0.5)
})Performance Considerations
# Vectorize when possible
# BAD:
compute_values <- function(x) {
result <- numeric(length(x))
for (i in seq_along(x)) {
result[i] <- expensive_operation(x[i])
}
result
}
# GOOD:
compute_values <- function(x) {
expensive_operation_vectorized(x)
}
# Use data.table for large data operations
# GOOD:
process_large_data <- function(dt) {
dt[, .(mean_val = mean(value)), by = group]
}
# Pre-allocate in loops
# GOOD:
results <- vector("list", n_iterations)
for (i in seq_len(n_iterations)) {
results[[i]] <- compute(i)
}For computationally intensive code, consider Rcpp integration below.
Rcpp Integration
When R code is too slow even after vectorization, Rcpp lets you write performance-critical sections in C++.
When to Use Rcpp
| Scenario | Use Rcpp? |
|---|---|
| Loops that can’t be vectorized | Yes |
| Recursive algorithms | Yes |
| Operations on millions of elements | Consider |
| Code already fast enough | No |
| One-time analysis script | Probably no |
Rule of thumb: Profile first. Only optimize bottlenecks that matter.
Setup
# Install Rcpp
install.packages("Rcpp")
# Add to your package
usethis::use_rcpp()This creates: - src/ directory for C++ files - Updates DESCRIPTION with LinkingTo: Rcpp - Adds @useDynLib and @importFrom Rcpp sourceCpp to package documentation
Package Structure with Rcpp
MyPackage/
├── DESCRIPTION
├── NAMESPACE
├── R/
│ ├── RcppExports.R # Auto-generated
│ └── wrapper.R # R wrappers for C++ functions
├── src/
│ ├── RcppExports.cpp # Auto-generated
│ ├── fast_functions.cpp
│ └── Makevars # Compiler flags (optional)
└── ...
Basic Rcpp Example
src/fast_functions.cpp:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector cumsum_cpp(NumericVector x) {
int n = x.size();
NumericVector result(n);
result[0] = x[0];
for (int i = 1; i < n; i++) {
result[i] = result[i-1] + x[i];
}
return result;
}
// [[Rcpp::export]]
double weighted_mean_cpp(NumericVector x, NumericVector w) {
int n = x.size();
double sum_xw = 0.0;
double sum_w = 0.0;
for (int i = 0; i < n; i++) {
sum_xw += x[i] * w[i];
sum_w += w[i];
}
return sum_xw / sum_w;
}After running devtools::document(), these functions become available in R:
library(MyPackage)
cumsum_cpp(1:10)
weighted_mean_cpp(c(1, 2, 3), c(0.5, 0.3, 0.2))Simulation-Specific Example
For adaptive trial simulations with many iterations:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
List simulate_trial_cpp(int n_patients,
double true_rate,
double threshold,
int n_sims) {
NumericVector successes(n_sims);
NumericVector stopped_early(n_sims);
for (int sim = 0; sim < n_sims; sim++) {
int n_success = 0;
bool stopped = false;
for (int i = 0; i < n_patients; i++) {
// Simulate patient outcome
if (R::runif(0, 1) < true_rate) {
n_success++;
}
// Interim analysis (example: futility check)
if (i == n_patients / 2) {
double current_rate = (double)n_success / (i + 1);
if (current_rate < threshold * 0.5) {
stopped = true;
break;
}
}
}
successes[sim] = n_success;
stopped_early[sim] = stopped ? 1.0 : 0.0;
}
return List::create(
Named("successes") = successes,
Named("stopped_early") = stopped_early
);
}R Wrapper Functions
Create user-friendly R wrappers with input validation:
R/simulate.R:
#' Simulate adaptive trial
#'
#' @param n_patients Total patients to enroll
#' @param true_rate True response rate
#' @param threshold Efficacy threshold
#' @param n_sims Number of simulations
#' @return List with simulation results
#' @export
simulate_trial <- function(n_patients, true_rate, threshold, n_sims = 10000) {
# Validate inputs in R (easier error messages)
stopifnot(
is.numeric(n_patients), n_patients > 0,
is.numeric(true_rate), true_rate >= 0, true_rate <= 1,
is.numeric(threshold), threshold >= 0, threshold <= 1,
is.numeric(n_sims), n_sims > 0
)
# Call C++ implementation
result <- simulate_trial_cpp(
as.integer(n_patients),
as.double(true_rate),
as.double(threshold),
as.integer(n_sims)
)
# Post-process in R
result$power <- mean(result$successes >= threshold * n_patients)
result$early_stop_rate <- mean(result$stopped_early)
class(result) <- "trial_sim"
result
}RcppArmadillo for Linear Algebra
For matrix operations, use RcppArmadillo:
usethis::use_rcpp_armadillo()// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
using namespace Rcpp;
using namespace arma;
// [[Rcpp::export]]
arma::mat fast_crossprod(arma::mat X) {
return X.t() * X;
}
// [[Rcpp::export]]
arma::vec solve_system(arma::mat A, arma::vec b) {
return arma::solve(A, b);
}DESCRIPTION for Rcpp Packages
Package: MyPackage
Type: Package
Title: Fast Simulation Tools
Version: 0.1.0
Authors@R: person("First", "Last", email = "email@unc.edu", role = c("aut", "cre"))
Description: Efficient simulation tools using Rcpp.
License: MIT + file LICENSE
Imports:
Rcpp (>= 1.0.0)
LinkingTo:
Rcpp,
RcppArmadillo
SystemRequirements: C++11
Encoding: UTF-8
RoxygenNote: 7.3.0
Compilation on Longleaf
On Longleaf, load the appropriate compiler:
module load gcc/11.2.0
module load r/4.3.0
# Then in R
devtools::install()Debugging Rcpp Code
# Compile with debug symbols
Sys.setenv("PKG_CXXFLAGS" = "-g -O0")
devtools::clean_dll()
devtools::load_all()
# Print debugging (appears in R console)
# In C++: Rcpp::Rcout << "value: " << x << std::endl;Common Pitfalls
| Issue | Solution |
|---|---|
| Segfault | Check array bounds; R uses 1-indexing, C++ uses 0 |
| Wrong results | Ensure integer vs double types match |
| Slow compilation | Split large files; use // [[Rcpp::export]] sparingly |
| Memory leaks | Rcpp handles most memory; avoid raw new/delete |
| RNG issues | Use R::runif(), R::rnorm() for reproducibility with set.seed() |
Learning Resources
- Rcpp for Everyone - Comprehensive guide
- Rcpp Gallery - Code examples
- Advanced R - Rcpp Chapter - Hadley Wickham’s introduction
Reference Package
See BATON for a complete example of:
- Package structure and DESCRIPTION
- Roxygen2 documentation style
- Test organization
- Vignette setup
- GitHub Actions for CI (optional)