Coding Standards

Lab conventions for R, Python, and Git

These standards ensure code consistency, reproducibility, and maintainability across lab projects.

General Principles

  1. Reproducibility first - Anyone should be able to run your code and get the same results
  2. Document as you go - Comments, README files, and docstrings
  3. Version control everything - Frequent, meaningful commits
  4. Never hardcode paths - Use here::here() in R, pathlib in Python
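
As a minimal sketch of principle 4 on the Python side, the snippet below builds paths relative to the project root instead of hardcoding them. The helper names find_project_root and project_path, and the choice of .git or pyproject.toml as root markers, are assumptions for illustration, not lab-provided utilities.

```python
from pathlib import Path
from typing import Optional


def find_project_root(start: Path, markers=(".git", "pyproject.toml")) -> Path:
    """Walk upward from `start` until a directory holding a marker is found."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        # Assumption: any one of the marker files/directories identifies the root.
        if any((candidate / marker).exists() for marker in markers):
            return candidate
    raise FileNotFoundError(f"no project root found above {start}")


def project_path(*parts: str, start: Optional[Path] = None) -> Path:
    """Build a path relative to the project root instead of hardcoding it."""
    root = find_project_root(start or Path.cwd())
    return root.joinpath(*parts)
```

With helpers like these, a script can call project_path("data", "raw.csv") (a hypothetical file) from any subdirectory of the repository and resolve the same location on every machine.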

Quick Reference

Topic            Standard
Variable names   snake_case
Function names   snake_case verbs
Constants        SCREAMING_SNAKE_CASE
Indentation      2 spaces (R), 4 spaces (Python)
Line length      ≤80 characters
Commits          type: description format
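
The naming and layout rules above can be illustrated with a short Python sketch. The names DEFAULT_ALPHA, compute_event_rate, and event_rate are hypothetical and chosen only to show each convention.

```python
# Constants use SCREAMING_SNAKE_CASE.
DEFAULT_ALPHA = 0.05


# Function names are snake_case and start with a verb.
def compute_event_rate(n_events, n_subjects):
    """Return the proportion of subjects who experienced the event."""
    # Python code is indented with 4 spaces; lines stay within 80 characters.
    if n_subjects <= 0:
        raise ValueError("n_subjects must be positive")
    return n_events / n_subjects


# Variable names are snake_case.
event_rate = compute_event_rate(n_events=5, n_subjects=20)
```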

Why Base R + data.table?

The lab standardizes on base R and data.table rather than tidyverse for several practical reasons:

  • Performance: data.table is significantly faster than dplyr on the large simulation outputs common in our work (millions of rows across parameter configurations)
  • Fewer dependencies: Base R + data.table avoids the sprawling dependency tree of the tidyverse, which means fewer installation headaches on Longleaf and more stable code over time
  • HPC compatibility: Base R code runs reliably in non-interactive Slurm batch jobs without surprising namespace conflicts or version mismatches
  • Longevity: Base R syntax rarely changes between versions, so code written today will still run years from now without modification

You’re welcome to use tidyverse for quick exploratory work on your own, but all code committed to lab repositories should use base R and data.table.

Sections

Tip: R Package Development

For creating reusable R packages (structure, roxygen2, testing, Rcpp), see the Package Development Guide.

R Example

library(survival)

#' Calculate hazard ratio with confidence interval
#'
#' @param data A data frame with time, status, treatment columns
#' @param alpha Significance level (default 0.05)
#' @return A data.frame with hr, lower, upper columns
calculate_hr <- function(data, alpha = 0.05) {
  # Validate inputs
  stopifnot(
    is.data.frame(data),
    all(c("time", "status", "treatment") %in% names(data))
  )

  # Fit model
  fit <- coxph(Surv(time, status) ~ treatment, data = data)

  # Compute the confidence interval once (one row per coefficient)
  ci <- confint(fit, level = 1 - alpha)

  # Extract results on the hazard ratio scale
  data.frame(
    hr = exp(coef(fit)),
    lower = exp(ci[, 1]),
    upper = exp(ci[, 2])
  )
}

Git Commit Example

feat: add Cox model with time-varying covariates

- Implement time-varying coefficient estimation
- Add unit tests for edge cases
- Update documentation with examples

Closes #42
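
A subject line in the type: description format can be checked mechanically. The sketch below is one possible check, not an official lab hook: the function name is_valid_subject is hypothetical, and because this page does not list the allowed types, the pattern only assumes that the type is a single lowercase word.

```python
import re

# Assumption: the type is one lowercase word (e.g. "feat") followed by
# ": " and a non-empty description. No list of allowed types is enforced.
SUBJECT_PATTERN = re.compile(r"^[a-z]+: \S.*$")


def is_valid_subject(message: str) -> bool:
    """Check the first line of a commit message against `type: description`."""
    lines = message.splitlines()
    subject = lines[0] if lines else ""
    return bool(SUBJECT_PATTERN.match(subject))
```

Only the first line is inspected, so the bullet list and the Closes #42 footer in the example above do not affect the result.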

Code Review Checklist

Before submitting a PR: