Coding Standards

Lab conventions for R, Python, and Git

These standards ensure code consistency, reproducibility, and maintainability across lab projects.

General Principles

Reproducibility first - Anyone should be able to run your code and get the same results
Document as you go - Comments, README files, and docstrings
Version control everything - Frequent, meaningful commits
Never hardcode paths - Build them relative to the project root with here::here() in R or pathlib in Python
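
The path principle can be sketched in Python as follows. This is a minimal illustration, assuming the project root is marked by a `.git` entry; the helper names `find_project_root` and `build_data_path` are hypothetical and not part of any lab package.

```python
from pathlib import Path


def find_project_root(start: Path, marker: str = ".git") -> Path:
    """Return the nearest ancestor of `start` that contains `marker`.

    Assumption: the repository root is identified by a `.git` entry.
    """
    start = start.resolve()
    # Check the starting directory first, then walk up through its parents.
    for candidate in (start, *start.parents):
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"No {marker!r} found at or above {start}")


def build_data_path(root: Path, *parts: str) -> Path:
    """Join path components onto the project root instead of hardcoding."""
    return root.joinpath(*parts)
```

In a script, `find_project_root(Path(__file__).parent)` plays roughly the role that `here::here()` plays in R: every path is derived from the root, so the code runs unchanged on another machine.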

Quick Reference

Topic	Standard
Variable names	`snake_case`
Function names	`snake_case` verbs
Constants	`SCREAMING_SNAKE_CASE`
Indentation	2 spaces (R), 4 spaces (Python)
Line length	≤80 characters
Commits	`type: description` format
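
A short Python sketch of the naming and layout rows in the table. The names `DEFAULT_ALPHA`, `MAX_ITERATIONS`, and `compute_event_rate` are made up for illustration.

```python
# Constants: SCREAMING_SNAKE_CASE, defined once at module level.
DEFAULT_ALPHA = 0.05
MAX_ITERATIONS = 1_000


def compute_event_rate(event_count: int, person_years: float) -> float:
    """Return events per person-year (function name is a snake_case verb)."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    # Variables: snake_case; 4-space indentation; lines of at most 80 chars.
    event_rate = event_count / person_years
    return event_rate
```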

Why Base R + data.table?

The lab standardizes on base R and data.table rather than tidyverse for several practical reasons:

Performance: data.table is significantly faster than dplyr on the large simulation outputs common in our work (millions of rows across parameter configurations)
Fewer dependencies: Base R + data.table avoids the sprawling dependency tree of the tidyverse, which means fewer installation headaches on Longleaf and more stable code over time
HPC compatibility: Base R code runs reliably in non-interactive Slurm batch jobs without surprising namespace conflicts or version mismatches
Longevity: Base R syntax rarely changes between versions, so code written today is likely to run years from now with little or no modification

You’re welcome to use tidyverse for quick exploratory work on your own, but all code committed to lab repositories should use base R and data.table.

Sections

R Style Guide - Base R conventions, function documentation
Python Style Guide - PEP 8, type hints
Git Practices - Commits, branches, PRs
Targets Pipeline - Workflow automation

R Package Development

For creating reusable R packages (structure, roxygen2, testing, Rcpp), see the Package Development Guide.

R Example

library(survival)  # provides coxph() and Surv()

#' Calculate hazard ratio with confidence interval
#'
#' @param data A data frame with time, status, treatment columns
#' @param alpha Significance level (default 0.05)
#' @return A data.frame with hr, lower, upper columns, one row per coefficient
calculate_hr <- function(data, alpha = 0.05) {
  # Validate inputs
  stopifnot(
    is.data.frame(data),
    all(c("time", "status", "treatment") %in% names(data)),
    is.numeric(alpha), length(alpha) == 1, alpha > 0, alpha < 1
  )

  # Fit model
  fit <- coxph(Surv(time, status) ~ treatment, data = data)

  # Compute the interval once; confint() returns a matrix with one row
  # per coefficient and one column each for the lower and upper bounds
  ci <- exp(confint(fit, level = 1 - alpha))

  # Extract results
  data.frame(
    hr = exp(coef(fit)),
    lower = ci[, 1],
    upper = ci[, 2]
  )
}
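
For comparison, here is a Python sketch in the same spirit (docstring, type hints, input validation). It does not fit a Cox model, since these standards name no Python survival library. Instead it assumes the log hazard ratio and its standard error have already been estimated, and converts them to a hazard ratio with a Wald (normal-approximation) interval. The function name `calculate_hr_ci` and its parameters are hypothetical.

```python
import math
from statistics import NormalDist


def calculate_hr_ci(
    log_hr: float, std_error: float, alpha: float = 0.05
) -> dict[str, float]:
    """Convert a log hazard ratio and its standard error to an HR with CI.

    Args:
        log_hr: Estimated log hazard ratio (a model coefficient).
        std_error: Standard error of ``log_hr``.
        alpha: Significance level (default 0.05).

    Returns:
        A dict with ``hr``, ``lower``, and ``upper`` keys.
    """
    # Validate inputs
    if std_error < 0:
        raise ValueError("std_error must be non-negative")
    if not 0 < alpha < 1:
        raise ValueError("alpha must be in (0, 1)")

    # Two-sided normal critical value (about 1.96 for alpha = 0.05)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)

    # Wald interval on the log scale, then exponentiate
    return {
        "hr": math.exp(log_hr),
        "lower": math.exp(log_hr - z_crit * std_error),
        "upper": math.exp(log_hr + z_crit * std_error),
    }
```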

Git Commit Example

feat: add Cox model with time-varying covariates

- Implement time-varying coefficient estimation
- Add unit tests for edge cases
- Update documentation with examples

Closes #42
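
The `type: description` subject line can be checked mechanically. The sketch below is one possible check, not a lab tool. The list of allowed types is an assumption chosen for illustration (only `feat` appears in this document), and `is_valid_subject` is a hypothetical helper.

```python
import re

# Assumption: an illustrative set of types, not the lab's official list.
ALLOWED_TYPES = ("feat", "fix", "docs", "refactor", "test", "chore")

TYPE_ALTERNATION = "|".join(ALLOWED_TYPES)

# One allowed type, a colon, a single space, then a non-empty description.
SUBJECT_PATTERN = re.compile(rf"^(?:{TYPE_ALTERNATION}): \S.*$")


def is_valid_subject(subject: str) -> bool:
    """Return True if `subject` follows the `type: description` format."""
    return SUBJECT_PATTERN.match(subject) is not None
```

A check like this could be run from a commit-msg hook or in CI on the first line of each commit message.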

Code Review Checklist

Before submitting a PR: