Coding Standards
Lab conventions for R, Python, and Git
These standards ensure code consistency, reproducibility, and maintainability across lab projects.
General Principles
- Reproducibility first - Anyone should be able to run your code and get the same results
- Document as you go - Comments, README files, and docstrings
- Version control everything - Frequent, meaningful commits
- Never hardcode paths - Use `here::here()` in R, `pathlib` in Python
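As a minimal sketch of the "never hardcode paths" rule in Python, paths can be built relative to the project root with `pathlib`. The helper names `find_project_root` and `project_path`, the use of `.git` as the root marker, and the example file path are assumptions for illustration, not an established lab utility.

```python
from pathlib import Path
from typing import Optional


def find_project_root(start: Path, marker: str = ".git") -> Path:
    """Walk upward from `start` until a directory containing `marker` is found."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"No {marker!r} found at or above {start}")


def project_path(*parts: str, start: Optional[Path] = None) -> Path:
    """Build a path relative to the project root (hypothetical helper,
    similar in spirit to here::here() in R)."""
    root = find_project_root(start if start is not None else Path.cwd())
    return root.joinpath(*parts)
```

With this in place, `project_path("data", "raw", "trial.csv")` resolves to the same file regardless of which subdirectory the script is run from, so no absolute path ever appears in committed code.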
Quick Reference
| Topic | Standard |
|---|---|
| Variable names | snake_case |
| Function names | snake_case verbs |
| Constants | SCREAMING_SNAKE_CASE |
| Indentation | 2 spaces (R), 4 spaces (Python) |
| Line length | ≤80 characters |
| Commits | type: description format |
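The naming and layout rows of the table might look like this in Python. This is only an illustrative sketch: `DEFAULT_ALPHA` and `compute_event_rate` are hypothetical names chosen to show the conventions.

```python
# Constant: SCREAMING_SNAKE_CASE
DEFAULT_ALPHA = 0.05


# Function name: snake_case, starting with a verb
def compute_event_rate(event_count: int, person_years: float) -> float:
    """Return the number of events per person-year."""
    # Indentation: 4 spaces in Python; lines stay within 80 characters
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    # Variable name: snake_case
    event_rate = event_count / person_years
    return event_rate
```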
Why Base R + data.table?
The lab standardizes on base R and data.table rather than tidyverse for several practical reasons:
- Performance: `data.table` is significantly faster than `dplyr` on the large simulation outputs common in our work (millions of rows across parameter configurations)
- Fewer dependencies: Base R + `data.table` avoids the sprawling dependency tree of the tidyverse, which means fewer installation headaches on Longleaf and more stable code over time
- HPC compatibility: Base R code runs reliably in non-interactive Slurm batch jobs without surprising namespace conflicts or version mismatches
- Longevity: Base R syntax rarely changes between versions, so code written today will still run years from now without modification
You’re welcome to use tidyverse for quick exploratory work on your own, but all code committed to lab repositories should use base R and data.table.
Sections
- R Style Guide - Base R conventions, function documentation
- Python Style Guide - PEP 8, type hints
- Git Practices - Commits, branches, PRs
- Targets Pipeline - Workflow automation
Tip: R Package Development
For creating reusable R packages (structure, roxygen2, testing, Rcpp), see the Package Development Guide.
R Example
```r
library(survival)  # provides coxph() and Surv()

#' Calculate hazard ratio with confidence interval
#'
#' @param data A data frame with time, status, treatment columns
#' @param alpha Significance level (default 0.05)
#' @return A data.frame with hr, lower, upper columns
calculate_hr <- function(data, alpha = 0.05) {
  # Validate inputs
  stopifnot(
    is.data.frame(data),
    all(c("time", "status", "treatment") %in% names(data))
  )

  # Fit model
  fit <- coxph(Surv(time, status) ~ treatment, data = data)

  # Extract results; confint() returns a matrix with one row per
  # coefficient, so index by column rather than by position
  ci <- confint(fit, level = 1 - alpha)
  data.frame(
    hr = exp(coef(fit)),
    lower = exp(ci[, 1]),
    upper = exp(ci[, 2])
  )
}
```

Git Commit Example
```text
feat: add Cox model with time-varying covariates

- Implement time-varying coefficient estimation
- Add unit tests for edge cases
- Update documentation with examples

Closes #42
```
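The `type: description` subject format could be checked automatically, for example from a commit hook. The sketch below is a hedged illustration, not an existing lab tool: the function name `check_commit_subject` is hypothetical, and the list of allowed types is an assumption, since only `feat` appears in this guide.

```python
import re

# Assumed set of commit types; only "feat" is confirmed by this guide.
ALLOWED_TYPES = ("feat", "fix", "docs", "test", "refactor", "chore")

# Subject line: lowercase type, a colon, one space, then a non-empty description
SUBJECT_PATTERN = re.compile(r"^(?P<type>[a-z]+): (?P<description>\S.*)$")


def check_commit_subject(message: str) -> bool:
    """Return True if the first line follows the `type: description` format."""
    lines = message.splitlines()
    subject = lines[0] if lines else ""
    match = SUBJECT_PATTERN.match(subject)
    return bool(match) and match.group("type") in ALLOWED_TYPES
```

Only the first line is inspected, so the bullet-point body and the `Closes #42` footer in the example above do not affect the result.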
Code Review Checklist
Before submitting a PR: