Lab Computing
Setup guides, coding standards, and cluster workflows
This page is your hub for all computing-related resources in the lab. Whether you’re setting up your environment, learning our coding standards, or running jobs on Longleaf, you’ll find links to the relevant guides here.
Getting Started
New to the lab or setting up a new machine? Start here:
| Guide | Description |
|---|---|
| Your First Project | Starting your first research project |
| Tools Setup | Overview of required tools and configuration |
| Longleaf Setup | UNC HPC cluster access and configuration |
| Local Development Setup | R, RStudio, VS Code on your machine |
| Git & GitHub Setup | Version control configuration |
Coding Standards
Our lab follows specific conventions for reproducibility and collaboration:
| Guide | Description |
|---|---|
| Coding Standards Overview | Summary of all coding conventions |
| R Style Guide | Base R + data.table conventions |
| Python Style Guide | Python conventions and formatting |
| Git Practices | Commits, branches, PRs, code review |
| Targets Pipeline Guide | Reproducible workflows with {targets} |
Project Consistency
The consistency framework is foundational to reproducible research in our lab. Set it up before you start writing analysis code. See Claude Code Enforcement for how Claude automates this for you.
Keep your code and documentation in sync. All three lab templates (Research Project, Methods Paper, and Clinical Trial) include a CLAUDE.md that references the consistency framework, so Claude already knows the expected structure when you start a session. However, the templates provide the directory layout but not the actual framework files — after cloning a template, ask Claude to generate them for your project (see below).
| Guide | Description |
|---|---|
| Project Consistency Framework | Overview of the consistency system |
| Centralized Configuration | Single source of truth for values |
| Data Provenance | Tracking data origins and transformations |
| Claude Code Enforcement | Using Claude to generate and validate files |
Using Claude to Generate Consistency Files
After cloning any lab template, ask Claude Code to set up and populate the consistency framework files. Because the template’s CLAUDE.md already describes the framework, Claude knows what to create and where to put it:
> Set up the project consistency framework for my simulation study
Claude will create:
| File | Purpose |
|---|---|
| `config/globals.yml` | Centralized parameters (n_reps, alpha, seeds) |
| `config/consistency_registry.yml` | Claims tracking for manuscript statements |
| `R/globals_loader.R` | Config loading utilities |
| `scripts/validate_consistency.R` | Validation script |
| `DEFAULTS.md` | Parameter documentation |
| `docs/DATA_PROVENANCE.md` | Figure/table source tracing |
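As a rough sketch, `config/globals.yml` might look like the following. The parameter names and values here are hypothetical examples, not a lab-mandated schema; Claude will tailor the file to your project:

```yaml
# config/globals.yml (hypothetical example)
simulation:
  n_reps_low: 100      # quick-mode replicate count
  n_reps_high: 10000   # full-run replicate count
  seed: 2024
analysis:
  alpha: 0.05
```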
Integration with Existing Files
Claude will also update your pipeline files to use the consistency framework:
```r
# _targets.R
library(targets)
source("R/globals_loader.R")

cfg <- load_globals()

list(
  tar_target(sim_results,
    run_simulation(n_reps = cfg$simulation$n_reps_high))
)
```

```yaml
# .github/workflows/validate.yml
- name: Check Consistency
  run: Rscript scripts/validate_consistency.R
```

```makefile
# Makefile
submit: validate-consistency
	@echo "Ready for submission"
	quarto render manuscript/paper.qmd
```

See Claude Code Enforcement for detailed prompts and workflows.
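If you want the check to run before CI ever sees a commit, the same validation script can be wired into a Git pre-commit hook. A minimal sketch (the hook path is standard Git; it assumes `scripts/validate_consistency.R` exists in your repo, as in the table above):

```shell
# Install a hypothetical pre-commit hook that runs the consistency check
mkdir -p .git/hooks
cat > .git/hooks/pre-commit <<'EOF'
#!/bin/bash
Rscript scripts/validate_consistency.R || {
  echo "Consistency check failed; commit aborted" >&2
  exit 1
}
EOF
chmod +x .git/hooks/pre-commit
```

Delete the hook file to disable it; CI remains the authoritative check either way.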
All lab templates reference the consistency framework in their CLAUDE.md but don’t include the actual files yet. After cloning any template, ask Claude to generate globals.yml, DATA_PROVENANCE.md, and other framework files. The scope will vary by project type:
- Research Project: Simpler config (seed, key analysis parameters)
- Methods Paper: Full framework (simulation parameters, manuscript claims, figure provenance)
- Clinical Trial: Extensive config (trial design parameters, interim analysis rules, operating characteristics)
See Template Status for exactly which files each template includes.
HPC Resources
Start with Longleaf Setup for initial access, OnDemand, and basic configuration. The sections below cover advanced usage for production workflows.
Data Storage
Lab data lives on the shared project space:
- Project directory: `/proj/rashidlab/` (available on Longleaf and Sycamore)
- Quota: 1 TB shared across the lab — check usage at service.rc.unc.edu
- Private data: Store in `/proj/rashidlab/projects/` or `/proj/rashidlab/users/` (the top-level `/proj/rashidlab/` directory is publicly readable on the cluster)
- Not backed up: Files persist but are not backed up — keep copies of irreplaceable data elsewhere
- Symlinks: Projects should symlink large data into their repo rather than committing it to Git
- Access changes: Email research@unc.edu to add/remove lab members (group: `rc_nur2lab_psx`)
- Compliance: If storing NIST 800-171 regulated data, see Longleaf Compliance
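The symlink convention above can be sketched as follows. The repo and project names are hypothetical; adapt the paths to your own project directory:

```shell
# Link shared project data into a repo instead of committing it
# (myproject and its data path are illustrative)
mkdir -p myproject
ln -s /proj/rashidlab/projects/myproject/data myproject/data

# Keep the link's target out of version control
echo "data" >> myproject/.gitignore
```

Git stores only the symlink itself, so clones stay small while code on the cluster resolves `data/` to the shared project space.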
See Lab Policies for data management and HIPAA requirements, or Longleaf Setup for the full directory reference.
Slurm Job Scheduler
Prerequisites
Before submitting jobs, ensure your log directory exists:
```bash
# Create log directory (required by batch scripts below)
mkdir -p logs
```

Basic Commands
```bash
# Submit a job
sbatch my_script.sh

# Check queue
squeue -u $USER

# Cancel a job
scancel JOB_ID

# Job information
scontrol show job JOB_ID
sacct -j JOB_ID

# Interactive session (for debugging)
srun --pty -p interact -n 1 --mem=8G -t 60 bash
```

Example Batch Script
```bash
#!/bin/bash
#SBATCH --job-name=my_analysis
#SBATCH --partition=general
#SBATCH --time=24:00:00
#SBATCH --mem=16G
#SBATCH --cpus-per-task=4
#SBATCH --output=logs/slurm_%j.out
#SBATCH --error=logs/slurm_%j.err

# Check available R versions: module spider r
module load r/4.4.0

Rscript my_analysis.R
```

R versions on Longleaf change over time. Check available versions with `module spider r` and use the latest stable release. Update your batch scripts accordingly.
Partitions
| Partition | Time Limit | Use Case |
|---|---|---|
| `debug` | 4 hours | Testing, quick jobs |
| `general` | 7 days | Standard jobs |
| `bigmem` | 7 days | High-memory jobs |
| `gpu` | 7 days | GPU computing (PyTorch, deep learning) |
| `interact` | 4 hours | Interactive debugging sessions |
Resource Guidelines
| Job Type | CPUs | Memory | Time |
|---|---|---|---|
| Light analysis | 1-2 | 4-8 GB | 1-4 hours |
| Simulation (single) | 4 | 8-16 GB | 4-24 hours |
| Heavy simulation | 8-16 | 32-64 GB | 24-48 hours |
| Bayesian MCMC | 4-8 | 16-32 GB | 12-48 hours |
| Parallel (targets) | 4/worker | 4 GB/CPU | varies |
For Stan, JAGS, or NIMBLE models, request one CPU per chain: four chains with `--cpus-per-task=4` is typical. Memory needs depend on the model and data size — start with 16 GB and increase if jobs fail with `OUT_OF_MEMORY`.
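Putting those guidelines together, a batch script for a four-chain MCMC fit might look like the following. It is written here via a heredoc so you can paste it straight into a terminal; the job name and `fit_model.R` script are hypothetical:

```shell
# Write a hypothetical batch script: one CPU per chain, 16 GB to start
cat > mcmc_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=mcmc_fit
#SBATCH --partition=general
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
#SBATCH --time=24:00:00
#SBATCH --output=logs/mcmc_%j.out
#SBATCH --error=logs/mcmc_%j.err

module load r/4.4.0
Rscript fit_model.R   # hypothetical script running 4 parallel chains
EOF
```

Submit with `sbatch mcmc_job.sh` once `logs/` exists; raise `--mem` if the job fails with `OUT_OF_MEMORY`.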
Debugging Failed Jobs
When a job fails, check these common causes:
```bash
# Check exit code and reason
sacct -j JOB_ID --format=JobID,State,ExitCode,MaxRSS,Elapsed

# Read error log
cat logs/slurm_JOB_ID.err
```

| Symptom | Likely Cause | Fix |
|---|---|---|
| `OUT_OF_MEMORY` | Insufficient `--mem` | Increase memory allocation |
| `TIMEOUT` | Exceeded `--time` | Increase time limit or optimize code |
| `FAILED` exit code 1 | R script error | Check `.err` log for R traceback |
| `MODULE_NOT_FOUND` | Wrong module name | Run `module spider r` to find correct name |
| `CANCELLED by user` | Manually cancelled or dependency failed | Check dependent jobs |
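When retuning `--mem` from sacct's `MaxRSS` column, a quick back-of-envelope conversion helps. The `MaxRSS` value below is a hypothetical example (sacct typically reports it in KB with a `K` suffix):

```shell
# Convert a MaxRSS value (KB) into gigabytes to size the next --mem request
maxrss_kb=12500000                                 # hypothetical sacct value
maxrss_gb=$(( (maxrss_kb + 1048575) / 1048576 ))   # round KB up to whole GB
echo "Peak usage ~${maxrss_gb} GB; try --mem=$(( maxrss_gb + 4 ))G next run"
```

Adding a few GB of headroom over observed peak usage avoids repeated `OUT_OF_MEMORY` failures without over-requesting.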
```bash
# Check actual memory used by a completed job (useful for tuning)
sacct -j JOB_ID --format=JobID,MaxRSS,ReqMem,Elapsed,State
```

Targets + Slurm Integration
The {targets} package with {crew.cluster} enables distributed pipelines on Longleaf.
Configuration
```r
# In _targets.R
library(crew)
library(crew.cluster)

tar_option_set(
  controller = crew_controller_slurm(
    name = "longleaf",
    workers = 20,
    slurm_partition = "general",
    slurm_time_minutes = 1440,
    slurm_cpus_per_task = 4,
    slurm_memory_gigabytes_per_cpu = 4,
    slurm_log_output = "logs/slurm_%j.out"
  )
)
```

Running Pipeline on Cluster
```bash
mkdir -p logs  # Ensure log directory exists

# Submit controller job
sbatch run_pipeline.sh
```

`run_pipeline.sh`:

```bash
#!/bin/bash
#SBATCH --job-name=targets_controller
#SBATCH --time=48:00:00
#SBATCH --mem=8G
#SBATCH --cpus-per-task=2
#SBATCH --output=logs/controller_%j.out

module load r/4.4.0
Rscript -e "targets::tar_make()"
```

Monitoring
```bash
# Watch queue
watch -n 10 squeue -u $USER

# Check targets progress
Rscript -e "targets::tar_progress()"

# View logs
tail -f logs/slurm_*.out
```

Array Jobs
For running many similar jobs (e.g., simulation replicates):
```bash
#!/bin/bash
#SBATCH --job-name=sim_array
#SBATCH --array=1-100%20
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G
#SBATCH --time=2:00:00
#SBATCH --output=logs/sim_%A_%a.out

module load r/4.4.0
Rscript run_simulation.R $SLURM_ARRAY_TASK_ID
```

The `%20` in `--array=1-100%20` limits the array to 20 concurrent tasks. This prevents monopolizing the cluster and is required for large arrays. Use `%10`–`%30` depending on the job size and cluster load.
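Beyond seeding, the array index can also select a row from a parameter grid, so each task runs one scenario. A sketch, where the grid file and its columns are hypothetical, and `SLURM_ARRAY_TASK_ID` is hardcoded here only for illustration (Slurm sets it inside a real array job):

```shell
# params.csv: one simulation scenario per line (n, effect_size) - hypothetical
cat > params.csv <<'EOF'
100,0.2
100,0.5
500,0.2
500,0.5
EOF

SLURM_ARRAY_TASK_ID=3   # set by Slurm in a real array job
scenario=$(sed -n "${SLURM_ARRAY_TASK_ID}p" params.csv)   # pick line N of the grid
echo "Task ${SLURM_ARRAY_TASK_ID} runs scenario: ${scenario}"
```

Set `--array=1-4` to match the number of grid rows, and pass `${scenario}` (or the task ID) on to your analysis script.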
In R:
```r
# run_simulation.R
args <- commandArgs(trailingOnly = TRUE)
task_id <- as.integer(args[1])
set.seed(task_id)
# ... run simulation with this seed
```

Quick Mode for Testing
Use reduced parameters for local testing before cluster submission:
```bash
# Environment variable approach
QUICK_MODE=1 Rscript my_analysis.R
```

With globals.yml Integration
The recommended approach combines QUICK_MODE with centralized configuration:
```r
# In R script
source("R/globals_loader.R")
cfg <- load_globals()

quick_mode <- Sys.getenv("QUICK_MODE") == "1"
n_reps <- if (quick_mode) cfg$simulation$n_reps_low else cfg$simulation$n_reps_high
```

This ensures even “quick” runs use values from globals.yml rather than hardcoded numbers, keeping them consistent with the project configuration.
Without globals.yml (Simple Scripts)
```r
# In R script
if (Sys.getenv("QUICK_MODE") == "1") {
  n_reps <- 100
} else {
  n_reps <- 10000
}
```

Claude Code on Longleaf
Claude Code can help manage HPC workflows directly from the command line.
Setup
```bash
# Install on Longleaf
curl -fsSL https://claude.ai/install.sh | sh
claude login
```

Common Tasks
> Submit this job to Slurm with 4 CPUs and 16GB RAM
> Check my queue status
> Configure targets to use 20 Slurm workers
> Debug why my job failed
Long Sessions
Use tmux to keep Claude running:
```bash
tmux new -s claude
claude
# Ctrl-b, d to detach
# tmux attach -t claude to reattach
```

See the Claude Code HPC Guide for comprehensive documentation.
Getting Help
- Longleaf issues: research@unc.edu
- Lab computing questions: `#computing` Teams channel
- Targets/pipeline help: targets documentation
- Claude Code help: Claude Code Guide
- Initial setup: Tools Setup Guide