Lab Computing

Setup guides, coding standards, and cluster workflows

This page is your hub for all computing-related resources in the lab. Whether you’re setting up your environment, learning our coding standards, or running jobs on Longleaf, you’ll find links to the relevant guides here.

Getting Started

New to the lab or setting up a new machine? Start here:

| Guide | Description |
|---|---|
| Your First Project | Starting your first research project |
| Tools Setup | Overview of required tools and configuration |
| Longleaf Setup | UNC HPC cluster access and configuration |
| Local Development Setup | R, RStudio, VS Code on your machine |
| Git & GitHub Setup | Version control configuration |

Coding Standards

Our lab follows specific conventions for reproducibility and collaboration:

| Guide | Description |
|---|---|
| Coding Standards Overview | Summary of all coding conventions |
| R Style Guide | Base R + data.table conventions |
| Python Style Guide | Python conventions and formatting |
| Git Practices | Commits, branches, PRs, code review |
| Targets Pipeline Guide | Reproducible workflows with {targets} |

Project Consistency

Important: Read This Before Starting Any Project

The consistency framework is foundational to reproducible research in our lab. Set it up before you start writing analysis code. See Claude Code Enforcement for how Claude automates this for you.

Keep your code and documentation in sync. All three lab templates (Research Project, Methods Paper, and Clinical Trial) include a CLAUDE.md that references the consistency framework, so Claude already knows the expected structure when you start a session. However, the templates provide the directory layout but not the actual framework files — after cloning a template, ask Claude to generate them for your project (see below).

| Guide | Description |
|---|---|
| Project Consistency Framework | Overview of the consistency system |
| Centralized Configuration | Single source of truth for values |
| Data Provenance | Tracking data origins and transformations |
| Claude Code Enforcement | Using Claude to generate and validate files |

Using Claude to Generate Consistency Files

After cloning any lab template, ask Claude Code to set up and populate the consistency framework files. Because the template’s CLAUDE.md already describes the framework, Claude knows what to create and where to put it:

> Set up the project consistency framework for my simulation study

Claude will create:

| File | Purpose |
|---|---|
| config/globals.yml | Centralized parameters (n_reps, alpha, seeds) |
| config/consistency_registry.yml | Claims tracking for manuscript statements |
| R/globals_loader.R | Config loading utilities |
| scripts/validate_consistency.R | Validation script |
| DEFAULTS.md | Parameter documentation |
| docs/DATA_PROVENANCE.md | Figure/table source tracing |
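As a concrete illustration, a generated config/globals.yml might look like the sketch below. The exact keys depend on your project; the structure and values here are illustrative, not a required schema.

```yaml
# config/globals.yml -- illustrative sketch; keys vary by project
simulation:
  n_reps_low: 100      # quick-mode replicates for local testing
  n_reps_high: 10000   # full replicates for cluster runs
  seed: 2024           # master seed for reproducibility
analysis:
  alpha: 0.05          # significance threshold cited in the manuscript
```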

Integration with Existing Files

Claude will also update your pipeline files to use the consistency framework:

# _targets.R
library(targets)
source("R/globals_loader.R")

cfg <- load_globals()

list(
  tar_target(sim_results,
    run_simulation(n_reps = cfg$simulation$n_reps_high))
)

# .github/workflows/validate.yml
- name: Check Consistency
  run: Rscript scripts/validate_consistency.R

# Makefile
submit: validate-consistency
    quarto render manuscript/paper.qmd
    @echo "Ready for submission"

See Claude Code Enforcement for detailed prompts and workflows.

Note: Template Setup Required

All lab templates reference the consistency framework in their CLAUDE.md but don’t include the actual files yet. After cloning any template, ask Claude to generate globals.yml, DATA_PROVENANCE.md, and other framework files. The scope will vary by project type:

  • Research Project: Simpler config (seed, key analysis parameters)
  • Methods Paper: Full framework (simulation parameters, manuscript claims, figure provenance)
  • Clinical Trial: Extensive config (trial design parameters, interim analysis rules, operating characteristics)

See Template Status for exactly which files each template includes.

HPC Resources

Tip: New to Longleaf?

Start with Longleaf Setup for initial access, OnDemand, and basic configuration. The sections below cover advanced usage for production workflows.

Data Storage

Lab data lives on the shared project space:

  • Project directory: /proj/rashidlab/ (available on Longleaf and Sycamore)
  • Quota: 1 TB shared across the lab — check usage at service.rc.unc.edu
  • Private data: Store in /proj/rashidlab/projects/ or /proj/rashidlab/users/ (the top-level /proj/rashidlab/ directory is publicly readable on the cluster)
  • Not backed up: Files persist but are not backed up — keep copies of irreplaceable data elsewhere
  • Symlinks: Projects should symlink large data into their repo rather than committing it to Git
  • Access changes: Email research@unc.edu to add/remove lab members (group: rc_nur2lab_psx)
  • Compliance: If storing NIST 800-171 regulated data, see Longleaf Compliance
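The symlink convention above can be sketched as follows; the shared data path and repo name are hypothetical, so substitute your own:

```shell
# Link shared data into a project repo instead of committing it to Git.
DATA_SRC="/proj/rashidlab/projects/mystudy/raw"   # hypothetical shared location
REPO="$HOME/mystudy"                              # your cloned repo

mkdir -p "$REPO"
ln -sfn "$DATA_SRC" "$REPO/data"   # -n avoids nesting the link on re-runs

# Keep the data out of version control by ignoring the link:
echo "data" >> "$REPO/.gitignore"
```

The symlink resolves on the cluster where /proj is mounted; collaborators cloning the repo elsewhere simply see a dangling link rather than gigabytes of data.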

See Lab Policies for data management and HIPAA requirements, or Longleaf Setup for the full directory reference.

Slurm Job Scheduler

Prerequisites

Before submitting jobs, ensure your log directory exists:

# Create log directory (required by batch scripts below)
mkdir -p logs

Basic Commands

# Submit a job
sbatch my_script.sh

# Check queue
squeue -u $USER

# Cancel a job
scancel JOB_ID

# Job information
scontrol show job JOB_ID
sacct -j JOB_ID

# Interactive session (for debugging)
srun --pty -p interact -n 1 --mem=8G -t 60 bash

Example Batch Script

#!/bin/bash
#SBATCH --job-name=my_analysis
#SBATCH --partition=general
#SBATCH --time=24:00:00
#SBATCH --mem=16G
#SBATCH --cpus-per-task=4
#SBATCH --output=logs/slurm_%j.out
#SBATCH --error=logs/slurm_%j.err

# Check available R versions: module spider r
module load r/4.4.0

Rscript my_analysis.R

Note

R versions on Longleaf change over time. Check available versions with module spider r and use the latest stable release. Update your batch scripts accordingly.

Partitions

| Partition | Time Limit | Use Case |
|---|---|---|
| debug | 4 hours | Testing, quick jobs |
| general | 7 days | Standard jobs |
| bigmem | 7 days | High-memory jobs |
| gpu | 7 days | GPU computing (PyTorch, deep learning) |
| interact | 4 hours | Interactive debugging sessions |

Resource Guidelines

| Job Type | CPUs | Memory | Time |
|---|---|---|---|
| Light analysis | 1-2 | 4-8 GB | 1-4 hours |
| Simulation (single) | 4 | 8-16 GB | 4-24 hours |
| Heavy simulation | 8-16 | 32-64 GB | 24-48 hours |
| Bayesian MCMC | 4-8 | 16-32 GB | 12-48 hours |
| Parallel (targets) | 4/worker | 4 GB/CPU | varies |

Tip: Bayesian MCMC Jobs

For Stan, JAGS, or Nimble models: request one CPU per chain. Four chains at 4 CPUs is typical. Memory needs depend on the model and data size—start with 16 GB and increase if jobs fail with OUT_OF_MEMORY.
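The one-CPU-per-chain guidance translates into resource directives like the fragment below (a sketch for a hypothetical 4-chain model, not a complete script):

```shell
# Fragment: resource requests for a 4-chain Stan/JAGS/nimble job
#SBATCH --cpus-per-task=4   # 4 chains -> 4 CPUs, one per chain
#SBATCH --mem=16G           # starting point; raise if the job dies with OUT_OF_MEMORY
#SBATCH --time=24:00:00     # MCMC runtimes vary widely; leave headroom
```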

Debugging Failed Jobs

When a job fails, check these common causes:

# Check exit code and reason
sacct -j JOB_ID --format=JobID,State,ExitCode,MaxRSS,Elapsed

# Read error log
cat logs/slurm_JOB_ID.err

| Symptom | Likely Cause | Fix |
|---|---|---|
| OUT_OF_MEMORY | Insufficient --mem | Increase memory allocation |
| TIMEOUT | Exceeded --time | Increase time limit or optimize code |
| FAILED, exit code 1 | R script error | Check .err log for R traceback |
| MODULE_NOT_FOUND | Wrong module name | Run module spider r to find correct name |
| CANCELLED by user | Manually cancelled or dependency failed | Check dependent jobs |

# Check actual memory used by a completed job (useful for tuning)
sacct -j JOB_ID --format=JobID,MaxRSS,ReqMem,Elapsed,State
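To turn a MaxRSS figure from sacct into a reasonable --mem request, a small helper like the one below works; it is not part of the lab tooling, just a sketch that adds roughly 50% headroom and rounds up to whole gigabytes:

```shell
# Suggest a --mem value from sacct's MaxRSS (reported in KB with a K suffix).
suggest_mem() {
  local kb="${1%K}"                              # strip the K suffix
  local need_kb=$(( kb * 3 / 2 ))                # add ~50% headroom
  local gb=$(( (need_kb + 1048575) / 1048576 ))  # round up to whole GB
  echo "${gb}G"
}

suggest_mem "3172500K"   # a past job that peaked at ~3 GB -> prints 5G
```

Request the suggested value on the next submission rather than guessing; over-requesting memory lengthens queue waits for everyone.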

Targets + Slurm Integration

The {targets} package with {crew.cluster} enables distributed pipelines on Longleaf.

Configuration

# In _targets.R
library(targets)
library(crew)
library(crew.cluster)

tar_option_set(
  controller = crew_controller_slurm(
    name = "longleaf",
    workers = 20,
    slurm_partition = "general",
    slurm_time_minutes = 1440,
    slurm_cpus_per_task = 4,
    slurm_memory_gigabytes_per_cpu = 4,
    slurm_log_output = "logs/slurm_%j.out"
  )
)

Running Pipeline on Cluster

mkdir -p logs  # Ensure log directory exists

# Submit controller job
sbatch run_pipeline.sh

run_pipeline.sh:

#!/bin/bash
#SBATCH --job-name=targets_controller
#SBATCH --time=48:00:00
#SBATCH --mem=8G
#SBATCH --cpus-per-task=2
#SBATCH --output=logs/controller_%j.out

module load r/4.4.0
Rscript -e "targets::tar_make()"

Monitoring

# Watch queue
watch -n 10 squeue -u $USER

# Check targets progress
Rscript -e "targets::tar_progress()"

# View logs
tail -f logs/slurm_*.out
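With 20 workers in flight, a per-state summary of the queue is often more useful than the raw listing. A small helper (not part of the lab tooling) that tallies Slurm state codes (R = running, PD = pending):

```shell
# Tally job states read from stdin; sample input stands in for a live queue.
count_states() {
  awk '{ n[$1]++ } END { for (s in n) print s, n[s] }' | sort
}

printf 'R\nR\nPD\n' | count_states
# PD 1
# R 2

# Live usage on the cluster:
#   squeue -u $USER -h -o "%t" | count_states
```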

Array Jobs

For running many similar jobs (e.g., simulation replicates):

#!/bin/bash
#SBATCH --job-name=sim_array
#SBATCH --array=1-100%20
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G
#SBATCH --time=2:00:00
#SBATCH --output=logs/sim_%A_%a.out

module load r/4.4.0

Rscript run_simulation.R $SLURM_ARRAY_TASK_ID

Tip: Cluster Etiquette: Throttle Array Jobs

The %20 in --array=1-100%20 limits the array to 20 concurrent tasks. This prevents monopolizing the cluster and is required for large arrays. Use a throttle between %10 and %30 depending on job size and cluster load.

In R:

# run_simulation.R
args <- commandArgs(trailingOnly = TRUE)
task_id <- as.integer(args[1])

set.seed(task_id)
# ... run simulation with this seed
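When each array task must also pick a simulation scenario, the task ID can be unpacked into grid coordinates before seeding. A sketch assuming a hypothetical layout of 5 scenarios x 20 replicates (the function name and counts are illustrative):

```shell
# Map a Slurm array task ID (1-100) to a (scenario, replicate) pair.
task_to_params() {
  local id="$1" n_scenarios=5
  local scenario=$(( (id - 1) % n_scenarios + 1 ))
  local replicate=$(( (id - 1) / n_scenarios + 1 ))
  echo "scenario=$scenario replicate=$replicate"
}

task_to_params 1     # scenario=1 replicate=1
task_to_params 7     # scenario=2 replicate=2
task_to_params 100   # scenario=5 replicate=20
```

Pass both values to the R script so the same grid logic is not duplicated in two languages.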

Shared R Package Library

Using Lab Packages

Add to your ~/.Rprofile on Longleaf:

.libPaths(c(
  "~/R/library",
  "/proj/rashidlab/R-packages",
  .libPaths()
))

Installing to Shared Library

To install a package to the shared lab directory:

  1. Post a message to #computing on Teams with the package name and why it’s needed
  2. Get approval from the PI or a senior lab member
  3. Install to the shared path:
install.packages("mypackage", lib = "/proj/rashidlab/R-packages")

Warning

Do not install to the shared library without checking with the lab first. Conflicting package versions can break other members’ analyses.

Quick Mode for Testing

Use reduced parameters for local testing before cluster submission:

# Environment variable approach
QUICK_MODE=1 Rscript my_analysis.R

With globals.yml Integration

The recommended approach combines QUICK_MODE with centralized configuration:

# In R script
source("R/globals_loader.R")
cfg <- load_globals()

quick_mode <- Sys.getenv("QUICK_MODE") == "1"

n_reps <- if (quick_mode) cfg$simulation$n_reps_low else cfg$simulation$n_reps_high

This ensures even “quick” runs use values from globals.yml rather than hardcoded numbers, keeping them consistent with the project configuration.

Without globals.yml (Simple Scripts)

# In R script
if (Sys.getenv("QUICK_MODE") == "1") {
  n_reps <- 100
} else {
  n_reps <- 10000
}

Claude Code on Longleaf

Claude Code can help manage HPC workflows directly from the command line.

Setup

# Install on Longleaf
curl -fsSL https://claude.ai/install.sh | sh
claude login

Common Tasks

> Submit this job to Slurm with 4 CPUs and 16GB RAM
> Check my queue status
> Configure targets to use 20 Slurm workers
> Debug why my job failed

Long Sessions

Use tmux to keep Claude running:

tmux new -s claude
claude
# Ctrl-b, d to detach
# tmux attach -t claude to reattach

See the Claude Code HPC Guide for comprehensive documentation.

Getting Help