HPC Usage

Using Claude Code on Longleaf

Modified

February 25, 2026

Intermediate · 10 min read · Phase 3: Daily Work

Prerequisites

This guide covers using Claude Code on the Longleaf HPC cluster. For general HPC documentation, see Computing Resources.

When to Use HPC vs Local

| Use Local When… | Use HPC When… |
|---|---|
| Writing and debugging code | Running production simulations |
| Small test runs | Jobs need >16GB RAM |
| Interactive exploration | Long-running pipelines (>1 hour) |
| Editing manuscripts | Parallel execution with many workers |
Tip: Development Workflow

Write and test code locally with reduced parameters, then run production jobs on Longleaf.
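One lightweight way to follow this pattern is to parameterize the script with an environment variable, so the same code runs as a small local smoke test and a full production job. This is a sketch; the N_SIMS name and its default are illustrative, not a lab convention:

```shell
#!/bin/bash
# Illustrative: one script serves both local tests and production runs.
# N_SIMS defaults to a small value for local testing; the Slurm job
# script overrides it for production (e.g. N_SIMS=10000).
N_SIMS="${N_SIMS:-10}"
echo "Running ${N_SIMS} simulations"
```

Locally you run the script as-is; on Longleaf, the job script exports a large N_SIMS before calling it.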

Connecting to Longleaf

Basic SSH

ssh <onyen>@longleaf.unc.edu

VS Code Remote-SSH

If you use VS Code:

  1. Install the Remote-SSH extension
  2. Connect to <onyen>@longleaf.unc.edu
  3. Open a terminal in VS Code
  4. Run claude

This gives you a graphical editor with Claude in the integrated terminal.

Starting Claude on Longleaf

From a Login Node

# Navigate to your project
cd /proj/rashidlab/users/$USER/my-project

# Start Claude
claude
Warning: Login Node Limits

Login nodes have memory and CPU limits. For heavy analysis, request an interactive session first.

Interactive Session for Heavy Work

# Request interactive session
srun --pty -p interact -n 1 -t 120 --mem=16G bash

# Then start Claude
claude
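Once the session starts, it is worth confirming you landed on a compute node before running anything heavy. The helper below is hypothetical, and the hostname patterns are assumptions (Longleaf login nodes typically have names starting with login, but check your cluster's docs):

```shell
# Hypothetical helper: classify a node by its hostname prefix.
node_kind() {
  case "$1" in
    login*) echo "login" ;;
    *)      echo "compute" ;;
  esac
}

node_kind "login01.longleaf.unc.edu"   # prints: login
node_kind "$(hostname)"                # classify the node you are on now
```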

HPC-Specific Workflows

Submitting Slurm Jobs

> Submit this R script to Slurm with 4 CPUs and 16GB RAM

> Check the status of my running jobs

> Cancel job 12345678

Claude runs:

sbatch --cpus-per-task=4 --mem=16G script.sh
squeue -u $USER
scancel 12345678
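Those sbatch flags can also live inside the job script itself as #SBATCH directives. A minimal sketch, where the module version, paths, and resource values are all illustrative:

```shell
#!/bin/bash
#SBATCH --job-name=sim
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
#SBATCH --time=02:00:00
#SBATCH --output=logs/slurm_%j.out
#SBATCH --error=logs/slurm_%j.err

# Load the software the job needs, then run it.
module load r/4.3.0
Rscript scripts/run_sim.R
```

With the directives in the script, submission reduces to sbatch script.sh.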

Monitoring Jobs

> Show me the output from my most recent Slurm job

> What's in the error log for job 12345678?

> How much memory did my last job use?

Claude runs:

tail -50 "$(ls -t logs/slurm_*.out | head -1)"
cat logs/slurm_12345678.err
sacct -j 12345678 --format=JobID,MaxRSS,Elapsed
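The arithmetic behind turning a MaxRSS reading into a sensible memory request can be sketched in a few lines. The 1.5x headroom factor and the sample value are assumptions, not lab policy; in practice the number would be parsed from sacct output:

```shell
# Convert a MaxRSS reading in KB into a padded --mem request in MB,
# with 1.5x headroom, rounding up to the next whole MB.
maxrss_kb=3541234   # example value; parse from sacct in practice
mem_mb=$(( (maxrss_kb * 3 / 2 + 1023) / 1024 ))
echo "--mem=${mem_mb}M"   # prints: --mem=5188M
```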

Debugging Failed Jobs

> My job failed with "out of memory". What should I request?

> The simulation job timed out. Help me estimate how long it needs.

Claude:

  1. Checks the error logs
  2. Analyzes resource usage
  3. Suggests updated Slurm parameters

Targets + Slurm

> Configure this pipeline to use 20 Slurm workers

> The simulation targets keep timing out. Increase the time limit.

> Check which targets are currently running on workers

See Targets Pipeline Guide for detailed configuration.

Submit→Monitor→Fix Loop

The biggest HPC time-saver: let Claude manage the full job lifecycle. Instead of submitting a job, waiting for it to fail, SSHing in to read the logs, fixing the script, and resubmitting — Claude handles the entire cycle.

First Submission (Where Most Failures Happen)

First-time job submissions frequently fail due to environment issues — wrong module, missing package, bad path, insufficient resources. Claude can diagnose and fix these without you reading the logs:

> Submit scripts/run_simulation.sh to Slurm and monitor it

Claude submits the job, waits briefly, then checks status:

> Check if my most recent job is still running or has failed

If the job failed:

> Read the error log for my most recent failed job, diagnose the
> problem, fix it, and resubmit

Claude reads the Slurm error log, identifies the issue, and fixes it:

| Common First-Submission Failure | What Claude Does |
|---|---|
| module: command not found | Adds source /etc/profile.d/modules.sh to the job script |
| there is no package called 'X' | Adds a package install or updates .libPaths() in the script |
| CANCELLED ... due to time limit | Increases --time in the Slurm directives |
| oom-kill (out of memory) | Increases --mem based on sacct resource usage from the failed job |
| cannot open file '/proj/...' | Fixes paths, checks symlinks, verifies permissions |
| No such file or directory | Identifies missing input files or a wrong working directory |
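Several of those fixes end up in the job script's preamble. A hedged sketch of what a repaired header might look like, with the module version, paths, and resource limits all illustrative:

```shell
#!/bin/bash
#SBATCH --time=04:00:00            # raised after a time-limit cancellation
#SBATCH --mem=32G                  # raised after an oom-kill
#SBATCH --output=logs/slurm_%j.out
#SBATCH --error=logs/slurm_%j.err

# Batch shells don't always initialize the module system; source it explicitly.
source /etc/profile.d/modules.sh
module load r/4.3.0

# Run from an absolute project path so relative input paths resolve.
cd /proj/rashidlab/users/$USER/my-project
Rscript scripts/run_sim.R
```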

Full Job Lifecycle Example

Here’s what a realistic Claude-managed HPC session looks like:

> Submit the simulation job to Slurm with 8 cores and 32GB RAM

# Claude runs: sbatch --cpus-per-task=8 --mem=32G scripts/run_sim.sh
# Output: Submitted batch job 12345678

> Check on job 12345678

# Claude runs: squeue -j 12345678
# Sees it's still running, or has completed/failed

> It failed. What went wrong?

# Claude runs: cat logs/slurm_12345678.err
# Reads: "Error in library(data.table): package not found"
# Claude identifies the issue: R library path not set for Slurm environment

> Fix it and resubmit

# Claude adds .libPaths() to the R script header
# Resubmits: sbatch --cpus-per-task=8 --mem=32G scripts/run_sim.sh
# Output: Submitted batch job 12345679

Monitoring Long-Running Jobs

For jobs that run for hours, use the check-in pattern:

> What's the status of all my running jobs? Show elapsed time
> and estimated completion.

Claude runs squeue and sacct to show you what’s running and how long it’s been. For targets + crew pipelines:

> Check how many targets have completed in the current pipeline run

Claude runs tar_progress() to show exactly which targets are done, running, or errored — without you needing to read the pipeline log.

Resource Optimization After First Run

After a successful job, ask Claude to optimize for next time:

> Check how much memory and time my last job actually used,
> and update the Slurm script with tighter resource requests

Claude runs:

sacct -j 12345679 --format=JobID,MaxRSS,Elapsed,ReqMem,Timelimit

Then updates the Slurm directives to request only what’s needed — shorter queue wait times for future submissions.
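The mechanical part of that update can be a one-line sed edit on the #SBATCH directive. The demo below writes a throwaway script first so it is self-contained; in practice you would edit your real job script, and the 6G value is illustrative:

```shell
# Create a throwaway job script to demonstrate the edit.
printf '#!/bin/bash\n#SBATCH --mem=32G\n' > /tmp/mem_demo.sh

# Tighten the memory directive in place.
sed -i 's/^#SBATCH --mem=.*/#SBATCH --mem=6G/' /tmp/mem_demo.sh

grep -- '--mem' /tmp/mem_demo.sh   # prints: #SBATCH --mem=6G
```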

Tip: Combine with GSD for Multi-Phase HPC Work

For large simulation studies with multiple phases, use GSD workflows to plan each phase, then /gsd:pause-work before submitting cluster jobs. When results are back, /gsd:resume-work picks up the analysis.

Data Management

Checking Storage

> How much space am I using in /proj/rashidlab?

> What's the largest file in this project?

Claude runs:

du -sh /proj/rashidlab/users/$USER
find . -type f -exec du -h {} + | sort -rh | head -10

Data Provenance

> Update DATA_PROVENANCE.md with the new dataset location

Claude documents where data came from and where it’s stored.

Common HPC Tasks

Check Cluster Status

> What partitions are available and how busy are they?

> How many jobs do I have in the queue?

Module Management

> What R version is loaded?

> Load R 4.3.0 and check it's working

Claude runs:

module list
module load r/4.3.0
R --version

Environment Setup

> Set up my R environment to use the lab package library

Claude creates/updates ~/.Rprofile:

.libPaths(c(
  "~/R/library",
  "/proj/rashidlab/R-packages",
  .libPaths()
))

Remote Development Patterns

Pattern 1: Local Edit, Remote Run

  1. Edit code locally with Claude
  2. Push to GitHub
  3. Pull on Longleaf
  4. Run pipeline

# Local
> /commit
> Push to origin

# On Longleaf
> Pull latest changes and run the pipeline

Pattern 2: Full Remote

  1. SSH to Longleaf with tmux
  2. Use Claude for everything
  3. Detach when running long jobs

# Start session
tmux new -s project

# Work with Claude
claude
> Configure and run the simulation pipeline
> (Claude sets up Slurm jobs)

# Detach while jobs run
# Ctrl-b, d

# Check back later
tmux attach -t project
> What's the status of the pipeline?

Pattern 3: VS Code Remote

  1. Connect VS Code to Longleaf
  2. Edit files in VS Code
  3. Use Claude in integrated terminal
  4. Best of both worlds

Long-Running Sessions

Keeping Claude Alive

Use tmux to persist sessions:

# Named session for each project
tmux new -s my-project-claude

Session Recovery

If disconnected:

# List sessions
tmux ls

# Reattach
tmux attach -t my-project-claude

Multiple Projects

# Start sessions for different projects
tmux new -s project-a
# Ctrl-b, d to detach

tmux new -s project-b
# Ctrl-b, d to detach

# Switch between them
tmux attach -t project-a

Performance Tips

Pre-approve Common Commands

Add to .claude/settings.json:

{
  "permissions": {
    "allow": [
      "Bash(sbatch *)",
      "Bash(squeue *)",
      "Bash(scancel *)",
      "Bash(srun *)",
      "Bash(module *)"
    ]
  }
}

Use Specific File References

# Faster (Claude goes directly there)
> Check the error in logs/slurm_12345678.err

# Slower (Claude searches)
> Check the Slurm errors

Batch Operations

# More efficient
> Submit all scripts in jobs/ to Slurm

# Less efficient
> Submit job1.sh
> Submit job2.sh
> Submit job3.sh

Troubleshooting

“Cannot connect to server”

# Check you're on a login node
hostname  # Should be login*.longleaf.unc.edu

# Check network
curl -I https://claude.ai

Session Timeout

> /exit

Then restart Claude, or use tmux so the session survives disconnects in the first place.

Slow Response

On shared login nodes, Claude may be slow. Request an interactive session:

srun --pty -p interact -n 1 -t 60 --mem=8G bash
claude

Permission Denied

# Check file permissions
ls -la /proj/rashidlab/users/$USER

# Fix if needed
chmod -R u+rw /proj/rashidlab/users/$USER/my-project

Quick Reference

| Task | Command |
|---|---|
| Start tmux session | tmux new -s claude |
| Detach tmux | Ctrl-b, d |
| Reattach tmux | tmux attach -t claude |
| Interactive session | srun --pty -p interact -n 1 -t 60 bash |
| Check queue | squeue -u $USER |
| Submit job | sbatch script.sh |
| Cancel job | scancel JOB_ID |
| Check storage | du -sh /proj/rashidlab/users/$USER |

Next Steps