# HPC Usage: Using Claude Code on Longleaf

## Prerequisites

- Installation (Longleaf section)
- Longleaf account (see Computing Resources)

This guide covers using Claude Code on the Longleaf HPC cluster. For general HPC documentation, see Computing Resources.
## When to Use HPC vs Local
| Use Local When… | Use HPC When… |
|---|---|
| Writing and debugging code | Running production simulations |
| Small test runs | Jobs need >16GB RAM |
| Interactive exploration | Long-running pipelines (>1 hour) |
| Editing manuscripts | Parallel execution with many workers |
Write and test code locally with reduced parameters, then run production jobs on Longleaf.
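That split can be wired into scripts with a simple environment-variable switch, so the same code runs small locally and at full scale on the cluster. A minimal sketch (the `N_SIMS` variable name is illustrative, not a lab convention):

```shell
#!/bin/sh
# Default to a small run for local testing; override for production.
# Hypothetical cluster use: N_SIMS=5000 sbatch scripts/run_sim.sh
n_sims=${N_SIMS:-10}   # 10 = quick local smoke test
echo "Running $n_sims simulations"
```

Run unmodified for a fast local check; set the variable in the Slurm submission for the production run.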
## Connecting to Longleaf

### Basic SSH

```bash
ssh <onyen>@longleaf.unc.edu
```

### Using tmux (Recommended)
tmux keeps Claude running even if you disconnect:
```bash
# Start new session
tmux new -s claude

# Start Claude
claude

# Detach: Ctrl-b, then d

# Later, reattach
tmux attach -t claude
```

### VS Code Remote-SSH
If you use VS Code:

1. Install the Remote-SSH extension
2. Connect to `<onyen>@longleaf.unc.edu`
3. Open a terminal in VS Code
4. Run `claude`
This gives you a graphical editor with Claude in the integrated terminal.
## Starting Claude on Longleaf

### From a Login Node

```bash
# Navigate to your project
cd /proj/rashidlab/users/$USER/my-project

# Start Claude
claude
```

Login nodes have memory and CPU limits. For heavy analysis, request an interactive session first.
### Interactive Session for Heavy Work

```bash
# Request interactive session
srun --pty -p interact -n 1 -t 120 --mem=16G bash

# Then start Claude
claude
```

## HPC-Specific Workflows
### Submitting Slurm Jobs

> Submit this R script to Slurm with 4 CPUs and 16GB RAM
> Check the status of my running jobs
> Cancel job 12345678

Claude runs:

```bash
sbatch --cpus-per-task=4 --mem=16G script.sh
squeue -u $USER
scancel 12345678
```

### Monitoring Jobs
> Show me the output from my most recent Slurm job
> What's in the error log for job 12345678?
> How much memory did my last job use?
Claude runs:

```bash
tail -50 "$(ls -t logs/slurm_*.out | head -1)"
cat logs/slurm_12345678.err
sacct -j 12345678 --format=JobID,MaxRSS,Elapsed
```

### Debugging Failed Jobs
> My job failed with "out of memory". What should I request?
> The simulation job timed out. Help me estimate how long it needs.
Claude:

1. Checks the error logs
2. Analyzes resource usage
3. Suggests updated Slurm parameters
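The time-limit part of that last step can be sketched as a small shell helper: take the `Elapsed` value that `sacct` reports and print a padded `--time` request in minutes. The function name and the 50% margin are illustrative choices of ours, not a Claude Code feature:

```shell
# Convert sacct Elapsed (HH:MM:SS or D-HH:MM:SS) to total minutes,
# then add 50% headroom for the next --time request.
elapsed_to_minutes() {
  printf '%s\n' "$1" | awk -F'[-:]' '{
    if (NF == 4) s = $1*86400 + $2*3600 + $3*60 + $4   # D-HH:MM:SS
    else         s = $1*3600  + $2*60   + $3           # HH:MM:SS
    m = int((s + 59) / 60)          # round up to whole minutes
    print int(m * 1.5 + 0.999)      # 50% headroom, rounded up
  }'
}

elapsed_to_minutes "02:30:00"   # 150 min used -> prints 225
```

In practice Claude would feed this from `sacct -j <jobid> --format=Elapsed` and edit the `--time` directive in the job script.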
### Targets + Slurm
> Configure this pipeline to use 20 Slurm workers
> The simulation targets keep timing out. Increase the time limit.
> Check which targets are currently running on workers
See Targets Pipeline Guide for detailed configuration.
## Submit→Monitor→Fix Loop
The biggest HPC time-saver: let Claude manage the full job lifecycle. Instead of submitting a job, waiting for it to fail, SSHing in to read the logs, fixing the script, and resubmitting — Claude handles the entire cycle.
### First Submission (Where Most Failures Happen)
First-time job submissions frequently fail due to environment issues — wrong module, missing package, bad path, insufficient resources. Claude can diagnose and fix these without you reading the logs:
> Submit scripts/run_simulation.sh to Slurm and monitor it
Claude submits the job, waits briefly, then checks status:
> Check if my most recent job is still running or has failed
If the job failed:
> Read the error log for my most recent failed job, diagnose the
> problem, fix it, and resubmit
Claude reads the Slurm error log, identifies the issue, and fixes it:
| Common First-Submission Failure | What Claude Does |
|---|---|
| `module: command not found` | Adds `source /etc/profile.d/modules.sh` to the job script |
| `there is no package called 'X'` | Adds a package install or updates `.libPaths()` in the script |
| `CANCELLED ... due to time limit` | Increases `--time` in the Slurm directives |
| `oom-kill` (out of memory) | Increases `--mem` based on `sacct` resource usage from the failed job |
| `cannot open file '/proj/...'` | Fixes paths, checks symlinks, verifies permissions |
| `No such file or directory` | Identifies missing input files or wrong working directory |
### Full Job Lifecycle Example
Here’s what a realistic Claude-managed HPC session looks like:
> Submit the simulation job to Slurm with 8 cores and 32GB RAM

```
# Claude runs: sbatch --cpus-per-task=8 --mem=32G scripts/run_sim.sh
# Output: Submitted batch job 12345678
```

> Check on job 12345678

```
# Claude runs: squeue -j 12345678
# Sees it's still running, or has completed/failed
```

> It failed. What went wrong?

```
# Claude runs: cat logs/slurm-12345678.err
# Reads: "Error in library(data.table): package not found"
# Claude identifies the issue: R library path not set for Slurm environment
```

> Fix it and resubmit

```
# Claude adds .libPaths() to the R script header
# Resubmits: sbatch --cpus-per-task=8 --mem=32G scripts/run_sim.sh
# Output: Submitted batch job 12345679
```
### Monitoring Long-Running Jobs
For jobs that run for hours, use the check-in pattern:
> What's the status of all my running jobs? Show elapsed time
> and estimated completion.
Claude runs `squeue` and `sacct` to show you what's running and how long it's been. For targets + crew pipelines:
> Check how many targets have completed in the current pipeline run
Claude runs `tar_progress()` to show exactly which targets are done, running, or errored, without you needing to read the pipeline log.
### Resource Optimization After First Run
After a successful job, ask Claude to optimize for next time:
> Check how much memory and time my last job actually used,
> and update the Slurm script with tighter resource requests
Claude runs:

```bash
sacct -j 12345679 --format=JobID,MaxRSS,Elapsed,ReqMem,Timelimit
```

Then it updates the Slurm directives to request only what's needed, giving shorter queue waits for future submissions.
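The memory half of that update can be sketched the same way: translate `MaxRSS` into a padded `--mem` request. Two assumptions here are ours, not the document's: that `sacct` reports `MaxRSS` with a `K`, `M`, or `G` suffix, and that 25% headroom is enough:

```shell
# Convert sacct MaxRSS (e.g. "7340032K", "512M", "12G") to a --mem
# request in whole GB with 25% headroom, rounded up.
maxrss_to_mem_gb() {
  printf '%s\n' "$1" | awk '
    /K$/ { gb = $1 / 1048576 }   # awk coerces "7340032K" to 7340032
    /M$/ { gb = $1 / 1024 }
    /G$/ { gb = $1 }
    END  { print int(gb * 1.25 + 0.999) }'
}

maxrss_to_mem_gb "7340032K"   # ~7 GiB used -> prints 9
```

Claude would then rewrite the `--mem=` directive in the job script with the result.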
For large simulation studies with multiple phases, use GSD workflows to plan each phase, then `/gsd:pause-work` before submitting cluster jobs. When results are back, `/gsd:resume-work` picks up the analysis.
## Data Management

### Symlinks to /proj
Projects should use symlinks to lab storage:
> Create a symlink from data/ to /proj/rashidlab/data/my-project
Claude runs:

```bash
ln -s /proj/rashidlab/data/my-project data
```

### Checking Storage
> How much space am I using in /proj/rashidlab?
> What's the largest file in this project?
Claude runs:

```bash
du -sh /proj/rashidlab/users/$USER
find . -type f -exec du -h {} + | sort -rh | head -10
```

### Data Provenance
> Update DATA_PROVENANCE.md with the new dataset location
Claude documents where data came from and where it’s stored.
## Common HPC Tasks

### Check Cluster Status
> What partitions are available and how busy are they?
> How many jobs do I have in the queue?
### Module Management
> What R version is loaded?
> Load R 4.3.0 and check it's working
Claude runs:

```bash
module list
module load r/4.3.0
R --version
```

### Environment Setup
> Set up my R environment to use the lab package library
Claude creates or updates `~/.Rprofile`:

```r
.libPaths(c(
  "~/R/library",
  "/proj/rashidlab/R-packages",
  .libPaths()
))
```

## Remote Development Patterns
### Pattern 1: Local Edit, Remote Run
- Edit code locally with Claude
- Push to GitHub
- Pull on Longleaf
- Run pipeline
```
# Local
> /commit
> Push to origin

# On Longleaf
> Pull latest changes and run the pipeline
```
### Pattern 2: Full Remote
- SSH to Longleaf with tmux
- Use Claude for everything
- Detach when running long jobs
```
# Start session
tmux new -s project

# Work with Claude
claude
> Configure and run the simulation pipeline
> (Claude sets up Slurm jobs)

# Detach while jobs run
# Ctrl-b, d

# Check back later
tmux attach -t project
> What's the status of the pipeline?
```

### Pattern 3: VS Code Remote
- Connect VS Code to Longleaf
- Edit files in VS Code
- Use Claude in integrated terminal
- Best of both worlds
## Long-Running Sessions

### Keeping Claude Alive
Use tmux to persist sessions:
```bash
# Named session for each project
tmux new -s my-project-claude
```

### Session Recovery
If disconnected:
```bash
# List sessions
tmux ls

# Reattach
tmux attach -t my-project-claude
```

### Multiple Projects
```bash
# Start sessions for different projects
tmux new -s project-a
# Ctrl-b, d to detach
tmux new -s project-b
# Ctrl-b, d to detach

# Switch between them
tmux attach -t project-a
```

## Performance Tips
### Pre-approve Common Commands
Add to `.claude/settings.json`:

```json
{
  "permissions": {
    "allow": [
      "Bash(sbatch *)",
      "Bash(squeue *)",
      "Bash(scancel *)",
      "Bash(srun *)",
      "Bash(module *)"
    ]
  }
}
```

### Use Specific File References
```
# Faster (Claude goes directly there)
> Check the error in logs/slurm_12345678.err

# Slower (Claude searches)
> Check the Slurm errors
```
### Batch Operations
```
# More efficient
> Submit all scripts in jobs/ to Slurm

# Less efficient
> Submit job1.sh
> Submit job2.sh
> Submit job3.sh
```
## Troubleshooting

### “Cannot connect to server”
```bash
# Check you're on a login node
hostname  # Should be login*.longleaf.unc.edu

# Check network
curl -I https://claude.ai
```

### Session Timeout
> /exit
Then restart Claude. Or use tmux to prevent timeouts.
### Slow Response
On shared login nodes, Claude may be slow. Request an interactive session:
```bash
srun --pty -p interact -n 1 -t 60 --mem=8G bash
claude
```

### Permission Denied
```bash
# Check file permissions
ls -la /proj/rashidlab/users/$USER

# Fix if needed
chmod -R u+rw /proj/rashidlab/users/$USER/my-project
```

## Quick Reference
| Task | Command |
|---|---|
| Start tmux session | tmux new -s claude |
| Detach tmux | Ctrl-b, d |
| Reattach tmux | tmux attach -t claude |
| Interactive session | srun --pty -p interact -n 1 -t 60 bash |
| Check queue | squeue -u $USER |
| Submit job | sbatch script.sh |
| Cancel job | scancel JOB_ID |
| Check storage | du -sh /proj/rashidlab/users/$USER |
## Next Steps
- Computing Resources — Full HPC documentation
- Targets Pipeline — Slurm integration
- Troubleshooting — Common issues