# HPC Usage: Using Claude Code on Longleaf

## Prerequisites

- Installation (Longleaf section)
- Longleaf account (see Computing Resources)

This guide covers using Claude Code on the Longleaf HPC cluster. For general HPC documentation, see Computing Resources.
## When to Use HPC vs Local
| Use Local When… | Use HPC When… |
|---|---|
| Writing and debugging code | Running production simulations |
| Small test runs | Jobs need >16GB RAM |
| Interactive exploration | Long-running pipelines (>1 hour) |
| Editing manuscripts | Parallel execution with many workers |
Write and test code locally with reduced parameters, then run production jobs on Longleaf.
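That split can be wired into scripts with a simple environment-variable switch, so the same code runs small locally and at full scale on the cluster. A minimal sketch (the `N_SIMS` variable name is illustrative, not a lab convention):

```shell
#!/bin/sh
# Default to a small run for local testing; override for production.
# Hypothetical cluster use: N_SIMS=5000 sbatch scripts/run_sim.sh
n_sims=${N_SIMS:-10}   # 10 = quick local smoke test
echo "Running $n_sims simulations"
```

Run unmodified for a fast local check; set the variable in the Slurm submission for the production run.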
## Connecting to Longleaf

### Basic SSH

```bash
ssh <onyen>@longleaf.unc.edu
```

### Using tmux (Recommended)
tmux keeps Claude running even if you disconnect:
```bash
# Start new session
tmux new -s claude

# Start Claude
claude

# Detach: Ctrl-b, then d

# Later, reattach
tmux attach -t claude
```

### VS Code Remote-SSH
If you use VS Code:

1. Install the Remote-SSH extension
2. Connect to `<onyen>@longleaf.unc.edu`
3. Open a terminal in VS Code
4. Run `claude`
This gives you a graphical editor with Claude in the integrated terminal.
## Starting Claude on Longleaf

### From a Login Node

```bash
# Navigate to your project
cd /proj/rashidlab/users/$USER/my-project

# Start Claude
claude
```

Login nodes have memory and CPU limits. For heavy analysis, request an interactive session first.
### Interactive Session for Heavy Work

```bash
# Request interactive session
srun --pty -p interact -n 1 -t 120 --mem=16G bash

# Then start Claude
claude
```

## HPC-Specific Workflows
### Submitting Slurm Jobs

> Submit this R script to Slurm with 4 CPUs and 16GB RAM
> Check the status of my running jobs
> Cancel job 12345678

Claude runs:

```bash
sbatch --cpus-per-task=4 --mem=16G script.sh
squeue -u $USER
scancel 12345678
```

### Monitoring Jobs
> Show me the output from my most recent Slurm job
> What's in the error log for job 12345678?
> How much memory did my last job use?
Claude runs:

```bash
tail -50 "$(ls -t logs/slurm_*.out | head -1)"
cat logs/slurm_12345678.err
sacct -j 12345678 --format=JobID,MaxRSS,Elapsed
```

### Debugging Failed Jobs
> My job failed with "out of memory". What should I request?
> The simulation job timed out. Help me estimate how long it needs.
Claude:

1. Checks the error logs
2. Analyzes resource usage
3. Suggests updated Slurm parameters
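The time-limit part of that last step can be sketched as a small shell helper: take the `Elapsed` value that `sacct` reports and print a padded `--time` request in minutes. The function name and the 50% margin are illustrative choices of ours, not a Claude Code feature:

```shell
# Convert sacct Elapsed (HH:MM:SS or D-HH:MM:SS) to total minutes,
# then add 50% headroom for the next --time request.
elapsed_to_minutes() {
  printf '%s\n' "$1" | awk -F'[-:]' '{
    if (NF == 4) s = $1*86400 + $2*3600 + $3*60 + $4   # D-HH:MM:SS
    else         s = $1*3600  + $2*60   + $3           # HH:MM:SS
    m = int((s + 59) / 60)          # round up to whole minutes
    print int(m * 1.5 + 0.999)      # 50% headroom, rounded up
  }'
}

elapsed_to_minutes "02:30:00"   # 150 min used -> prints 225
```

In practice Claude would feed this from `sacct -j <jobid> --format=Elapsed` and edit the `--time` directive in the job script.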
### Targets + Slurm
> Configure this pipeline to use 20 Slurm workers
> The simulation targets keep timing out. Increase the time limit.
> Check which targets are currently running on workers
See Targets Pipeline Guide for detailed configuration.
## Submit→Monitor→Fix Loop
The biggest HPC time-saver: let Claude manage the full job lifecycle. Instead of submitting a job, waiting for it to fail, SSHing in to read the logs, fixing the script, and resubmitting — Claude handles the entire cycle.
### First Submission (Where Most Failures Happen)
First-time job submissions frequently fail due to environment issues — wrong module, missing package, bad path, insufficient resources. Claude can diagnose and fix these without you reading the logs:
> Submit scripts/run_simulation.sh to Slurm and monitor it
Claude submits the job, waits briefly, then checks status:
> Check if my most recent job is still running or has failed
If the job failed:
> Read the error log for my most recent failed job, diagnose the
> problem, fix it, and resubmit
Claude reads the Slurm error log, identifies the issue, and fixes it:
| Common First-Submission Failure | What Claude Does |
|---|---|
| `module: command not found` | Adds `source /etc/profile.d/modules.sh` to the job script |
| `there is no package called 'X'` | Adds a package install or updates `.libPaths()` in the script |
| `CANCELLED ... due to time limit` | Increases `--time` in the Slurm directives |
| `oom-kill` (out of memory) | Increases `--mem` based on `sacct` resource usage from the failed job |
| `cannot open file '/proj/...'` | Fixes paths, checks symlinks, verifies permissions |
| `No such file or directory` | Identifies missing input files or wrong working directory |
### Full Job Lifecycle Example
Here’s what a realistic Claude-managed HPC session looks like:
> Submit the simulation job to Slurm with 8 cores and 32GB RAM

```
# Claude runs: sbatch --cpus-per-task=8 --mem=32G scripts/run_sim.sh
# Output: Submitted batch job 12345678
```

> Check on job 12345678

```
# Claude runs: squeue -j 12345678
# Sees it's still running, or has completed/failed
```

> It failed. What went wrong?

```
# Claude runs: cat logs/slurm-12345678.err
# Reads: "Error in library(data.table): package not found"
# Claude identifies the issue: R library path not set for Slurm environment
```

> Fix it and resubmit

```
# Claude adds .libPaths() to the R script header
# Resubmits: sbatch --cpus-per-task=8 --mem=32G scripts/run_sim.sh
# Output: Submitted batch job 12345679
```
### Monitoring Long-Running Jobs
For jobs that run for hours, use the check-in pattern:
> What's the status of all my running jobs? Show elapsed time
> and estimated completion.
Claude runs `squeue` and `sacct` to show you what's running and how long it's been. For targets + crew pipelines:
> Check how many targets have completed in the current pipeline run
Claude runs `tar_progress()` to show exactly which targets are done, running, or errored, without you needing to read the pipeline log.
### Resource Optimization After First Run
After a successful job, ask Claude to optimize for next time:
> Check how much memory and time my last job actually used,
> and update the Slurm script with tighter resource requests
Claude runs:

```bash
sacct -j 12345679 --format=JobID,MaxRSS,Elapsed,ReqMem,Timelimit
```

Then it updates the Slurm directives to request only what's needed, giving shorter queue waits for future submissions.
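The memory half of that update can be sketched the same way: translate `MaxRSS` into a padded `--mem` request. Two assumptions here are ours, not the document's: that `sacct` reports `MaxRSS` with a `K`, `M`, or `G` suffix, and that 25% headroom is enough:

```shell
# Convert sacct MaxRSS (e.g. "7340032K", "512M", "12G") to a --mem
# request in whole GB with 25% headroom, rounded up.
maxrss_to_mem_gb() {
  printf '%s\n' "$1" | awk '
    /K$/ { gb = $1 / 1048576 }   # awk coerces "7340032K" to 7340032
    /M$/ { gb = $1 / 1024 }
    /G$/ { gb = $1 }
    END  { print int(gb * 1.25 + 0.999) }'
}

maxrss_to_mem_gb "7340032K"   # ~7 GiB used -> prints 9
```

Claude would then rewrite the `--mem=` directive in the job script with the result.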
For large simulation studies with multiple phases, use GSD workflows to plan each phase, then `/gsd:pause-work` before submitting cluster jobs. When results are back, `/gsd:resume-work` picks up the analysis.
## Data Management

### Symlinks to /proj
Projects should use symlinks to lab storage:
> Create a symlink from data/ to /proj/rashidlab/data/my-project
Claude runs:

```bash
ln -s /proj/rashidlab/data/my-project data
```

### Checking Storage
> How much space am I using in /proj/rashidlab?
> What's the largest file in this project?
Claude runs:

```bash
du -sh /proj/rashidlab/users/$USER
find . -type f -exec du -h {} + | sort -rh | head -10
```

### Data Provenance
> Update DATA_PROVENANCE.md with the new dataset location
Claude documents where data came from and where it’s stored.
## Common HPC Tasks

### Check Cluster Status
> What partitions are available and how busy are they?
> How many jobs do I have in the queue?
### Module Management
> What R version is loaded?
> Load R 4.3.0 and check it's working
Claude runs:

```bash
module list
module load r/4.3.0
R --version
```

### Environment Setup
> Set up my R environment to use the lab package library
Claude creates or updates `~/.Rprofile`:

```r
.libPaths(c(
  "~/R/library",
  "/proj/rashidlab/R-packages",
  .libPaths()
))
```

## Remote Development Patterns
### Pattern 1: Local Edit, Remote Run
- Edit code locally with Claude
- Push to GitHub
- Pull on Longleaf
- Run pipeline
```
# Local
> /commit
> Push to origin

# On Longleaf
> Pull latest changes and run the pipeline
```
### Pattern 2: Full Remote
- SSH to Longleaf with tmux
- Use Claude for everything
- Detach when running long jobs
```
# Start session
tmux new -s project

# Work with Claude
claude
> Configure and run the simulation pipeline
> (Claude sets up Slurm jobs)

# Detach while jobs run
# Ctrl-b, d

# Check back later
tmux attach -t project
> What's the status of the pipeline?
```

### Pattern 3: VS Code Remote
- Connect VS Code to Longleaf
- Edit files in VS Code
- Use Claude in integrated terminal
- Best of both worlds
## Long-Running Sessions

### Keeping Claude Alive
Use tmux to persist sessions:
```bash
# Named session for each project
tmux new -s my-project-claude
```

### Session Recovery
If disconnected:
```bash
# List sessions
tmux ls

# Reattach
tmux attach -t my-project-claude
```

### Multiple Projects
```bash
# Start sessions for different projects
tmux new -s project-a
# Ctrl-b, d to detach
tmux new -s project-b
# Ctrl-b, d to detach

# Switch between them
tmux attach -t project-a
```

## Performance Tips
### Pre-approve Common Commands
Add to `.claude/settings.json`:

```json
{
  "permissions": {
    "allow": [
      "Bash(sbatch *)",
      "Bash(squeue *)",
      "Bash(scancel *)",
      "Bash(srun *)",
      "Bash(module *)"
    ]
  }
}
```

### Use Specific File References
```
# Faster (Claude goes directly there)
> Check the error in logs/slurm_12345678.err

# Slower (Claude searches)
> Check the Slurm errors
```
### Batch Operations
```
# More efficient
> Submit all scripts in jobs/ to Slurm

# Less efficient
> Submit job1.sh
> Submit job2.sh
> Submit job3.sh
```
## Troubleshooting

### “Cannot connect to server”
```bash
# Check you're on a login node
hostname  # Should be login*.longleaf.unc.edu

# Check network
curl -I https://claude.ai
```

### Session Timeout
> /exit
Then restart Claude. Or use tmux to prevent timeouts.
### Slow Response
On shared login nodes, Claude may be slow. Request an interactive session:
```bash
srun --pty -p interact -n 1 -t 60 --mem=8G bash
claude
```

### Permission Denied
```bash
# Check file permissions
ls -la /proj/rashidlab/users/$USER

# Fix if needed
chmod -R u+rw /proj/rashidlab/users/$USER/my-project
```

## Quick Reference
| Task | Command |
|---|---|
| Start tmux session | tmux new -s claude |
| Detach tmux | Ctrl-b, d |
| Reattach tmux | tmux attach -t claude |
| Interactive session | srun --pty -p interact -n 1 -t 60 bash |
| Check queue | squeue -u $USER |
| Submit job | sbatch script.sh |
| Cancel job | scancel JOB_ID |
| Check storage | du -sh /proj/rashidlab/users/$USER |
## Next Steps
- Computing Resources — Full HPC documentation
- Targets Pipeline — Slurm integration
- Troubleshooting — Common issues