Your First Project
Getting started with a project template
This guide walks through setting up your first project using lab templates.
What Is targets, and Why Do We Use It?
Most R users start out running scripts in sequence: 01_clean.R, 02_analyze.R, 03_plot.R. This works until your project grows—then you forget which scripts to re-run after a change, you accidentally use stale intermediate results, and reproducing the full analysis becomes error-prone.
targets solves this. It’s a pipeline tool that:
- Tracks dependencies automatically — it knows that your plot depends on your model, which depends on your cleaned data
- Only re-runs what changed — if you edit your plotting code, it won’t re-run the data cleaning or model fitting
- Documents your workflow — the pipeline definition (_targets.R) serves as a readable map of your entire analysis
Think of it like a smart “Run All” button that skips work it doesn’t need to redo.
```r
# A simple pipeline: _targets.R
library(targets)

list(
  tar_target(raw_data, read.csv("data/raw/study.csv")),
  tar_target(clean_data, clean_dataset(raw_data)),
  tar_target(model_fit, fit_model(clean_data)),
  tar_target(summary_table, summarize_results(model_fit))
)
```

Each tar_target() defines one step: the first argument is the target's name, the second is the R expression that produces it. targets works out the execution order from the dependency chain (summary_table needs model_fit, which needs clean_data, and so on).
You’ll learn more as you use it. For now, just know that every lab project uses targets as the backbone of its analysis. See the Targets Pipeline Guide for the full reference.
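The pipeline above calls clean_dataset(), fit_model(), and summarize_results(), which live in your R/ folder. A minimal sketch of what such helpers might look like — the bodies below are illustrative assumptions, not the template's actual code:

```r
# R/functions.R — illustrative sketches; adapt to your analysis

# Cleaning step: drop incomplete rows (assumed rule)
clean_dataset <- function(data) {
  data[complete.cases(data), , drop = FALSE]
}

# Modeling step: a plain linear model (swap in your own)
fit_model <- function(data) {
  lm(value ~ group, data = data)
}

# Reporting step: coefficient table as a data frame
summarize_results <- function(fit) {
  as.data.frame(coef(summary(fit)))
}
```

With helpers written this way, each target stays a one-line call and tar_make() can cache every step independently.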
Choose a Template
We have three main templates:
| Template | Use Case |
|---|---|
| Research Project | General analysis projects |
| Methods Paper | Methodology papers with simulations |
| R Package Development | R packages accompanying a methods paper |
Clone the Template
```bash
# Clone the research project template
git clone git@github.com:rashidlab/template-research-project.git my-first-project

# Navigate into the project
cd my-first-project

# Remove the template's git history
rm -rf .git

# Initialize a fresh git repo
git init
git add .
git commit -m "Initial commit from template"
```

Set Up Data Directory
Data files are never committed to Git. Use the lab project directory and symlinks.
```bash
# Create your project's data folder on Longleaf
# Use /proj/rashidlab/projects/ for private project data
# (the top-level /proj/rashidlab/ is publicly readable on the cluster)
mkdir -p /proj/rashidlab/projects/my-first-project/data/raw
mkdir -p /proj/rashidlab/projects/my-first-project/data/processed

# From your repo root, create a symlink to the data
ln -s /proj/rashidlab/projects/my-first-project/data data

# Your .gitignore already excludes data/
```

Now place any data files in /proj/rashidlab/projects/my-first-project/data/raw/ and they'll be accessible via data/raw/ in your code.
The lab shared space (/proj/rashidlab/) has a 1 TB quota shared across all members and is not backed up. Check usage at service.rc.unc.edu. See Longleaf Setup for the full directory reference.
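Before running the pipeline, it's worth confirming the symlink resolves. The pattern is sketched below in a temporary directory (a stand-in for the real /proj path, so it runs anywhere):

```r
# Stand-in for the shared project space on Longleaf
proj <- file.path(tempdir(), "shared-project")
dir.create(file.path(proj, "data", "raw"), recursive = TRUE, showWarnings = FALSE)

# Stand-in for your cloned repo
repo <- file.path(tempdir(), "repo")
dir.create(repo, showWarnings = FALSE)
old <- setwd(repo)

# Same idea as `ln -s <shared>/data data`
file.symlink(file.path(proj, "data"), "data")

file.exists("data/raw")  # TRUE if the link resolves
Sys.readlink("data")     # prints the shared path the link points to
setwd(old)
```

On Longleaf, the same two checks (does data/raw exist, and where does data point) catch a broken or mistyped symlink before tar_make() fails on a missing file.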
Set Up Dependencies
```r
# Open the project in RStudio, then restore packages.
# One option (assuming the remotes package is installed):
# install everything listed in the DESCRIPTION file.
remotes::install_deps(dependencies = TRUE)
```

Configure the Project
Edit config/config.yml:

```yaml
project:
  name: "My First Project"
  author: "Your Name"
  seed: 2024

analysis:
  alpha: 0.05
```

Run the Pipeline
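Your code can read the config/config.yml settings at run time. A minimal sketch, assuming the yaml package is available (the template may ship its own config loader):

```r
# Pull settings from the project config (assumes the yaml package)
library(yaml)

cfg <- read_yaml("config/config.yml")

set.seed(cfg$project$seed)   # seed from config, for reproducible runs
alpha <- cfg$analysis$alpha  # significance threshold used downstream
```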
```r
# Load targets
library(targets)

# Visualize the pipeline
tar_visnetwork()

# Run the pipeline
tar_make()

# Check results
tar_read(results_summary)
```

Make Your First Change
- Add a new analysis step in _targets.R:

```r
tar_target(
  my_analysis,
  run_my_analysis(clean_data)
)
```

- Create the function in R/analysis.R:

```r
run_my_analysis <- function(data) {
  # Your analysis code (using base R - see the R Style Guide)
  aggregate(value ~ group, data = data, FUN = mean)
}
```

We use base R and data.table instead of the tidyverse. For large datasets, use:
```r
library(data.table)

dt <- as.data.table(data)
dt[, .(mean_value = mean(value)), by = group]
```

See the R Style Guide for details.
- Run and verify:

```r
tar_make()
tar_read(my_analysis)
```

Commit Your Changes
```bash
# Check status
git status

# Stage changes
git add R/analysis.R _targets.R

# Commit with a descriptive message
git commit -m "feat: add group-wise mean analysis"

# Push to GitHub (after creating the remote repo)
git push -u origin main
```

Project Structure
After setup, your project looks like:
```
my-first-project/
├── _targets.R          # Pipeline definition
├── R/                  # Your functions
│   └── analysis.R
├── config/
│   └── config.yml      # Configuration
├── data -> /proj/rashidlab/projects/my-first-project/data   # Symlink (gitignored)
├── results/            # Outputs (gitignored)
├── figures/            # Plots
├── scripts/
│   └── download_data.sh   # For external users
├── DESCRIPTION         # Package dependencies
├── .gitignore          # Excludes data/, results/, _targets/
└── README.md           # Project documentation
```
Best Practices
Each function should do one thing. If a function is longer than ~50 lines, consider splitting it.
Make small, frequent commits with descriptive messages. It’s easier to track changes and revert if needed.
Write comments that explain why, not what. Update the README when you add features.
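As an illustration of the one-job-per-function rule (the function names here are hypothetical):

```r
# Instead of one function that cleans AND summarizes,
# give each job its own function:

drop_missing <- function(data) {
  data[complete.cases(data), , drop = FALSE]  # cleaning only
}

group_means <- function(data) {
  aggregate(value ~ group, data = data, FUN = mean)  # summarizing only
}

# Compose the small pieces where you need the combined behavior
summarize_clean <- function(data) {
  group_means(drop_missing(data))
}
```

Small functets like these are also easy to register as separate targets, so tar_make() can cache the cleaning step independently of the summary.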
Using Claude Code on Your First Project
Claude Code can help you get started and learn the codebase:
```bash
# Start Claude in your project
cd ~/rashid-lab-setup/my-first-project
claude
```

Try these prompts:
> What is the structure of this project?
> Explain how the targets pipeline works
> Help me add a new analysis function
> What do I need to do to run the pipeline?
Claude understands lab conventions and will guide you through using base R, data.table, and targets.
See Claude Code First Session for a detailed walkthrough.
Getting Help
- Pipeline issues: Check targets::tar_meta() for errors
- Package issues: Reinstall packages from DESCRIPTION
- Git issues: Ask in the #computing Teams channel
- Claude Code: See the Claude Code Guide
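To dig into pipeline errors, the target metadata records any error message per target. A short sketch (assumes tar_make() has run at least once, so the _targets/ store exists):

```r
library(targets)

meta <- tar_meta()                              # one metadata row per target
meta[!is.na(meta$error), c("name", "error")]    # failed targets with messages
```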
Next: Coding Standards →