Research Project Template
General-purpose data analysis template
Overview
A flexible template for exploratory data analysis, statistical modeling, and one-off research projects.
Repository: github.com/rashidlab/template-research-project
Features
- Pre-configured {targets} pipeline
- Package dependencies tracked via DESCRIPTION
- Quarto for reports
- Consistency validation framework
- Centralized configuration
Quick Start
# Clone
git clone git@github.com:rashidlab/template-research-project.git my-project
# Set up
cd my-project
rm -rf .git && git init
# In R
# Install packages listed in DESCRIPTION
targets::tar_make()
Directory Structure
my-project/
├── .gitignore
├── R/                        # Analysis functions
│   └── load_lab_config.R     # Config loader helper
├── README.md
├── analysis/                 # Numbered analysis scripts
├── config/
│   ├── branding.yml          # Shared branding config (symlink)
│   ├── config.yml            # Project-specific parameters
│   ├── lab.yml               # Shared lab config (symlink)
│   └── load_lab_config.R     # Shared config loader (symlink)
├── data/                     # Symlink to /proj/rashidlab/ (gitignored)
│   ├── external/             # External datasets
│   ├── processed/            # Cleaned data
│   └── raw/                  # Original data (never modify)
├── figures/
│   ├── exploratory/          # Working figures (not tracked)
│   └── publication/          # Final figures (tracked)
├── output/                   # Analysis outputs (gitignored)
├── python/                   # Python modules (if needed)
└── reports/                  # Quarto documents
Pipeline Structure
# _targets.R
library(targets)
library(tarchetypes)  # provides tar_quarto()
source("R/globals_loader.R")
cfg <- load_globals()
list(
  # Data
  tar_target(raw_data, load_data("data/raw/dataset.csv")),
  tar_target(clean_data, process_data(raw_data)),
  # Analysis
  tar_target(model_fit, fit_model(clean_data)),
  tar_target(results, extract_results(model_fit)),
  # Outputs
  tar_target(fig_main, create_figure(results, "figures/main.pdf")),
  # Report
  tar_quarto(report, path = "reports/analysis.qmd")
)
Configuration
Edit config/config.yml:
project:
  name: "Project Name"
  author: "Your Name"
  created: "2025-01-01"
paths:
  data_raw: "data/raw"
  data_processed: "data/processed"
  output: "output"
  figures: "figures"
analysis:
  seed: 20250101
  n_cores: 4
# Add project-specific parameters below
Common Commands
# Run pipeline
Rscript -e "targets::tar_make()"
# Check what's outdated
Rscript -e "targets::tar_outdated()"
# Visualize dependency graph
Rscript -e "targets::tar_visnetwork()"
Data Setup
Important: Data Files Are NOT in the Repository
Data files are stored in /proj/rashidlab/ and must be downloaded or symlinked after cloning.
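Whichever of the routes below applies, a quick check of the `data` path can confirm that the setup worked. The following is a minimal sketch, not part of the template: `data_status` is a hypothetical helper name, and it assumes the data directory lives at `data` in the project root, as in the directory structure above.

```bash
#!/bin/bash
# Hypothetical helper (not shipped with the template).
# Reports the state of <project_dir>/data as one word:
#   symlink, broken-symlink, directory, or missing
data_status() {
    project_dir="${1:-.}"
    data_path="$project_dir/data"
    if [ -L "$data_path" ]; then
        # -e follows the link, so it fails when the target does not exist
        if [ -e "$data_path" ]; then
            echo "symlink"
        else
            echo "broken-symlink"
        fi
    elif [ -d "$data_path" ]; then
        echo "directory"
    else
        echo "missing"
    fi
}
```

Running `data_status my-project` after cloning should print `missing` until the data has been linked or copied. A `broken-symlink` result usually means the link target is not reachable from the current machine.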
For Lab Members (Longleaf)
# After cloning, create symlink to lab data
cd my-project
ln -s /proj/rashidlab/my-project/data data
# Or copy if you need a local copy
cp -r /proj/rashidlab/my-project/data .
For External Users (After Publication)
Add a scripts/download_data.sh to your repo:
#!/bin/bash
# Download data for reproduction
mkdir -p data/raw
# Example: Download from GEO/Zenodo/figshare
wget -O data/raw/dataset.csv "https://example.com/data.csv"
echo "Data ready. Run 'make all' to reproduce."
README Data Section Template
Include this in your project README:
## Data
**Note:** Data files are not included in this repository.
### Lab Members
Data is available at `/proj/rashidlab/<project>/data/`.
Create a symlink: `ln -s /proj/rashidlab/<project>/data data`
### External Reproduction
After cloning, download the data:
```bash
bash scripts/download_data.sh
```
Data sources:
- dataset.csv: GEO GSE12345
- clinical.csv: Zenodo 10.5281/zenodo.12345
### Gitignore for Data
Your `.gitignore` should include:
```
data/
*.csv
*.rds
*.RData
results/
_targets/
```
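To confirm that a project's `.gitignore` actually contains the data-related patterns, a small check can be run from the project root. This is a minimal sketch: `missing_ignore_patterns` is a hypothetical helper, and the pattern list is an assumption based on the README template above. It matches whole lines exactly, so an equivalent pattern written differently (for example `/data/`) would be reported as missing.

```bash
#!/bin/bash
# Hypothetical helper (not shipped with the template).
# Prints each expected pattern that is absent from the given .gitignore file.
missing_ignore_patterns() {
    gitignore_file="${1:-.gitignore}"
    for pattern in 'data/' '*.csv' '*.rds' '*.RData' 'results/' '_targets/'; do
        # -F: treat the pattern as a literal string, -x: match the whole line
        if ! grep -Fxq -- "$pattern" "$gitignore_file" 2>/dev/null; then
            echo "$pattern"
        fi
    done
}
```

An empty result from `missing_ignore_patterns .gitignore` means every expected pattern is present. A missing file is treated as missing every pattern.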
Best Practices
- Keep functions small - One task per function
- Use the config - Never hardcode values
- Document as you go - Update README and CLAUDE.md
- Commit often - Small, focused commits
- Never commit data - Use symlinks to /proj/rashidlab/
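The "never commit data" rule can also be checked just before committing. The sketch below is one possible approach, not part of the template: `flag_data_files` is a hypothetical filter, and the set of data patterns (`data/`, `*.csv`, `*.rds`, `*.RData`) is an assumption to adjust per project.

```bash
#!/bin/bash
# Hypothetical filter (not shipped with the template).
# Reads file paths on stdin, one per line, and prints those that look like data.
flag_data_files() {
    while IFS= read -r path; do
        case "$path" in
            # In a case pattern, * also matches "/", so nested paths are caught
            data/*|*.csv|*.rds|*.RData) echo "$path" ;;
        esac
    done
}
```

For example, `git diff --cached --name-only | flag_data_files` lists staged files that match. Any output is a prompt to unstage those files before committing.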