Methods Paper Template

Methodology development with simulation studies

Overview

Template for methodology papers featuring simulation studies, designed for journal submission with full reproducibility.

Repository: github.com/rashidlab/template-methods-paper

Features

  • Dynamic branching over simulation scenarios
  • Multi-fidelity simulation support
  • Slurm/Longleaf integration
  • LaTeX/Quarto manuscript
  • Reviewer response tracking
  • Consistency validation framework

Quick Start

# Clone
git clone git@github.com:rashidlab/template-methods-paper.git my-paper

# Set up
cd my-paper
rm -rf .git && git init

# Install R packages listed in DESCRIPTION
Rscript -e 'remotes::install_deps(dependencies = TRUE)'

# Quick test (reduced replications)
QUICK_MODE=1 Rscript -e "targets::tar_make()"
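
The QUICK_MODE flag can be honored inside _targets.R by shrinking the replication count; a minimal sketch (the env-var handling and the value 10 below are illustrative assumptions, not part of the shipped template):

# In _targets.R: scale down replications when QUICK_MODE=1 (illustrative)
quick <- identical(Sys.getenv("QUICK_MODE"), "1")
config <- yaml::read_yaml("config/settings.yml")
if (quick) {
  config$simulation$n_reps <- 10  # instead of the full run, e.g. 1000
}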

Directory Structure

my-paper/
├── .gitignore
├── R/  # Analysis/method functions
│   └── load_lab_config.R  # Config loader helper
├── README.md
├── _targets.R  # Pipeline with dynamic branching
├── code/  # Standalone scripts
├── config/
│   ├── branding.yml  # Shared branding config (symlink)
│   ├── lab.yml  # Shared lab config (symlink)
│   ├── load_lab_config.R  # Shared config loader (symlink)
│   └── settings.yml  # Simulation & figure parameters
├── docs/  # Project documentation
├── paper/
│   ├── figures/  # Publication figures
│   └── sections/  # Manuscript sections
├── reviews/  # Reviewer responses by round
│   └── round-1/
└── simulations/
    ├── R/  # Simulation-specific R code
    ├── config/
    │   └── scenarios.yml  # Simulation scenario definitions
    ├── results/  # Raw simulation results (gitignored)
    └── scripts/  # Slurm job scripts

Simulation Pipeline

# _targets.R
library(targets)
library(tarchetypes)

# Source all functions
tar_source("R/")
tar_source("simulations/R/")

tar_option_set(
  packages = c("data.table", "yaml"),
  seed = 2024,
  error = "continue"
)

# For Longleaf cluster, uncomment:
# library(crew)
# library(crew.cluster)
# tar_option_set(
#   controller = crew_controller_slurm(
#     name = "slurm",
#     workers = 20,
#     slurm_partition = "general",
#     slurm_time_minutes = 60,
#     slurm_cpus_per_task = 4,
#     slurm_memory_gigabytes_per_cpu = 4,
#     slurm_log_output = "logs/slurm_%j.out"
#   )
# )

list(
  # Configuration
  tar_target(config, yaml::read_yaml("config/settings.yml")),
  tar_target(scenarios,
    yaml::read_yaml("simulations/config/scenarios.yml")$scenarios,
    iteration = "list"  # each branch below receives one scenario (a named list)
  ),

  # Dynamic branching over scenarios
  tar_target(
    sim_results,
    run_simulation(
      scenario_id = scenarios$id,
      params = scenarios,
      n_reps = config$simulation$n_reps,
      seed = config$seed
    ),
    pattern = map(scenarios),
    deployment = "worker"
  ),

  # Analysis
  tar_target(combined_results, compute_summary_statistics(sim_results)),
  tar_target(results_table, create_results_table(combined_results),
    format = "file"),

  # Figures
  tar_target(fig_main, create_main_figure(combined_results, config$figures),
    format = "file"),
  tar_target(fig_supplementary,
    create_supplementary_figures(combined_results, config$figures),
    format = "file"),

  # Manuscript
  tar_quarto(paper, path = "paper/main.qmd",
    extra_files = c("paper/references.bib", "paper/sections/"))
)
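
Downstream of the branching, sim_results is the combined per-scenario output, so compute_summary_statistics() can aggregate by scenario. A minimal sketch, assuming each replication row carries scenario_id, estimate, truth, ci_lower, and ci_upper columns (the function body is illustrative, not the template's actual implementation):

# Aggregate branched simulation output by scenario (illustrative sketch)
compute_summary_statistics <- function(sim_results) {
  data.table::setDT(sim_results)
  sim_results[, .(
    mean_estimate = mean(estimate),
    bias          = mean(estimate - truth),
    empirical_se  = sd(estimate),
    coverage      = mean(ci_lower <= truth & truth <= ci_upper)
  ), by = scenario_id]
}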

Scenario Configuration

Define scenarios in YAML (simulations/config/scenarios.yml):

scenarios:
  - id: scen_1
    method: proposed
    n: 100
    effect_size: 0.5
    description: "Small sample, medium effect"
  - id: scen_2
    method: proposed
    n: 500
    effect_size: 0.5
    description: "Large sample, medium effect"
  - id: scen_3
    method: competitor
    n: 100
    effect_size: 0.5
    description: "Competitor comparison"
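
With list iteration, each branch receives exactly one scenario, so run_simulation() can read its fields directly. A sketch under stated assumptions (the normal-mean data-generating model, t.test estimator, and output column names are placeholders for your method):

# One branch = one scenario; returns one row per replication (illustrative)
run_simulation <- function(scenario_id, params, n_reps, seed) {
  set.seed(seed)  # in practice, derive a distinct per-scenario seed
  reps <- lapply(seq_len(n_reps), function(r) {
    y   <- rnorm(params$n, mean = params$effect_size)  # toy generating model
    fit <- t.test(y)
    data.table::data.table(
      scenario_id = scenario_id, rep = r,
      estimate = unname(fit$estimate),
      ci_lower = fit$conf.int[1], ci_upper = fit$conf.int[2],
      truth    = params$effect_size
    )
  })
  data.table::rbindlist(reps)
}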

Running on Longleaf

# Submit controller job
sbatch scripts/run_pipeline.sh

# Monitor
watch -n 10 squeue -u $USER

# Check progress
Rscript -e "targets::tar_progress()"
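
A scripts/run_pipeline.sh controller job might look like the following; the partition, time limit, and module name are placeholders for your cluster. The controller itself stays lightweight because crew.cluster submits the worker jobs:

#!/bin/bash
#SBATCH --job-name=pipeline
#SBATCH --partition=general
#SBATCH --time=24:00:00
#SBATCH --mem=8G
#SBATCH --output=logs/pipeline_%j.out

module load r  # adjust to your cluster's R module name
Rscript -e "targets::tar_make()"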

Manuscript Workflow

The manuscript lives in paper/ and is rendered as a terminal node in the {targets} pipeline via tar_quarto(), ensuring figures and tables always reflect computed results.

# Full pipeline (simulations + manuscript)
Rscript -e "targets::tar_make()"

# Check what's outdated
Rscript -e "targets::tar_outdated()"

# Visualize dependency graph
Rscript -e "targets::tar_visnetwork()"
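
Inside paper/main.qmd, figures and tables are pulled from the targets store with targets::tar_read(), so the rendered manuscript always matches the pipeline. An illustrative chunk (target name fig_main matches the pipeline above; the label and caption are placeholders):

```{r}
#| label: fig-main
#| fig-cap: "Main simulation results."
# fig_main is a format = "file" target, so tar_read() returns its path
knitr::include_graphics(targets::tar_read(fig_main))
```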

Reviewer Response

After revision requests, organize materials in reviews/:

reviews/
├── round-1/
│   ├── response_letter.qmd     # Point-by-point response
│   ├── diff.pdf                # Track changes
│   └── revision_notes.md       # Internal notes
└── round-2/                    # Add as needed

Common Commands

Rscript -e "targets::tar_make()"       # Run full pipeline
Rscript -e "targets::tar_outdated()"   # Check what needs to run
Rscript -e "targets::tar_visnetwork()" # Visualize dependencies

README Structure for Paper Repos

A well-organized README is the entry point for anyone reproducing the paper. A suggested structure:

# Paper Title

Code and data for: "Full Paper Title" by Author et al. (Year).

## Abstract

Brief summary of the paper (2-3 sentences).

## Repository Structure

```
├── paper/               # Manuscript (Quarto/LaTeX)
├── R/                   # Analysis functions
├── simulations/         # Scenarios, code, results
├── code/                # Standalone scripts
├── config/settings.yml  # Parameters
├── reviews/             # Reviewer responses
├── docs/                # Documentation
├── _targets.R           # Pipeline definition
└── .gitignore
```

## Requirements

- R >= 4.3.0
- Quarto >= 1.4
- Required packages: targets, data.table, survival, ...

### Installing Dependencies

```r
# Install required packages
install.packages(c("targets", "data.table", "survival", ...))

# Install lab packages from GitHub
devtools::install_github("naimurashid/BATON")
```

## Reproducing Results

### Quick Test (5-10 minutes)

```bash
# Run with reduced replications
QUICK_MODE=1 make all
```

### Full Reproduction

```bash
# Full pipeline (may take hours on cluster)
make all

# Or step by step:
make calibrate          # Run calibrations
make figures            # Generate figures
make pdf                # Render manuscript
```
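
A minimal `Makefile` backing these targets might look like the sketch below (target names mirror the commands above; the recipes, which simply drive {targets}, are illustrative — and recall that make recipes must be tab-indented):

```make
all: pdf

calibrate:
	Rscript -e "targets::tar_make(names = 'sim_results')"

figures: calibrate
	Rscript -e "targets::tar_make(names = c('fig_main', 'fig_supplementary'))"

pdf: figures
	Rscript -e "targets::tar_make()"
```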

### On Longleaf/HPC

```bash
# Submit to cluster
sbatch scripts/run_pipeline.sh

# Monitor progress
watch squeue -u $USER
```

## Key Results

| Figure/Table | Description | Generating Target |
|--------------|-------------|-------------------|
| Figure 1 | Main results | `fig1_main` |
| Figure 2 | Comparison | `fig2_comparison` |
| Table 1 | Summary | `table1_summary` |

## Data

**Note:** Data files are not included in this repository due to size constraints.

### Setup (Required Before Running)

```bash
# Option 1: Lab members with Longleaf access
ln -s /proj/rashidlab/<project>/data data

# Option 2: External users - download from public repositories
bash scripts/download_data.sh
```

### Data Sources

| File | Description | Source |
|------|-------------|--------|
| `data/raw/dataset.csv` | Main analysis dataset | [GEO GSE12345](https://ncbi.nlm.nih.gov/geo/) |
| `data/raw/clinical.csv` | Clinical covariates | [Zenodo](https://zenodo.org/record/12345) |
| Simulated data | Generated by `_targets.R` | Run pipeline |

### Download Script

The `scripts/download_data.sh` script automates data retrieval:

```bash
#!/bin/bash
mkdir -p data/raw
wget -O data/raw/dataset.csv "https://..."
wget -O data/raw/clinical.csv "https://..."
echo "Data ready. Run 'make all' to reproduce results."
```

## Citation

```bibtex
@article{author2024title,
  title={Paper Title},
  author={Author, First and Author, Second},
  journal={Journal Name},
  year={2024}
}
```

## Contact

- First Author (email@unc.edu)
- [Lab Website](https://rashidlab.org)