Lab Policies

Guidelines for research conduct and collaboration

Communication

Teams

Our primary communication tool. Key channels:

| Channel    | Purpose                              |
|------------|--------------------------------------|
| #general   | Announcements, lab-wide discussions  |
| #computing | Technical questions, Longleaf issues |
| #papers    | Paper discussions, writing feedback  |
| #random    | Non-work chat                        |

Response expectations:

  • Direct messages: Within 24 hours
  • Channel mentions: Within 48 hours
  • Urgent matters: Email or call

Meetings

  • Lab meetings: Every two weeks (Thursdays 9:30am), attendance expected
  • 1:1s with Dr. Rashid: Every two weeks or as scheduled
  • Working groups: As needed for projects

Tip: Detailed Meeting Guide

See the Lab Meetings page for comprehensive guidance on agendas, notes, and action items, and the Meeting Schedule page for the current rotation.

1:1 Meetings with Dr. Rashid

Individual meetings, held every two weeks, are for mentorship, progress review, and removing blockers.

Before your 1:1:

  • Review your action items from the last meeting
  • Prepare 2-3 bullet points on progress since last time
  • Identify 1-2 specific questions or blockers to discuss
  • Have your current work ready to show (code, figures, writing)

Typical structure (30 min):

  1. Your updates and questions (15 min) - What you’ve done, what’s blocking you
  2. Feedback and discussion (10 min) - Dr. Rashid’s input, suggestions, connections
  3. Action items (5 min) - Clear next steps with owners and deadlines

Tips:

  • Come prepared—don’t wing it
  • Be specific about blockers (“I’m stuck on X” not “things are hard”)
  • Take notes on action items during the meeting
  • It’s okay to say “I don’t know” or “I need help with…”

Code and Data

Version Control

All code must be version controlled with Git:

  • Public repos on GitHub when possible
  • Private repos for unpublished work
  • No code in email attachments

Data Management

Important: Never Push Data to Git

Data files—especially large ones—should never be committed to repositories. Use the lab project directory (/proj/rashidlab/) for data storage and symlinks for access.

For a step-by-step walkthrough of setting up data for a new project, see Your First Project — Set Up Data Directory.

Storage Principles

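| Data Type      | Location                                  | Git Status        |
|----------------|-------------------------------------------|-------------------|
| Raw data       | /proj/rashidlab/<project>/data/raw/       | Never commit      |
| Processed data | /proj/rashidlab/<project>/data/processed/ | Never commit      |
| Small configs  | config/ in repo                           | Commit            |
| Results        | results/ in repo                          | Usually gitignore |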
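
The layout above can be wired into a repository checkout with symlinks. The following is a minimal sketch rather than an official lab script: the helper name `link_project_data` and its two arguments are hypothetical, and it assumes the project directory already contains `data/raw` and `data/processed`.

```bash
#!/bin/bash
# Sketch: link a project's data directories into a repo checkout.
# link_project_data is a hypothetical helper, not an existing lab script.
#   $1 = absolute path to the project data root, e.g. /proj/rashidlab/<project>
#   $2 = path to the cloned repository
link_project_data() {
  local project_root="$1" repo_dir="$2" sub
  mkdir -p "$repo_dir/data"
  for sub in raw processed; do
    # Refuse to link to a directory that does not exist yet
    if [ ! -d "$project_root/data/$sub" ]; then
      echo "missing: $project_root/data/$sub" >&2
      return 1
    fi
    # -s: symbolic link, -f: replace an existing link, -n: do not follow an existing link
    ln -sfn "$project_root/data/$sub" "$repo_dir/data/$sub"
  done
}
```

In this sketch `data/` stays a real directory inside the repo and only `data/raw` and `data/processed` are links, so the `data/` pattern from the Gitignore Template section still covers everything underneath it. Pass an absolute path as the first argument; a relative one would produce links that break when the repo is moved.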

Why Not in Repos?

  1. Size limits: GitHub has a 100MB file limit; data often exceeds this
  2. History bloat: Git keeps every committed version of a binary file, so the repo keeps growing even after the file is deleted
  3. Sensitive data: PHI and identifiable data must never be in public repos
  4. Collaboration: Large files slow down git clone for collaborators
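
The size limit in point 1 is easy to check before you commit. A minimal sketch, assuming a hypothetical helper named `find_large_files`; the default threshold of 100 simply mirrors the limit quoted above and can be lowered.

```bash
#!/bin/bash
# Sketch: list files above a size threshold so they can be moved out of the repo.
# find_large_files is a hypothetical helper name.
#   $1 = directory to scan (default: current directory)
#   $2 = threshold in MiB (default: 100, mirroring the limit cited above)
find_large_files() {
  local dir="${1:-.}" limit_mb="${2:-100}"
  # Skip Git's own object store; -size +NM matches regular files larger than N MiB.
  # -type f does not match symlinks, so linked data directories are not reported.
  find "$dir" -type f -not -path '*/.git/*' -size +"${limit_mb}M"
}
```

Running `find_large_files . 50` from the repo root would list anything over 50 MiB. An empty result means nothing in the working tree is near the limit.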

After Publication

When the paper is accepted:

  1. Public data: Upload to appropriate repository (GEO, Zenodo, figshare)
  2. Update README: Add data download instructions with URLs
  3. Include setup script: scripts/download_data.sh to automate retrieval

Example setup script:

#!/bin/bash
# scripts/download_data.sh - Download data after cloning repo

# Stop at the first failed command so a partial download is not reported as complete
set -euo pipefail

mkdir -p data/raw

# Download from GEO
wget -O data/raw/expression.csv "https://geo.example.com/GSE12345/data.csv"

# Download from Zenodo
wget -O data/raw/clinical.csv "https://zenodo.org/record/12345/files/clinical.csv"

echo "Data download complete. Run 'make all' to reproduce results."

Gitignore Template

Your .gitignore should include:

# Data (never commit)
data/
*.csv
*.tsv
*.xlsx
*.rds
*.RData

# Large outputs
results/
_targets/

# Sensitive
*.pem
*.key
.env
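
To confirm that the template actually covers your files, you can ask Git directly. The sketch below wraps `git check-ignore`; the helper name `check_ignored` is hypothetical, and it assumes you run it from inside the repository.

```bash
#!/bin/bash
# Sketch: confirm that paths are covered by the repo's .gitignore.
# check_ignored is a hypothetical helper; run it from inside a Git repository.
# Reports any path that Git would NOT ignore and returns non-zero if one is found.
check_ignored() {
  local path status=0
  for path in "$@"; do
    # git check-ignore -q exits 0 when the path matches an ignore rule
    if ! git check-ignore -q "$path"; then
      echo "not ignored: $path" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_ignored data/raw/expression.csv .env` should return silently. A file that was committed before its rule was added stays tracked and is reported as not ignored; `git rm --cached <file>` removes it from the index without deleting it from disk.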

Key Principles

  • Raw data: Never modify, always preserve original
  • Processed data: Document all transformations in code
  • Sensitive data: Follow IRB protocols strictly — store in /proj/rashidlab/projects/ (not the publicly readable top-level directory)
  • Backups: The lab project directory (/proj/rashidlab/) is not backed up — keep copies of irreplaceable data elsewhere
  • Privacy: The top-level /proj/rashidlab/ is publicly readable on the cluster; use projects/ or users/ subdirectories for anything private
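
Because the top-level directory is readable by everyone on the cluster, it is worth checking permissions on anything private. This is a minimal sketch with hypothetical helper names (`is_world_readable`, `make_private`); it only looks at standard Unix permission bits and assumes no additional ACLs are in play.

```bash
#!/bin/bash
# Sketch: detect and close world-readable permissions on a directory.
# is_world_readable and make_private are hypothetical helper names.
is_world_readable() {
  # Succeeds when the "other" read bit (octal 004) is set on the path itself
  [ -n "$(find "$1" -maxdepth 0 -perm -004)" ]
}

make_private() {
  # Remove all access for "other" on the directory and everything under it
  chmod -R o-rwx "$1"
}
```

`make_private` leaves group permissions untouched, so lab members in the same Unix group keep whatever access they had. Whether that is appropriate depends on how the group is set up and on the IRB protocol for the data.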

Data Privacy & Security

Research data—especially clinical and genomic data—requires careful handling to protect patient privacy and comply with regulations.

Classification Levels

| Level            | Examples                    | Handling                |
|------------------|-----------------------------|-------------------------|
| Public           | Published results, code     | Can be shared openly    |
| Internal         | Unpublished analyses        | Lab members only        |
| Confidential     | De-identified clinical data | IRB-approved users only |
| Restricted (PHI) | Identifiable patient data   | Strict access controls  |

Protected Health Information (PHI)

Important: PHI Requires Special Handling

PHI includes any data that could identify a patient: names, dates, medical record numbers, genomic sequences linked to identifiers, images, etc.

Requirements for PHI:

  1. IRB approval before any access or analysis
  2. HIPAA training completed annually (via UNC)
  3. Secure storage only—never on personal devices or cloud storage
  4. Access logging—document who accessed what and when
  5. De-identification before any sharing or publication
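
For the access logging in point 4, even a simple append-only text file records who, what, and when. The sketch below is one possible shape, not a lab or UNC requirement: the helper name `log_access` and the tab-separated format are assumptions for illustration.

```bash
#!/bin/bash
# Sketch: append one line per data access to a plain-text log.
# log_access and the tab-separated log format are assumptions for illustration.
#   $1 = log file, $2 = path that was accessed, $3 = short reason
log_access() {
  local log_file="$1" accessed="$2" reason="$3"
  # UTC timestamp, current user, accessed path, and reason, tab-separated
  printf '%s\t%s\t%s\t%s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$(id -un)" "$accessed" "$reason" \
    >> "$log_file"
}
```

Keep the log next to the data rather than in the repo, and record file paths only, never patient-level values.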

Secure Computing Practices

Do:

  • Use UNC-approved systems (Longleaf) for sensitive data
  • Enable two-factor authentication on all accounts
  • Lock your workstation when stepping away
  • Report any suspected breaches immediately

Never:

  • Store PHI on personal laptops, Dropbox, Google Drive, or GitHub
  • Share credentials or use shared accounts
  • Email datasets containing identifiable information
  • Leave printed materials with PHI unattended

De-identification

Before sharing data (even internally), remove or transform:

  • Direct identifiers (names, MRNs, SSNs)
  • Dates more specific than year
  • Geographic data smaller than state
  • Any unique characteristics

# Example: basic de-identification
deidentify <- function(data) {
  data$patient_id <- seq_len(nrow(data))  # Replace with sequential IDs
  data$mrn <- NULL                         # Remove MRN
  data$dob <- NULL                         # Remove birthdate
  # Bin exact age into ranges. right = FALSE makes the intervals [0,40), [40,60), ...
  # so that age 40 falls in "40-59" rather than "<40"
  data$age_group <- cut(data$age, breaks = c(0, 40, 60, 80, Inf),
                        labels = c("<40", "40-59", "60-79", "80+"),
                        right = FALSE)
  data$age <- NULL                         # Drop exact age, keep only the range
  return(data)
}

Incident Response

If you suspect a data breach or security incident:

  1. Stop any ongoing data transfer immediately
  2. Document what happened (time, files involved, who was affected)
  3. Report to Dr. Rashid and UNC ITS Security within 24 hours
  4. Do not attempt to “fix” or hide the issue

Training Requirements

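| Training             | Frequency       | Link                        |
|----------------------|-----------------|-----------------------------|
| HIPAA Privacy        | Annual          | UNC HIPAA Training          |
| CITI Human Subjects  | Before research | CITI Program                |
| Responsible Conduct  | Once            | Graduate school requirement |
| Information Security | Annual          | UNC ITS Training            |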

Reproducibility

Every analysis must be reproducible:

  1. Document package dependencies (DESCRIPTION file or README)
  2. Set seeds for random processes
  3. Document software versions
  4. Include instructions to rerun
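
Point 3 can be partly automated by capturing tool versions at the time the analysis runs. A minimal sketch, assuming a hypothetical helper named `record_versions` and that each listed tool accepts a `--version` flag; package-level versions and seeds still belong in the analysis code itself.

```bash
#!/bin/bash
# Sketch: record tool versions alongside the analysis outputs.
# record_versions is a hypothetical helper; pass it the commands the analysis relies on.
#   $1 = output file, remaining arguments = command names
record_versions() {
  local out="$1" cmd
  shift
  : > "$out"   # start with an empty file
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      # Keep only the first line of each tool's --version output
      printf '%s: %s\n' "$cmd" "$("$cmd" --version 2>&1 | head -n 1)" >> "$out"
    else
      printf '%s: not found\n' "$cmd" >> "$out"
    fi
  done
}
```

For example, `record_versions versions.txt R git make` writes one line per tool. Commit the resulting file or paste it into the README so the versions travel with the code.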

Publications

Authorship

Follow ICMJE guidelines:

  1. Substantial contributions to conception/design or data acquisition/analysis
  2. Drafting or critically revising the manuscript
  3. Final approval of the version to be published
  4. Agreement to be accountable for all aspects

Discuss authorship early and revisit as contributions evolve.

Writing Process

  1. Outline: Share structure before writing
  2. Drafts: Use tracked changes or Git branches
  3. Review: At least one lab member reviews before submission
  4. Preprints: Encouraged for most work (arXiv, bioRxiv)

Code Availability

Published papers should include:

  • GitHub repository with analysis code
  • Data availability statement
  • README with reproduction instructions

Professional Development

Conferences

  • Present work at least once per year
  • Submit abstracts early (discuss with Dr. Rashid)
  • Share travel funding opportunities

Training

  • Complete required compliance training
  • Attend relevant seminars and workshops
  • Share useful resources with the lab

Work-Life Balance

  • Core hours: Generally 10am-4pm for meetings
  • Remote work: Flexible, communicate availability
  • Vacation: Take it! Just coordinate coverage
  • Wellness: Your health comes first

Getting Help

  • Technical issues: #computing Teams channel
  • Research questions: 1:1 or lab meeting
  • Administrative: Department staff
  • Personal concerns: Dr. Rashid (confidentially)