Lab Policies

Guidelines for research conduct and collaboration

Communication

Teams

Our primary communication tool. Key channels:

Channel	Purpose
`#general`	Announcements, lab-wide discussions
`#computing`	Technical questions, Longleaf issues
`#papers`	Paper discussions, writing feedback
`#random`	Non-work chat

Response expectations:

Direct messages: Within 24 hours
Channel mentions: Within 48 hours
Urgent matters: Email or call

Meetings

Lab meetings: Every two weeks (Thursdays 9:30am), attendance expected
1:1s with Dr. Rashid: Bi-weekly or as scheduled
Working groups: As needed for projects

Detailed Meeting Guide

See the Lab Meetings page for comprehensive guidance on agendas, notes, and action items, and the Meeting Schedule for the current rotation.

1:1 Meetings with Dr. Rashid

Bi-weekly individual meetings are for mentorship, progress review, and removing blockers.

Before your 1:1:

Review your action items from the last meeting
Prepare 2-3 bullet points on progress since last time
Identify 1-2 specific questions or blockers to discuss
Have your current work ready to show (code, figures, writing)

Typical structure (30 min):

Your updates and questions (15 min) - What you’ve done, what’s blocking you
Feedback and discussion (10 min) - Dr. Rashid’s input, suggestions, connections
Action items (5 min) - Clear next steps with owners and deadlines

Tips:

Come prepared—don’t wing it
Be specific about blockers (“I’m stuck on X” not “things are hard”)
Take notes on action items during the meeting
It’s okay to say “I don’t know” or “I need help with…”

Code and Data

Version Control

All code must be version controlled with Git:

Public repos on GitHub when possible
Private repos for unpublished work
No code in email attachments

Data Management

Never Push Data to Git

Data files—especially large ones—should never be committed to repositories. Use the lab project directory (/proj/rashidlab/) for data storage and symlinks for access.

For a step-by-step walkthrough of setting up data for a new project, see Your First Project — Set Up Data Directory.

Storage Principles

Data Type	Location	Git Status
Raw data	`/proj/rashidlab/<project>/data/raw/`	Never commit
Processed data	`/proj/rashidlab/<project>/data/processed/`	Never commit
Small configs	`config/` in repo	Commit
Results	`results/` in repo	Usually gitignored

Why Not in Repos?

Size limits: GitHub blocks files larger than 100 MB; data often exceeds this
History bloat: Git keeps every committed version, and binary files diff and compress poorly, so each change to a data file permanently grows the repo
Sensitive data: PHI and identifiable data must never be in public repos
Collaboration: Large files slow down git clone for collaborators

Recommended Workflow

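# 1. Store data in the lab project directory (create it if needed)
mkdir -p /proj/rashidlab/my-project/data/raw
cp dataset.csv /proj/rashidlab/my-project/data/raw/

# 2. Create a symlink from your repo (optional, for convenience)
cd my-project
ln -s /proj/rashidlab/my-project/data data

# 3. Ensure data is in .gitignore
# Use "data" without a trailing slash: "data/" matches only real
# directories, and Git treats a symlink as a file.
echo "data" >> .gitignore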

After Publication

When the paper is accepted:

Public data: Upload to appropriate repository (GEO, Zenodo, figshare)
Update README: Add data download instructions with URLs
Include setup script: scripts/download_data.sh to automate retrieval

Example setup script:

#!/bin/bash
# scripts/download_data.sh - Download data after cloning repo
# Run from the repository root. The URLs below are examples only.

set -euo pipefail  # Stop on the first failed command or unset variable

mkdir -p data/raw

# Download from GEO
wget -O data/raw/expression.csv "https://geo.example.com/GSE12345/data.csv"

# Download from Zenodo
wget -O data/raw/clinical.csv "https://zenodo.org/record/12345/files/clinical.csv"

echo "Data download complete. Run 'make all' to reproduce results."
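
Downloads can fail partway or change upstream, so a setup script may also want to verify each file against a checksum recorded when the data was deposited. The following is a minimal sketch, not one of the lab's scripts: the function name `verify_sha256` is hypothetical, and it assumes GNU `sha256sum` is available (as on most Linux systems).

```shell
#!/bin/bash
# Hypothetical helper (not an existing lab script): compare a file's SHA-256
# digest with the value recorded when the data was deposited.
# Assumes GNU coreutils' sha256sum is on the PATH.

verify_sha256() {
  local file="$1" expected="$2" actual
  # sha256sum prints "<digest>  <filename>"; keep only the digest
  actual=$(sha256sum "$file" | awk '{print $1}')
  if [ "$actual" != "$expected" ]; then
    echo "Checksum mismatch for $file" >&2
    return 1
  fi
  echo "OK: $file"
}
```

In a download script this could be called after each `wget`, passing the downloaded path and the recorded digest, so the script stops instead of silently using a truncated or altered file.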

Gitignore Template

Your .gitignore should include:

# Data (never commit); no trailing slash, so a symlink named data is ignored too
data
*.csv
*.tsv
*.xlsx
*.rds
*.RData

# Large outputs
results/
_targets/

# Sensitive
*.pem
*.key
.env
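
A `.gitignore` does not stop someone from force-adding a file, so a pre-commit check can act as a second safety net. The sketch below is an illustration under stated assumptions: the function name `find_blocked_files`, the 5 MB default in `MAX_BYTES`, and the extension list (copied from the template above) are not lab policy.

```shell
#!/bin/bash
# Hypothetical helper for a pre-commit hook: reads file paths on stdin (one per
# line) and prints those that look like data files or exceed a size limit.
# MAX_BYTES and the extension list are assumptions, not lab policy.

MAX_BYTES="${MAX_BYTES:-5000000}"   # assumed 5 MB threshold

find_blocked_files() {
  local path size
  while IFS= read -r path; do
    # Skip paths that are not regular files (deleted or missing entries)
    [ -f "$path" ] || continue
    case "$path" in
      *.csv|*.tsv|*.xlsx|*.rds|*.RData)
        echo "$path (data file extension)"
        continue
        ;;
    esac
    # Arithmetic expansion strips any padding wc adds around the number
    size=$(( $(wc -c < "$path") ))
    if [ "$size" -gt "$MAX_BYTES" ]; then
      echo "$path (larger than $MAX_BYTES bytes)"
    fi
  done
}
```

One way to wire this in, assuming a standard Git hook: in `.git/hooks/pre-commit`, pipe `git diff --cached --name-only --diff-filter=ACM` into `find_blocked_files` and exit with a non-zero status if it prints anything.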

Key Principles

Raw data: Never modify, always preserve original
Processed data: Document all transformations in code
Sensitive data: Follow IRB protocols strictly — store in /proj/rashidlab/projects/ (not the publicly readable top-level directory)
Backups: The lab project directory (/proj/rashidlab/) is not backed up — keep copies of irreplaceable data elsewhere
Privacy: The top-level /proj/rashidlab/ is publicly readable on the cluster; use projects/ or users/ subdirectories for anything private
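
One way to back up the "never modify raw data" principle at the filesystem level is to remove write permission once the raw files are in place. This is a minimal sketch, assuming you own the files; `lock_raw_data` is a hypothetical name, and the lab may prefer a different mechanism.

```shell
#!/bin/bash
# Hypothetical helper: make everything under a raw-data directory read-only
# so accidental edits or overwrites fail. Must be run by the owner of the files.

lock_raw_data() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo "Not a directory: $dir" >&2
    return 1
  fi
  # a-w removes write permission for user, group, and others, recursively
  chmod -R a-w "$dir"
}
```

It would be called with the project's raw-data directory as its only argument. Read-only permissions guard against accidents; they are not a substitute for keeping a backup copy elsewhere.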

Data Privacy & Security

Research data—especially clinical and genomic data—requires careful handling to protect patient privacy and comply with regulations.

Classification Levels

Level	Examples	Handling
Public	Published results, code	Can be shared openly
Internal	Unpublished analyses	Lab members only
Confidential	De-identified clinical data	IRB-approved users only
Restricted (PHI)	Identifiable patient data	Strict access controls

Protected Health Information (PHI)

PHI Requires Special Handling

PHI includes any data that could identify a patient: names, dates, medical record numbers, genomic sequences linked to identifiers, images, etc.

Requirements for PHI:

IRB approval before any access or analysis
HIPAA training completed annually (via UNC)
Secure storage only—never on personal devices or cloud storage
Access logging—document who accessed what and when
De-identification before any sharing or publication

Secure Computing Practices

Do:

Use UNC-approved systems (Longleaf) for sensitive data
Enable two-factor authentication on all accounts
Lock your workstation when stepping away
Report any suspected breaches immediately

Never:

Store PHI on personal laptops, Dropbox, Google Drive, or GitHub
Share credentials or use shared accounts
Email datasets containing identifiable information
Leave printed materials with PHI unattended

De-identification

Before sharing data (even internally), remove or transform:

Direct identifiers (names, MRNs, SSNs)
Dates more specific than year
Geographic data smaller than state
Any unique characteristics

# Example: basic de-identification
deidentify <- function(data) {
  data$patient_id <- seq_len(nrow(data))  # Replace with sequential IDs (assumes one row per patient)
  data$mrn <- NULL                         # Remove MRN
  data$dob <- NULL                         # Remove birthdate
  # right = FALSE gives intervals [0, 40), [40, 60), ... so age 40 falls in "40-59"
  data$age_group <- cut(data$age, breaks = c(0, 40, 60, 80, Inf),
                        labels = c("<40", "40-59", "60-79", "80+"),
                        right = FALSE)
  data$age <- NULL                         # Drop exact age; age_group keeps the range
  return(data)
}
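
As a complement to the R function above, a quick header scan can catch obvious identifier columns before a file is shared. This is a coarse sketch only: the column names in `ID_PATTERN` are assumptions rather than a lab-approved list, it assumes a simple comma-separated header without embedded commas, and passing it does not mean a dataset is de-identified.

```shell
#!/bin/bash
# Hypothetical check: flag CSV header columns whose names suggest direct
# identifiers. The name list is an assumption; extend it for your data.

ID_PATTERN='name|first_name|last_name|mrn|ssn|dob|date_of_birth|address|zip|phone|email'

flag_identifier_columns() {
  local file="$1" hits
  # Take the header row, strip quotes and carriage returns, put one column per
  # line, then keep columns whose whole name matches the pattern (case-insensitive)
  hits=$(head -n 1 "$file" | tr -d '"\r' | tr ',' '\n' |
         grep -i -x -E "$ID_PATTERN" || true)
  if [ -n "$hits" ]; then
    echo "$hits"
    return 1   # non-zero status: suspicious columns found
  fi
}
```

The function prints each suspicious column name and returns a non-zero status, so it can gate a sharing or export step in a script.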

Incident Response

If you suspect a data breach or security incident:

Stop any ongoing data transfer immediately
Document what happened (time, files involved, who was affected)
Report to Dr. Rashid and UNC ITS Security immediately, and no later than 24 hours after discovery
Do not attempt to “fix” or hide the issue

Training Requirements

Training	Frequency	Link
HIPAA Privacy	Annual	UNC HIPAA Training
CITI Human Subjects	Before research	CITI Program
Responsible Conduct	Once	Graduate school requirement
Information Security	Annual	UNC ITS Training

Reproducibility

Every analysis must be reproducible:

Document package dependencies (DESCRIPTION file or README)
Set seeds for random processes
Document software versions
Include instructions to rerun
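
The checklist above can be partly automated by saving a small provenance file next to each set of results. This is a minimal sketch, assuming `git` and `R` are on the PATH when available; the names `tool_version` and `write_run_info` and the output layout are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers for recording what produced a set of results.

# Print the first line of "<tool> --version", or "not found" if the tool is missing
tool_version() {
  if command -v "$1" >/dev/null 2>&1; then
    "$1" --version 2>&1 | head -n 1
  else
    echo "not found"
  fi
}

# Write the UTC timestamp, current Git commit, and tool versions to a file
write_run_info() {
  local out="$1"
  {
    echo "date: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
    # --verify prints nothing on failure, so "unknown" is the only fallback text
    echo "git_commit: $(git rev-parse --verify HEAD 2>/dev/null || echo unknown)"
    echo "R: $(tool_version R)"
    echo "bash: $(tool_version bash)"
  } > "$out"
}
```

A pipeline could call `write_run_info results/run_info.txt` as its last step. For R package versions, `sessionInfo()` or a lockfile tool such as renv gives finer detail than this shell-level record.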

Publications

Authorship

Follow the ICMJE guidelines. Authors should meet all four criteria:

Substantial contributions to conception/design or data acquisition/analysis
Drafting or critically revising the manuscript
Final approval of the version to be published
Agreement to be accountable for all aspects

Discuss authorship early and revisit as contributions evolve.

Writing Process

Outline: Share structure before writing
Drafts: Use tracked changes or Git branches
Review: At least one lab member reviews before submission
Preprints: Encouraged for most work (arXiv, bioRxiv)

Code Availability

Published papers should include:

GitHub repository with analysis code
Data availability statement
README with reproduction instructions

Professional Development

Conferences

Present work at least once per year
Submit abstracts early (discuss with Dr. Rashid)
Share travel funding opportunities

Training

Complete required compliance training
Attend relevant seminars and workshops
Share useful resources with the lab

Work-Life Balance

Core hours: Generally 10am-4pm for meetings
Remote work: Flexible, communicate availability
Vacation: Take it! Just coordinate coverage
Wellness: Your health comes first

Related Resources

Resource	Description
Lab Branding	Presentations, colors, logos for talks and papers
Lab Computing	Setup guides, coding standards, HPC workflows
Project Templates	Starters for research projects and papers

Getting Help

Technical issues: #computing Teams channel
Research questions: 1:1 or lab meeting
Administrative: Department staff
Personal concerns: Dr. Rashid (confidentially)