Lab Policies
Guidelines for research conduct and collaboration
Communication
Teams
Our primary communication tool. Key channels:
| Channel | Purpose |
|---|---|
| #general | Announcements, lab-wide discussions |
| #computing | Technical questions, Longleaf issues |
| #papers | Paper discussions, writing feedback |
| #random | Non-work chat |
Response expectations:
- Direct messages: Within 24 hours
- Channel mentions: Within 48 hours
- Urgent matters: Email or call
Meetings
- Lab meetings: Every two weeks (Thursdays 9:30am), attendance expected
- 1:1s with Dr. Rashid: Every two weeks or as scheduled
- Working groups: As needed for projects
See the Lab Meetings page for guidance on agendas, notes, and action items, and the Meeting Schedule for the current rotation.
1:1 Meetings with Dr. Rashid
Individual meetings every two weeks are for mentorship, progress review, and removing blockers.
Before your 1:1:
- Review your action items from the last meeting
- Prepare 2-3 bullet points on progress since last time
- Identify 1-2 specific questions or blockers to discuss
- Have your current work ready to show (code, figures, writing)
Typical structure (30 min):
- Your updates and questions (15 min): what you’ve done and what’s blocking you
- Feedback and discussion (10 min): Dr. Rashid’s input, suggestions, and connections
- Action items (5 min): clear next steps with owners and deadlines
Tips:
- Come prepared—don’t wing it
- Be specific about blockers (“I’m stuck on X” not “things are hard”)
- Take notes on action items during the meeting
- It’s okay to say “I don’t know” or “I need help with…”
Code and Data
Version Control
All code must be version controlled with Git:
- Public repos on GitHub when possible
- Private repos for unpublished work
- No code in email attachments
Data Management
Data files—especially large ones—should never be committed to repositories. Use the lab project directory (/proj/rashidlab/) for data storage and symlinks for access.
For a step-by-step walkthrough of setting up data for a new project, see Your First Project — Set Up Data Directory.
Storage Principles
| Data Type | Location | Git Status |
|---|---|---|
| Raw data | /proj/rashidlab/<project>/data/raw/ | Never commit |
| Processed data | /proj/rashidlab/<project>/data/processed/ | Never commit |
| Small configs | config/ in repo | Commit |
| Results | results/ in repo | Usually gitignore |
Why Not in Repos?
- Size limits: GitHub has a 100MB file limit; data often exceeds this
- History bloat: Binary files stay in Git history even after they are deleted, so every clone carries them
- Sensitive data: PHI and identifiable data must never be in public repos
- Collaboration: Large files slow down git clone for collaborators
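A quick way to catch oversized files before they enter Git history is to search the working tree with find -size. The helper below is a hypothetical sketch (the name find_large_files is ours, not an existing lab tool); its default limit mirrors GitHub's 100MB cap, and it skips .git so only working-tree files are reported.

```bash
# Hypothetical helper: list files above a size limit, ignoring .git internals.
# Usage: find_large_files [DIR] [LIMIT]; LIMIT uses find's -size units (k, M, G).
# The default of 100M mirrors GitHub's per-file limit.
find_large_files() {
  local dir="${1:-.}"
  local limit="${2:-100M}"
  # Prune any .git directory; print regular files strictly larger than the limit
  find "$dir" -name .git -prune -o -type f -size +"$limit" -print
}
```

For example, run find_large_files . from the repository root before git add; anything it prints belongs under /proj/rashidlab/ rather than in the repo.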
Recommended Workflow
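```bash
# 1. Store data in the lab project directory (create it first if needed)
mkdir -p /proj/rashidlab/my-project/data/raw
cp dataset.csv /proj/rashidlab/my-project/data/raw/
# 2. Create a symlink from your repo (optional, for convenience)
cd my-project
ln -s /proj/rashidlab/my-project/data data
# 3. Ensure data is in .gitignore. Use "data" with no trailing slash:
#    Git treats a symlink as a file, and "data/" only matches directories.
echo "data" >> .gitignore
```
After Publication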
When the paper is accepted:
- Public data: Upload to appropriate repository (GEO, Zenodo, figshare)
- Update README: Add data download instructions with URLs
- Include setup script: scripts/download_data.sh to automate retrieval
Example setup script:
```bash
#!/bin/bash
# scripts/download_data.sh - Download data after cloning repo
set -euo pipefail  # Stop at the first failed download instead of reporting success
mkdir -p data/raw
# Download from GEO
wget -O data/raw/expression.csv "https://geo.example.com/GSE12345/data.csv"
# Download from Zenodo
wget -O data/raw/clinical.csv "https://zenodo.org/record/12345/files/clinical.csv"
echo "Data download complete. Run 'make all' to reproduce results."
```
Gitignore Template
Your .gitignore should include:
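```gitignore
# Data (never commit). No trailing slash, so a symlinked data directory
# is matched too: Git treats a symlink as a file, and "data/" only matches directories.
data
*.csv
*.tsv
*.xlsx
*.rds
*.RData
# Large outputs
results/
_targets/
# Sensitive
*.pem
*.key
.env
```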
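After editing .gitignore, you can confirm which paths Git will skip with git check-ignore, which exits with status 0 when a path is ignored. The wrapper below is a hypothetical convenience around that command (check_ignored is our name); it is especially useful for a symlinked data directory, since a pattern ending in a slash matches only real directories and not symlinks.

```bash
# Hypothetical helper: report whether each given path is ignored by Git.
# Relies on `git check-ignore -q`, which exits 0 when a path is ignored.
# Returns non-zero if any path would still be picked up by Git.
check_ignored() {
  local path status=0
  for path in "$@"; do
    if git check-ignore -q "$path"; then
      echo "ignored:     $path"
    else
      echo "NOT ignored: $path"
      status=1
    fi
  done
  return "$status"
}
```

Run it from the repository root, e.g. check_ignored data results .env; any "NOT ignored" line means that path would show up in git status.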
Key Principles
- Raw data: Never modify; always preserve the original
- Processed data: Document all transformations in code
- Sensitive data: Follow IRB protocols strictly; store it in /proj/rashidlab/projects/ (not the publicly readable top-level directory)
- Backups: The lab project directory (/proj/rashidlab/) is not backed up; keep copies of irreplaceable data elsewhere
- Privacy: The top-level /proj/rashidlab/ is publicly readable on the cluster; use the projects/ or users/ subdirectories for anything private
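The raw-data and privacy principles can be backed by standard Unix permissions. This is a minimal sketch with hypothetical helper names; it assumes the files are in directories you own, and whether make_private is needed depends on how the projects/ and users/ directories are already configured (check with ls -ld).

```bash
# Hypothetical helpers: pass your own project directories as arguments.

# Remove write permission from every file under a raw-data directory so the
# originals cannot be edited by accident. Directories stay writable, so new
# raw files can still be added. Undo on a single file with: chmod u+w FILE
protect_raw() {
  find "$1" -type f -exec chmod a-w {} +
}

# Remove all permissions for "other" users (outside the owner and group),
# so the directory tree is not readable by everyone on the cluster.
make_private() {
  chmod -R o-rwx "$1"
}
```

For example, run protect_raw /proj/rashidlab/my-project/data/raw after copying new files in.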
Data Privacy & Security
Research data—especially clinical and genomic data—requires careful handling to protect patient privacy and comply with regulations.
Classification Levels
| Level | Examples | Handling |
|---|---|---|
| Public | Published results, code | Can be shared openly |
| Internal | Unpublished analyses | Lab members only |
| Confidential | De-identified clinical data | IRB-approved users only |
| Restricted (PHI) | Identifiable patient data | Strict access controls |
Protected Health Information (PHI)
PHI includes any data that could identify a patient: names, dates, medical record numbers, genomic sequences linked to identifiers, images, etc.
Requirements for PHI:
- IRB approval before any access or analysis
- HIPAA training completed annually (via UNC)
- Secure storage only—never on personal devices or cloud storage
- Access logging—document who accessed what and when
- De-identification before any sharing or publication
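For the access-logging requirement, one low-tech option is an append-only text log. The helper and its tab-separated format below are assumptions for illustration, not a UNC- or IRB-mandated format; follow your protocol if it specifies one.

```bash
# Hypothetical helper: append one tab-separated line per access to LOGFILE.
# Fields: UTC timestamp, username, action, file. The format is an assumption.
log_access() {
  local logfile="$1" action="$2" target="$3"
  local who
  who="$(id -un 2>/dev/null || id -u)"   # Fall back to the numeric UID if no name
  printf '%s\t%s\t%s\t%s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$who" "$action" "$target" >> "$logfile"
}
```

For example: log_access ~/phi_access.log read /proj/rashidlab/projects/my-study/cohort.csv, where the study path is a placeholder.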
Secure Computing Practices
Do:
- Use UNC-approved systems (Longleaf) for sensitive data
- Enable two-factor authentication on all accounts
- Lock your workstation when stepping away
- Report any suspected breaches immediately
Never:
- Store PHI on personal laptops, Dropbox, Google Drive, or GitHub
- Share credentials or use shared accounts
- Email datasets containing identifiable information
- Leave printed materials with PHI unattended
De-identification
Before sharing data (even internally), remove or transform:
- Direct identifiers (names, MRNs, SSNs)
- Dates more specific than year
- Geographic data smaller than state
- Any other uniquely identifying numbers, characteristics, or codes
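Before sharing a file, a quick header scan can flag columns whose names suggest direct identifiers. This is a hypothetical helper with an illustrative, incomplete pattern list: it only inspects the column names of a comma-separated file, so it cannot detect identifiers in free text or in columns with other names, and it does not replace the review steps above.

```bash
# Hypothetical pre-sharing check: print header columns whose names look like
# direct identifiers. The name pattern is an illustrative assumption.
# Returns 0 if anything was flagged, non-zero otherwise.
flag_identifier_columns() {
  local csv="$1"
  head -n 1 "$csv" | tr ',' '\n' | tr -d '"\r' |
    grep -Ei '^(name|first_?name|last_?name|mrn|ssn|dob|birth_?date|address|zip|phone|email)$'
}
```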
```r
# Example: basic de-identification
deidentify <- function(data) {
  data$patient_id <- seq_len(nrow(data))  # Replace with sequential IDs
  data$mrn <- NULL                        # Remove MRN
  data$dob <- NULL                        # Remove birthdate
  # right = FALSE makes the bins [0,40), [40,60), [60,80), [80,Inf),
  # so each age lands in the bin its label describes
  data$age_group <- cut(data$age, breaks = c(0, 40, 60, 80, Inf),
                        labels = c("<40", "40-59", "60-79", "80+"),
                        right = FALSE)
  data$age <- NULL                        # Replace exact age with range
  return(data)
}
```
Incident Response
If you suspect a data breach or security incident:
- Stop any ongoing data transfer immediately
- Document what happened (time, files involved, who was affected)
- Report to Dr. Rashid and UNC ITS Security within 24 hours
- Do not attempt to “fix” or hide the issue
Training Requirements
| Training | Frequency | Link |
|---|---|---|
| HIPAA Privacy | Annual | UNC HIPAA Training |
| CITI Human Subjects | Before research | CITI Program |
| Responsible Conduct | Once | Graduate school requirement |
| Information Security | Annual | UNC ITS Training |
Reproducibility
Every analysis must be reproducible:
- Document package dependencies (DESCRIPTION file or README)
- Set seeds for random processes
- Document software versions
- Include instructions to rerun
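Seeds are set inside the analysis itself (for example with set.seed() in R), and sessionInfo() reports R package versions from within R. For the environment around the analysis, a small shell helper can write a plain-text record; the function name and the fields it captures are assumptions, so extend it with whatever tools your project uses.

```bash
# Hypothetical helper: record the environment an analysis ran in.
record_environment() {
  local out="$1"
  {
    echo "date: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
    echo "host: $(uname -srm)"
    # Commit of the analysis code; prints a note outside a repo or before the first commit
    echo "git_commit: $(git rev-parse --verify -q HEAD 2>/dev/null || echo 'no commit found')"
    # R version, if R is on the PATH
    if command -v R >/dev/null 2>&1; then
      echo "r_version: $(R --version | head -n 1)"
    else
      echo "r_version: R not found"
    fi
  } > "$out"
}
```

For example, call record_environment results/environment.txt at the start of a pipeline run and keep the file with the results it describes.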
Publications
Writing Process
- Outline: Share structure before writing
- Drafts: Use tracked changes or Git branches
- Review: At least one lab member reviews before submission
- Preprints: Encouraged for most work (arXiv, bioRxiv)
Code Availability
Published papers should include:
- GitHub repository with analysis code
- Data availability statement
- README with reproduction instructions
Professional Development
Conferences
- Present work at least once per year
- Submit abstracts early (discuss with Dr. Rashid)
- Share travel funding opportunities
Training
- Complete required compliance training
- Attend relevant seminars and workshops
- Share useful resources with the lab
Work-Life Balance
- Core hours: Meetings are generally scheduled between 10am and 4pm
- Remote work: Flexible, communicate availability
- Vacation: Take it! Just coordinate coverage
- Wellness: Your health comes first
Getting Help
- Technical issues: #computing Teams channel
- Research questions: 1:1 or lab meeting
- Administrative: Department staff
- Personal concerns: Dr. Rashid (confidentially)