
How to write a data management and sharing plan for NIH grants

The new NIH Data Management and Sharing Policy has researchers scrambling. Since January 25, 2023, every NIH application proposing to generate scientific data must include a data management and sharing plan (DMSP). Not optional. Not a brief paragraph tucked into your methods. A separate, required document that reviewers actually read.

Most researchers treat this like a compliance checkbox. That's a mistake. A thoughtful DMSP can strengthen your application by demonstrating feasibility and impact. A sloppy one signals that you haven't thought through your project's logistics.

You'll learn how to write a DMSP that satisfies NIH requirements while supporting your research goals. No generic templates. Real examples with the kind of specificity that convinces reviewers you know what you're doing.

Example data management and sharing plan with commentary

Here's a realistic DMSP for a hypothetical study on sleep patterns in healthcare workers, with inline commentary explaining each section:

Scientific data to be preserved and shared

We will collect polysomnography data from 200 healthcare workers across three hospital systems over 24 months. Primary datasets include sleep stage scoring (EDF format), actigraphy recordings (CSV format), and validated questionnaire responses (REDCap exports). Raw polysomnography files average 50MB per participant per night; processed sleep metrics are approximately 2MB per participant.

Secondary datasets include work schedules, shift rotation patterns, and de-identified demographic information. We estimate total data volume at 15GB for raw files and 2GB for processed datasets.

Why this works: Specific file formats, realistic size estimates, and clear distinction between raw and processed data. Reviewers can picture exactly what you're generating.
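Data volume estimates like these can be sanity-checked with simple arithmetic. The sketch below uses the participant count and per-file sizes from the example plan; the average number of recorded nights per participant is an assumption for illustration, not a figure from the plan.

```python
# Back-of-envelope storage estimate for a DMSP.
# Figures from the example plan: 200 participants, ~50 MB of raw PSG
# data per participant per night, ~2 MB of processed metrics each.
# The nights-per-participant average is an assumed value.

participants = 200
raw_mb_per_night = 50
processed_mb = 2
nights = 1.5  # assumed average recorded nights per participant

raw_gb = participants * raw_mb_per_night * nights / 1024
processed_gb = participants * processed_mb / 1024

print(f"Raw PSG: ~{raw_gb:.1f} GB")        # lands near the plan's 15 GB figure
print(f"Processed metrics: ~{processed_gb:.1f} GB")
# Note: questionnaire exports and secondary datasets would add to the
# processed total, which is why the plan's 2 GB exceeds the metrics alone.
```

Running a calculation like this before writing the plan keeps your stated totals defensible if a reviewer checks the math.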

Data collection uses Compumedics Profusion PSG software (proprietary) and ActiGraph GT3X+ devices with ActiLife software. Sleep scoring follows AASM criteria using custom R scripts developed in our laboratory.

Analysis code will be written in R (version 4.3+) using the tidyverse for data manipulation, the signal package for signal processing, and lme4 for mixed-effects modeling. All custom scripts for data processing and statistical analysis will be documented and shared via GitHub under the MIT license.

Why this works: Names specific software versions and acknowledges what's proprietary versus shareable. The GitHub commitment shows you understand modern reproducibility practices.

Standards to be applied

Polysomnography data will follow Brain Imaging Data Structure (BIDS) formatting standards where applicable. Sleep metrics will use standardized terminology from the AASM Manual for Scoring Sleep. Questionnaire data follows instrument-specific scoring guidelines (Pittsburgh Sleep Quality Index, Epworth Sleepiness Scale).

All datasets will include comprehensive metadata documenting collection protocols, scoring criteria, and variable definitions using Dublin Core metadata standards.

Why this works: Cites actual standards that exist in sleep research. Shows you've researched best practices rather than making something up.
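To make the metadata commitment concrete, here is a minimal sketch of what a Dublin Core record for one shared dataset might look like, built with Python's standard library. All field values, including the DOI, are placeholders for illustration.

```python
import xml.etree.ElementTree as ET

# Minimal Dublin Core record for a shared dataset (illustrative values only).
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

record = {
    "title": "Processed sleep metrics, healthcare worker cohort",
    "creator": "Example Sleep Laboratory",
    "description": "Per-participant sleep metrics scored per AASM criteria.",
    "date": "2026-01-15",                  # placeholder deposit date
    "format": "text/csv",
    "identifier": "doi:10.0000/example",   # placeholder DOI
    "rights": "Available under repository data use agreement",
}

root = ET.Element("metadata")
for element, value in record.items():
    ET.SubElement(root, f"{{{DC_NS}}}{element}").text = value

print(ET.tostring(root, encoding="unicode"))
```

Most specialized repositories generate records like this from a submission form, but knowing which fields you can populate up front makes repository submission faster.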

Data preservation, access, and timelines

Data will be deposited in the National Sleep Research Resource (NSRR) within 12 months of study completion. NSRR specializes in sleep research data and provides long-term preservation with DOI assignment.

De-identified datasets will be available to qualified researchers through NSRR's data access process, which includes institutional review and data use agreements. Raw polysomnography data requires additional IRB approval due to potential re-identification risks.

We will retain local copies on university servers with automated backups for the duration of the award plus seven years, exceeding NIH's minimum record retention requirements.

Why this works: Names a real repository appropriate for this data type. Acknowledges different access levels for different data sensitivity. Shows understanding of long-term responsibilities.

Access, distribution, and reuse considerations

Healthcare worker schedules and hospital-specific information cannot be shared due to competitive sensitivity and potential participant re-identification. We will provide summary statistics and aggregated shift pattern categories instead.

Individual-level sleep data requires institutional approval through NSRR's established process. This typically takes 2-4 weeks and ensures appropriate use while protecting participant privacy.

All shared data will include detailed documentation enabling replication of our analyses. We encourage secondary analyses and will respond to reasonable requests for additional information within 30 days.

Why this works: Honest about what can't be shared and why. Provides realistic timelines for access requests. Shows willingness to engage with other researchers.
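A plan like this implies a concrete de-identification pipeline. The sketch below illustrates the general approach in Python: drop direct identifiers, replace participant IDs with salted one-way hashes, and collapse exact schedules into coarse categories. The field names and thresholds are hypothetical, and note that salted hashing is pseudonymization, not full de-identification; re-identification risk still needs separate review.

```python
import hashlib

SALT = "project-specific-secret"  # stored separately from any shared data


def categorize_shift(night_shifts_per_month: int) -> str:
    """Collapse exact shift counts into aggregate categories (thresholds hypothetical)."""
    if night_shifts_per_month == 0:
        return "day only"
    if night_shifts_per_month <= 4:
        return "occasional nights"
    return "regular nights"


def deidentify(record: dict) -> dict:
    """Return only the fields approved for sharing, with a pseudonymous ID."""
    return {
        "pid": hashlib.sha256((SALT + record["mrn"]).encode()).hexdigest()[:12],
        "shift_category": categorize_shift(record["night_shifts_per_month"]),
        "sleep_efficiency": record["sleep_efficiency"],
        # name, hospital, and exact schedule are deliberately omitted
    }


shared = deidentify({
    "mrn": "A12345",
    "name": "Jane Doe",
    "hospital": "General Hospital",
    "night_shifts_per_month": 3,
    "sleep_efficiency": 0.87,
})
print(shared)
```

Writing the transformation as code, rather than describing it loosely, also gives you something auditable to show your IRB.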

Oversight of data management and sharing

The Principal Investigator will oversee all aspects of data management. Dr. Sarah Chen (Co-I) will serve as data manager, with specific responsibility for quality control, de-identification, and repository submission.

Our university's Research Data Services office will provide technical support for data formatting and repository submission. The study's Data Safety Monitoring Board will review data sharing protocols annually.

Why this works: Names specific people with specific responsibilities. Shows you've thought about institutional resources and ongoing oversight.

Top tips for success

  1. Match your repository to your data type. Generic repositories like Figshare work for simple datasets, but specialized repositories often serve your field better. Brain imaging goes to OpenNeuro. Genomics goes to dbGaP. Sleep data goes to NSRR. These repositories understand your data's unique requirements and connect you with relevant researchers.

  2. Be realistic about what you can't share. Patient data with re-identification risks, proprietary industry information, and data from vulnerable populations all have legitimate sharing restrictions. Explain these clearly rather than promising everything. Reviewers appreciate honesty over unrealistic commitments.

  3. Think beyond compliance. A good DMSP demonstrates project feasibility, shows you understand your field's data practices, and can actually strengthen collaborations. Frame data sharing as enabling science, not just satisfying requirements.

Common mistakes to avoid

  1. The generic template trap. Copying boilerplate language about "making data available upon reasonable request" tells reviewers you haven't thought about this seriously. NIH wants specific repositories, realistic timelines, and clear access procedures. Show your work.

  2. Underestimating costs and effort. Data management isn't free. De-identification takes time. Repository submission requires formatting. Documentation needs writing. If you don't budget for these activities, they won't happen properly. Include realistic effort estimates in your project timeline.

  3. Ignoring your IRB early. Many researchers discover too late that their consent forms don't permit the data sharing they've promised NIH. Talk to your IRB during protocol development, not after data collection. Some sharing restrictions require specific consent language from the start.

The bottom line

A strong DMSP demonstrates scientific rigor and practical feasibility. It shows reviewers you've thought through your project's lifecycle beyond data collection. Treat it as part of your research plan, not an administrative afterthought.

Start with your field's existing repositories and standards. Be specific about formats, timelines, and responsibilities. Acknowledge legitimate restrictions honestly. Budget time and resources for proper data management.

The goal isn't perfect data sharing—it's responsible data sharing that advances science while protecting participants and respecting practical constraints. Get that balance right, and your DMSP becomes a strength rather than a burden.

CarbonDraft can help generate first drafts of data management plans from your project notes and requirements. You provide the context about your study design and data types, it handles the formatting and structure.

