Human Genome Reference Program (HGRP)

Overview

The human genome reference is used by essentially all researchers who need to align and assemble experimental or patient genome sequence data. It also serves as a consensus coordinate system for reporting results. The genome reference is therefore a critical resource for the genomics community. The NHGRI Human Genome Reference Program (HGRP) provides funding for efforts to maintain and improve the human genome reference resource.

Recently, it has become technically feasible to produce very high-quality (haplotype-phased, nearly contiguous) genome assemblies at scale. This has led to the development of a “pangenome reference”, which will include genome assemblies from hundreds of individuals from genetically diverse populations worldwide, along with the computational tools that enable the scientific community to use it.

Ideally, a pangenome reference will faithfully represent the human haplotype variation that exists worldwide, including variants whose frequencies differ across populations and that would not readily be detected using a genome reference built from only one or a few individuals. With a fully realized pangenome reference, any newly sequenced experimental or patient haplotype will be readily alignable to the reference. This will make the human genome reference much more useful for essentially all genome sequence analyses done worldwide, and will reduce the risk that the ability to detect variation differs across populations, a difference that could lead to health disparities. For more on the human pangenome, see “The Human Pangenome”.

Phase 1 of the pangenome reference resource effort is now complete. Its first release, in 2023, was a pangenome resource including genomes from nearly 50 individuals. As of December 2024, data and assemblies have been acquired for another ~300 individuals. These will be incorporated into the pangenome reference resource and released to the public in stages through mid-2025.

Phase 2 of NHGRI funding for the pangenome reference resource began in December 2024. This phase will add assemblies from another ~200 individuals, selected because their genome assemblies are likely to contribute additional variation to the resource. Phase 2 will also emphasize outreach and community adoption, and will include the development of informatics tools that enable the wider community to use the pangenome reference resource to improve their research.

Organization and Components

For Phase 2 of the HGRP, three components were funded. Together they are called the Human Pangenome Reference Consortium (HPRC). The components are:

1.    A Human Genome Reference Center (HGRC; RFA-HG-23-025)
2.    High Quality Human Reference Genomes (HQRG; RFA-HG-23-024)
3.    Informatics tools for use of the human genome reference (ITPG; RFA-HG-25-007)

Phase 1 of the project also included efforts to develop computational representations of the pangenome and to continue developing technology for sequencing complete, high-quality genomes. See the table below.

NHGRI will also fund separate small business awards for pangenome informatics tools: SBIR (PAR-25-308, R43) and STTR (PAR-25-309, R41).

Ethical, Legal, and Social Implications

The Genome Center award also supports a team of researchers dedicated to the ethical, legal, and social implications (ELSI) of the human pangenome reference. This ELSI team is embedded within the larger HPRC project and is charged with identifying and addressing both known and emerging ELSI issues related to the HPRC, using a variety of research methodologies.

Collaborations and Outreach

The HPRC interacts across borders and at multiple levels:

  • The HPRC is an international effort that includes investigators in the US and Europe and interacts with labs in Australia, Italy, and Japan
  • The HPRC is a GA4GH Driver Project  
  • The HPRC is a member of the Human Pangenome Project

Current information is available at humanpangenome.org. 

Resource Availability

Sequencing Data

The consortium is generating sequencing data using a range of sequencing platforms (Illumina, Oxford Nanopore Technologies, and Pacific Biosciences). The data include short-read genome sequence and chromatin conformation data, as well as long-read genome sequence, methylation, and transcription data. After quality assessment, the data are deposited in an AWS S3 bucket (https://registry.opendata.aws/hpgp-data/) and mirrored on AnVIL (https://anvilproject.org/data/consortia/HPRC). They are also made available through SRA, ENA, and DDBJ.

Genome Assemblies and Annotations

The sequencing data are used to generate high-quality diploid genome assemblies, which are deposited in the AWS S3 bucket (https://registry.opendata.aws/hpgp-data/), mirrored on AnVIL (https://anvilproject.org/data/consortia/HPRC), and submitted to GenBank.

The following annotations accompany the assemblies:

  • Gene annotations from the Comparative Annotation Toolkit (CAT) and Ensembl
  • Segmental Duplications
  • Tandem Repeats
  • Transposable Elements

Reference Pangenome Graphs

The high-quality diploid assemblies deposited in GenBank are used to build reference pangenome graphs with the following approaches:

  • Minigraph
  • Minigraph-Cactus
  • Pangenome Graph Builder

All the raw and processed data generated by the consortium are publicly available after quality assessment. Computational workflows and quality assessment pipelines are available through Dockstore.

Participants

Awardee | Institution | Title | Award Number | Status

Coordinating Center Award
Ting Wang, Heng Li, Benedict Paten, Fergal Martin, Ira Hall | Washington University | The Human Pangenome Reference Consortium Coordination Center | U41HG010972 | Active

Genome Center Award
Karen Miga, Eimear Kenny, Ting Wang, Erich Jarvis, Robert Cook-Deegan, Evan Eichler | University of California, Santa Cruz | Center for Human Genome Reference Diversity | UM1HG010971 | Active

Tool Development Awards
Erik Garrison | University of Tennessee Health Science Center | Building Tools and Community to Make Pangenomes Accessible | U01HG013760 | Active
Melissa Gymrek | University of California, San Diego | Integrating the reference pangenome with biobank-scale data for complex trait analysis | U01HG013755 | Active
Benedict Paten, Heng Li, Tobias Marschall | University of California, Santa Cruz | Tools for comprehensive variant characterization using the pangenome | U01HG013748 | Active
Andrew Stergachis | University of Washington | Tooling for accurately studying the epigenome along the human pangenome reference | U01HG013744 | Active

Reference Representation Awards
Heng Li, Benedict Paten | Dana-Farber Cancer Institute | The construction and utility of reference pan-genome graphs | U01HG010961 | Expired
Mark Chaisson, Evan Eichler, Tobias Marschall | University of Southern California | Representing structural haplotypes and complex genetic variation in pan-genome graphs | U01HG010973 | Expired
Hanlee Ji, Tsachy Weissman | Stanford University | K-mer indexing for pan-genome reference annotation | U01HG010963 | Expired

Technology Development Awards
Karen Miga | University of California, Santa Cruz | Improving throughput of long reads with high consensus base accuracy to resolve repetitive DNAs | R21HG010548 | Expired

Funding Opportunities

Active 

  • PAR-25-308 Small Business Informatics Tools for the Pangenome (R43 Clinical Trial Not Allowed)
    Next Application Due Date: March 3, 2025
    Expiration Date: March 4, 2025
  • PAR-25-309 Small Business Informatics Tools for the Pangenome (R41 Clinical Trial Not Allowed)
    Next Application Due Date: March 3, 2025
    Expiration Date: March 4, 2025
  • RFA-HG-25-007 Informatics Tools for the Pangenome (U01 Clinical Trial Not Allowed)
    Next Application Due Date: March 3, 2025
    Expiration Date: March 4, 2025

Expired

  • RFA-HG-23-024 Limited Competition: High Quality Reference Genomes (UM1 Clinical Trial Not Allowed)
    Expiration Date: August 16, 2023
  • RFA-HG-23-025 Limited Competition: Human Pangenome Coordinating Center (U41 Clinical Trial Not Allowed)
    Expiration Date: August 16, 2023
  • RFA-HG-19-004 Human Genome Reference Center (HGRC) (U41 Clinical Trial Not Allowed)
    Expiration Date: April 3, 2019
  • RFA-HG-19-002 High Quality Human Reference Genomes (HQRG) (U01 Clinical Trial Not Allowed)
    Expiration Date: April 3, 2019
  • RFA-HG-19-003 Research and Development for Genome Reference Representations (GRR) (U01 Clinical Trial Not Allowed)
    Expiration Date: April 3, 2019
  • NOT-HG-19-011 Notice of Change: Emphasizing Opportunity for Developing Comprehensive Human Genome Sequencing Methodologies in Response to NHGRI Novel Nucleic Acid Sequencing Technology Development FOAs

Contact Staff

Program Directors

Adam Felsenfeld, Ph.D.
  • Program Director
  • Extramural Programs Branch
Alexander Arguello, Ph.D.
  • Program Director
  • Division of Genome Sciences
Idan Gabdank, Ph.D.
  • Program Director
  • Division of Genome Sciences
Jyoti G. Dayal, M.S.
  • Program Director
  • Division of Genomic Medicine
Nicole C. Lockhart, Ph.D.
  • Program Director
  • ELSI Research Branch

Scientific Program Analysts

Gabrielle Villard, B.S.
  • Scientific Program Analyst
  • Division of Genome Sciences

Last updated: February 14, 2025