
Human Genome Reference Program

The human genome reference is used by essentially all researchers who need to align and assemble experimental or patient genome sequence data. It also serves as a consensus coordinate system for reporting results.

Overview

Since the human reference originated with the completion of the International Human Genome Project, there has been a need to maintain and improve it and to make it available to the community. This work has included resolving error reports, adding information from new high-quality genomes as they became available, and developing ways to represent the alternative haplotype information derived from them. Improved or updated reference versions are curated and released to the community.

On March 1, 2018, NHGRI convened a web meeting of over 65 basic research, clinical, and bioinformatic scientists to discuss scientific opportunities for the genome reference. The meeting addressed key research and resource opportunities for improving the human reference; activities necessary to keep the reference relevant and useful; clinical and research community needs (including education); related resources; and collaborations.

The high-level conclusion of the meeting was that the current version of the human reference does not adequately represent human haplotype variation, that the existing tools to include alternative haplotype information in analyses are not widely used, and that there is an opportunity to significantly improve the human reference by developing it into a “pan-genome”. One goal of a pan-genome reference is to represent as much of human haplotype variation as possible, so that any newly sequenced experimental or patient haplotype will be readily alignable to the reference. This would include the multiple types of human genomic variation, phased in chromosomal regions. It would require the addition of many more high-quality human genome assemblies chosen to maximize haplotype diversity, for instance by incorporating samples collected under the 1000 Genomes Project. It would also require the adoption of better ways of representing the data (e.g., as a genome graph), along with the development of new informatics tools to make use of the new reference.
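As an illustration of the genome graph idea mentioned above, the following is a minimal sketch in Python and not the representation adopted by the program. It assumes a toy model in which nodes hold short DNA segments and each haplotype is stored as a path through those nodes. The class name `GenomeGraph`, its methods, and the example sequences are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GenomeGraph:
    """Toy sequence graph (illustrative only): nodes hold DNA segments,
    and each haplotype is recorded as an ordered path of node IDs."""

    segments: dict[str, str] = field(default_factory=dict)
    edges: set[tuple[str, str]] = field(default_factory=set)
    paths: dict[str, list[str]] = field(default_factory=dict)

    def add_segment(self, node_id: str, sequence: str) -> None:
        self.segments[node_id] = sequence

    def add_haplotype(self, name: str, node_ids: list[str]) -> None:
        # Every node on the path must already exist in the graph.
        for node_id in node_ids:
            if node_id not in self.segments:
                raise KeyError(f"unknown segment: {node_id}")
        self.paths[name] = list(node_ids)
        # Consecutive nodes on a path are joined by an edge.
        self.edges.update(zip(node_ids, node_ids[1:]))

    def spell(self, name: str) -> str:
        # Concatenate segment sequences along the path to recover the haplotype.
        return "".join(self.segments[n] for n in self.paths[name])


def build_example() -> GenomeGraph:
    # Two hypothetical haplotypes that differ by a single-base substitution.
    graph = GenomeGraph()
    graph.add_segment("s1", "ACGT")
    graph.add_segment("s2", "A")  # allele carried by the first haplotype
    graph.add_segment("s3", "G")  # allele carried by the second haplotype
    graph.add_segment("s4", "TTGA")
    graph.add_haplotype("hap_ref", ["s1", "s2", "s4"])
    graph.add_haplotype("hap_alt", ["s1", "s3", "s4"])
    return graph


if __name__ == "__main__":
    g = build_example()
    print(g.spell("hap_ref"))
    print(g.spell("hap_alt"))
```

In this sketch the shared flanking segments are stored once and only the variant site branches, which is the property that lets a graph hold many haplotypes without duplicating common sequence.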

As a result of these discussions, NHGRI will re-organize and re-focus its contribution to the genome reference to create a multi-component Human Genome Reference Program (HGRP) intended to enable an improved human genome reference for the community, and to foster its long-term sustainability and improvement.

Based on the Concept for this program presented to the National Council on Human Genome Research, the components will be:

  1. A Human Genome Reference Center (HGRC; RFA-HG-19-004)
  2. High Quality Human Reference Genomes (HQRG; RFA-HG-19-002)
  3. Genome Reference Representations (GRR; RFA-HG-19-003)
  4. Informatics tools for use of the human genome reference (see Concept documents)
  5. Technology development for complete sequencing of genomes (NOT-HG-19-011)

Participants and Structure

Human Genome Reference Center

  • Washington University, St. Louis
    Principal Investigators (PI): Ting Wang (Contact), Paul Flicek, Ira Hall, Benedict Paten

    The Human Genome Reference Center at Washington University in St. Louis serves as the coordinating center. It maintains and updates the reference sequence; supports state-of-the-art reference representations; and educates and coordinates with the research community (including clinicians and basic research scientists).
     

High Quality Reference Genomes

  • University of California, Santa Cruz
    Principal Investigators (PI): David Haussler (Contact), Evan Eichler, Ira Hall

    The High-Quality Human Reference Genomes Center at the University of California, Santa Cruz collects additional DNA samples from populations not represented in the current reference, including the creation of cell lines. It will generate at least 350 high-quality reference genome sequences, a subset of which will be finished, telomere-to-telomere genome sequences. The center also disseminates the data and works closely with the other Human Genome Reference Program components.
     

Genome Reference Representations

  • Dana-Farber Cancer Institute
    Principal Investigators (PI): Heng Li (Contact), Benedict Paten
    Project Title: The construction and utility of reference pan-genome graphs
     
  • University of Southern California
    Principal Investigators (PI): Mark Chaisson (Contact), Evan Eichler, Tobias Marschall
    Project Title: Representing structural haplotypes and complex genetic variation in pan-genome graphs
     
  • Stanford University
    Principal Investigators (PI): Hanlee Ji (Contact), Tsachy Weissman
    Project Title: K-mer indexing for pan-genome reference annotation
     

The Genome Reference Representations (GRR) projects support research and development for a next-generation genome reference representation that can capture all human genome variation and support research on the full diversity of populations.

Informatics Tools for the Pangenome

  • Pending
     
  • Purpose: To develop informatics tools that can apply the new pangenome representation for analysis and enable use of the high-quality genome reference by clinical and basic researchers.
     

Technology Development for Complete Genome Sequencing

  • NHGRI will accept applications for Technology Development for Complete Genome Sequencing on an ongoing basis (see NOT-HG-19-011).
     
  • Purpose: Develop technologies for complete de novo sequencing of phased diploid human genomes.

Program Management

NHGRI manages the HGRP as a consortium. Grantees for the Human Genome Reference Center, High Quality Reference Genomes, and Genome Reference Representations components interact closely on several aspects of the program such as prioritizing new samples, resolving reference errors or ambiguities, establishing quality metrics, transitioning to graph representations or new reference “builds”, and others.

NHGRI believes that the human reference will be more broadly useful if it can be integrated with, or is part of an effective ecosystem with, other existing databases and resources that present human variation information in different contexts (e.g., ClinVar, EGA, Human Genome Structural Variation Consortium, gnomAD, and Bravo).

Data Release and Access Policies

NHGRI data release policies for genome sequence data evolved from the original Bermuda and Ft. Lauderdale policies, which were suited to the Human Genome Project data and organismal sequence data. With the advent of projects involving large numbers of samples from human subjects, this area is under continuous evaluation, much of it at the NIH level rather than the NHGRI level.

See NOT-OD-13-119 for a discussion of the latest NIH policy proposals in this area.

Select Working Groups

  • Working Group: Assembly Team
    Chairs: Evan Eichler, Karen Miga, Ira Hall, Benedict Paten, Erich Jarvis, Kerstin Howe
    Role: Generate “high quality production grade” assemblies; generate “finished” T2T assemblies; QC and validate assemblies; develop methods and pipelines

  • Working Group: Pangenomes
    Chairs: Ira Hall, Benedict Paten, Heng Li
    Role: Variant calling; pangenome framework, construction, and tools

  • Working Group: Resource Improvement and Maintenance
    Chairs: Paul Flicek, Valerie Schneider, Tina Lindsay
    Role: Functional annotation; handling error reports; resolving errors through targeted re-assembly and/or sequencing

  • Working Group: Resource Sharing and Outreach
    Chairs: Ting Wang, David Haussler
    Role: Resource sharing; outreach & education; browsers

  • Working Group: Samples
    Chairs: Eimear Kenny, Karen Miga
    Role: Collect, identify, and prioritize samples for inclusion in the project

  • Working Group: Technology and Production
    Chairs: Bob Fulton, Karen Miga
    Role: Coordinate data production across sites; develop, optimize, troubleshoot, and share protocols; engage with technology companies; test and adopt new technologies and protocols

 

Funding Opportunities

Active 

  • PAR-25-308 Small Business Informatics Tools for the Pangenome (R43 Clinical Trial Not Allowed)
    Next Application Due Date: March 3rd, 2025
    Expiration Date: March 4th, 2025
  • PAR-25-309 Small Business Informatics Tools for the Pangenome (R41 Clinical Trial Not Allowed) 
    Next Application Due Date: March 3rd, 2025
    Expiration Date: March 4th, 2025
  • RFA-HG-23-026 Informatics Tools for the Pangenome (U01 Clinical Trial Not Allowed)
    Next Application Due Date: March 3rd, 2025
    Expiration Date: March 4th, 2025

Expired

  • RFA-HG-23-024 Limited Competition: High Quality Reference Genomes (UM1 Clinical Trial Not Allowed)
    Expiration Date: August 16, 2023
  • RFA-HG-23-025 Limited Competition: Human Pangenome Coordinating Center (U41 Clinical Trial Not Allowed)
    Expiration Date: August 16, 2023
  • RFA-HG-19-004 Human Genome Reference Center (HGRC) (U41 Clinical Trial Not Allowed)
    Expiration Date: Apr 03, 2019
  • RFA-HG-19-002 High Quality Human Reference Genomes (HQRG) (U01 Clinical Trial Not Allowed) 
    Expiration Date: Apr 03, 2019
  • RFA-HG-19-003 Research and Development for Genome Reference Representations (GRR) (U01 Clinical Trial Not Allowed) 
    Expiration Date: Apr 03, 2019
  • NOT-HG-19-011 Notice of Change: Emphasizing Opportunity for Developing Comprehensive Human Genome Sequencing Methodologies in Response to NHGRI Novel Nucleic Acid Sequencing Technology Development FOAs

Contact Staff

Program Directors

Adam Felsenfeld, Ph.D.
  • Program Director
  • Extramural Programs Branch
Alexander Arguello, Ph.D.
  • Program Director
  • Division of Genome Sciences
Idan Gabdank, Ph.D.
  • Program Director
  • Division of Genome Sciences

Scientific Program Analysts

Gabrielle Villard, B.S.
  • Scientific Program Analyst
  • Division of Genome Sciences

Last updated: December 19, 2024