Since the human reference originated with the completion of the International Human Genome Project, there has been a need to maintain and improve it and to make it available to the community. This work has included resolving error reports, adding information to the reference from new high-quality genomes as they became available, and developing ways to represent alternative haplotype information derived from them. Improved or updated reference versions are curated and released to the community.
On March 1, 2018, NHGRI convened a web meeting of over 65 basic research, clinical, and bioinformatic scientists to discuss scientific opportunities for the genome reference. The meeting addressed key research and resource opportunities for improving the human reference; activities necessary to keep the reference relevant and useful; clinical and research community needs (including education); related resources; and collaborations.
The high-level conclusion of the meeting was that the current version of the human reference does not adequately represent human haplotype variation, that the existing tools to include alternative haplotype information in analyses are not widely used, and that there is an opportunity to significantly improve the human reference by developing it into a “pan-genome”. One goal of a pan-genome reference is to represent as much human haplotype variation as possible, so that any newly sequenced experimental or patient haplotype will be readily alignable to the reference. Such a reference would include the multiple types of human genomic variation, phased in chromosomal regions. Building it would require adding many more high-quality human genome assemblies chosen to maximize haplotype diversity, for instance by incorporating samples collected under the 1000 Genomes Project. It would also require adopting better ways of representing the data (e.g., as a genome graph), along with developing new informatics tools to make use of the new reference.
As a result of these discussions, NHGRI will re-organize and re-focus its contribution to the genome reference to create a multi-component Human Genome Reference Program (HGRP) intended to enable an improved human genome reference for the community, and to foster its long-term sustainability and improvement.
Based on the Concept for this program presented to the National Council on Human Genome Research, the components will be:
- A Human Genome Reference Center (HGRC; RFA-HG-19-004)
- High Quality Human Reference Genomes (HGRQ; RFA-HG-19-002)
- Genome Reference Representations (GRR; RFA-HG-19-003)
- Informatics tools for use of the human genome reference (see Concept documents)
- Technology development for complete sequencing of genomes (NOT-HG-19-011)