The 10-year anniversary of the Human Genome Project: commemorating and reflecting

April 30, 2013

On April 14, 2003, the National Human Genome Research Institute (NHGRI) and our international partners announced the completion of the Human Genome Project (HGP) and the successful generation of a highly accurate and publicly available reference sequence of the human genome. Those ordered ~3 billion letters provided the most fundamental knowledge about the human genetic blueprint and gave us a framework of knowledge for pursuing numerous new and exciting genomic studies.

At this 'genomic odometer' moment, it is worth reflecting on the decade since the end of the HGP and considering where the field of genomics is going, especially as it relates to medical applications and advances. Advances in genome sequencing technologies are an appropriate place to start.

Genome Sequencing

When the HGP started, none of us involved in the project knew how we were actually going to sequence the human genome. Even then, our Institute (then actually a Center) invested in the development of technologies to improve our ability to map and to sequence genomes. While great improvements were made in the methods used for sequencing DNA, it still took us 6 to 8 years of active sequencing and cost roughly $1 billion to actually complete that first human genome sequence as part of the HGP. While the result was certainly worth the money and the wait, we understood that the cost of genome sequencing needed to be reduced substantially for the field of genomics to advance in the myriad desired ways.

By the end of the HGP, we were already getting better at genome sequencing. In fact, in 2003, if the researchers who had just finished the first reference human genome sequence had immediately sequenced a second human genome, it would have taken three to four months and cost $30 million to $50 million. But even that was too slow and too expensive.

Fast forward to today - 10 years later - and DNA sequencing technologies have advanced tremendously. Their present-day availability is the product of what arguably has been the most impressive technology development effort in the history of biomedical research. Now, a human genome can be sequenced in a day or two and at a cost well below $10,000, closer to $4,000 to $5,000. So, in 10 short years, we have knocked five zeros off that initial $1 billion price tag - and we will likely knock that last zero off within a year or two or three.
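
As a rough check on the "five zeros" figure, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration using the approximate dollar amounts quoted above; the variable and function names are illustrative and not part of any NHGRI tool.

```python
import math

# Approximate figures quoted in the text, in US dollars (assumed round numbers).
HGP_COST = 1_000_000_000          # first human genome sequence, via the HGP
COST_2013_RANGE = (4_000, 5_000)  # one human genome in 2013

def zeros_knocked_off(before: float, after: float) -> float:
    """Return the number of powers of ten separating two costs."""
    return math.log10(before / after)

for cost in COST_2013_RANGE:
    fold = HGP_COST / cost
    zeros = zeros_knocked_off(HGP_COST, cost)
    print(f"${cost:,} per genome: about {fold:,.0f}-fold cheaper ({zeros:.1f} zeros)")
```

Both ends of the 2013 range land between five and six powers of ten below the original price tag, which is consistent with having removed five zeros and being partway toward the sixth.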

Today, sequencing a human genome costs roughly the same as a sophisticated medical test like an MRI, and genome sequencing can, in principle, be affordably added to the medical testing repertoire. That said, we still need to establish the appropriate settings for clinical genome sequencing and what the resulting information will mean for patients. Towards that end, NHGRI and other research organizations are heavily involved in efforts to understand what information is encoded in the human genome and how it is relevant for health and disease.

Comparative Genomics

To establish how the human genome works, genomics researchers have focused on sequencing the genomes of a wide range of other organisms, starting before the HGP ended. This included animals used for research and ones at key places on the evolutionary tree of life. Genome sequencing was performed for laboratory animals (such as mice, rats, worms, and fruit flies), companion animals (such as dogs and cats), agricultural animals (such as cows and pigs), and weirdoes along the way (such as the possum and the platypus). Reflecting back, we had generated genome sequences for three vertebrates in 2003; today, genome sequences have been generated for 112 vertebrates, along with 455 non-vertebrate eukaryotes and nearly 9,000 prokaryotes (bacteria), mostly pathogens. We now have genome sequence data from all branches of the evolutionary tree.

By comparing those genome sequences, we are able to essentially read "evolution's notebook" and see what parts of our genome Nature thought were important and decided to keep the same in mice and rats and dogs and other mammals. Once we knew their location in the genome, those bits of evolutionarily conserved sequences proved to be a bit surprising. For example, only about one-third of the most highly conserved sequences in our genome code for protein, constituting ~1.5 percent of our ~3 billion bases and together making up ~20,000 genes. When the HGP ended, we thought our gene inventory was much larger. The remaining bits of highly conserved sequences do not encode protein (and in fact outnumber the bases that do encode protein by more than two-fold); in aggregate, these reflect the non-coding functional part of our genome. Figuring out what those non-coding functional sequences are doing has been and will be a high priority for basic researchers.

Understanding Genome Function

To understand how the human genome works, researchers set out to identify the functional parts. As one example, NHGRI launched the ENCODE (Encyclopedia of DNA Elements) Project to begin cataloging all functional elements in the human genome. We now know that the human genome is a "beehive" of activity, coding for thousands of RNAs that do not seem to then code for protein, but that seem to have other biological functions. The ENCODE data point to suggestive evidence for "biological activity" for as much as 80 percent of the human genome; however, that 80 percent claim is a controversial one, and researchers will argue it out in the coming years, as additional ENCODE and non-ENCODE data get generated and added to public databases.

We know a tremendous amount more in terms of the functional parts of the human genome than we did 10 years ago. But there is so much more to learn and understand with respect to the complexities of the biological information encoded in our genomes. Elucidating those complexities remains a high priority for genomics.

Human Genomic Variation and Human Genetic Disease

Even before the HGP was completed, researchers realized that humans were very similar at the genomic level - 99.9 percent identical base-by-base across the genome. But in that one-tenth of 1 percent difference resides the basis for inherited susceptibility to numerous diseases. NHGRI helped to organize several international projects - the SNP Consortium, the International HapMap Project, and the 1000 Genomes Project - to catalog and characterize common human genomic variation.

The resulting catalogs of human genomic variants are astonishing in their depth and breadth. When the HGP ended, we knew about 3.4 million SNPs (single-nucleotide polymorphisms; i.e., base positions in the genome where some people carry a letter that differs from the reference sequence). Ten years later, researchers have cataloged nearly 54 million SNPs, a roughly 16-fold increase.
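
The numbers in this section can be checked with a short calculation. The Python sketch below uses only the rounded figures quoted in the text (a ~3 billion base genome, 99.9 percent identity, and the two SNP counts); the names are illustrative, and the result is a back-of-the-envelope estimate rather than a measured value.

```python
# Rounded figures quoted in the text (assumptions for this sketch).
GENOME_LENGTH = 3_000_000_000   # ~3 billion bases
SNPS_2003 = 3_400_000           # SNPs known when the HGP ended
SNPS_2013 = 54_000_000          # SNPs cataloged ten years later

# One-tenth of 1 percent of the genome, i.e. one base in every thousand.
differing_bases = GENOME_LENGTH // 1000

# Growth of the SNP catalog over the decade.
fold_increase = SNPS_2013 / SNPS_2003

print(f"Bases in the 0.1 percent difference: about {differing_bases:,}")
print(f"Growth in cataloged SNPs: about {fold_increase:.1f}-fold")
```

Under these assumptions, the 0.1 percent difference corresponds to about 3 million base positions, and the SNP catalog grew by a little under 16-fold.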

Why has this proven valuable? Companies have used those variant catalogs to make genotyping chips, paving the way for new approaches to study the genetic basis of disease, especially genetically complex diseases in which multiple variants contribute to the risk for the disorder. This resulted in the emergence of genome-wide association studies (GWAS), where researchers compare a large number of individuals with a disease - say, diabetes - to individuals without the disease.

Ten years ago, there was skepticism about whether GWAS approaches would work. At that time, there were no successful GWAS examples and zero publications reporting GWAS findings. Today, there have been over 1,400 publications reporting successful GWAS projects.

Even more impressive progress has occurred for rare diseases, caused by defects in a single gene. Before the HGP began, we knew the genetic basis for about 60 of the thousands and thousands of rare genetic diseases. The HGP energized efforts to find the genes underlying these rare disorders, such that when the HGP ended 10 years ago, that number was up to over 2,200. Today, we know the genomic basis for close to 5,000 rare disorders. That represents substantial progress, especially considering that rare diseases afflict more than 25 million Americans.

Medical Applications of Genomics

We have not cured all (or even many) diseases since the HGP ended - no one seriously thought that would happen. But we can now say that genomics is beginning to have a meaningful impact on medicine.

For example, today, the U.S. Food and Drug Administration (FDA) requires pharmacogenomic information to appear on the labels of 106 medications that are currently on the market. That means the label tells prescribing doctors that there is genomic information of some sort that might be considered before giving that drug to a particular patient. Before the HGP started, only four drugs carried such a label.

Perhaps the earliest medical advances due to genomics research will be in the diagnosis and treatment of cancer. In fact, this is already happening. Cancer is a genomic disease, and The Cancer Genome Atlas, a partnership between the National Cancer Institute and NHGRI, has systematically sequenced the genomes of thousands of tumors from patients with more than 20 different types of cancer, including brain, lung, colon, and breast cancer. As a result, oncologists are linking particular genomic profiles for specific cancer types with outcomes and determining treatments based on genomic information. Cancer patients will be among the early clinical beneficiaries of genomics.

We are clearly not done, and we have a long way to go to deliver on the medical promises of genomics. NHGRI is now supporting a number of large studies to identify clinically relevant genomic variants and to study how patients will use genomic information when they receive it from their doctors. Such studies will help illuminate the path forward to implement genomic medicine.

Societal Implications of Genomics

When James Watson, NHGRI's founding director and lead scientist directing NIH's contribution to the HGP, held his first press conference, he announced that he would devote a portion of the research funding to studying the ethical, legal, and social implications (ELSI) of genomic science. He surprised then-NIH Director James Wyngaarden with this announcement, but the proposal stuck, and Congress enshrined it in NHGRI's budget. Since the beginning of the HGP, NHGRI has invested some $300 million to support nearly 500 research projects studying the broader ELSI issues of genomics research. Today, five percent of NHGRI's annual budget is still spent on ELSI research. The debates fostered throughout the ELSI research community helped focus congressional attention on the need to pass the Genetic Information Nondiscrimination Act of 2008 (called GINA) and helped bring attention to a range of important issues such as genomic privacy, access to genomic data, and direct-to-consumer genetic testing.

Dr. Watson also observed that the public needs to understand genomic science, especially as individuals will be increasingly asked to make their own medical decisions based on genomic information. NHGRI has increased its public outreach through National DNA Day activities for students, the 600-plus videos on GenomeTV (our YouTube channel) and other social media, and an exhibition on genomics opening this summer through our collaboration with the Smithsonian's National Museum of Natural History.

I have been involved in genomics research for over 25 years and was a front-line, start-to-finish participant in the HGP. I have now been at NHGRI for 18.5 years, holding various leadership positions for most of that time. And finally, I have been the NHGRI director for 3.5 years. When I look back at what has been done and how far we have come in genomics since the start of the HGP in 1990 and then, in particular, since the end of the HGP a decade ago, I am simply amazed. At the same time, I am profoundly excited about the new frontiers that we are moving into, especially as we explore the applications of genomics to medical care. Day in and day out, I think about where genomics is going and conclude that the field's future is bright.

Posted: October 21, 2013

Last updated: October 21, 2013