Last updated: May 16, 2010
Pilot Study Explores Feasibility of Sequencing Human DNA
April 1996
The National Human Genome Research Institute (NHGRI), a key player in the international Human Genome Project (HGP) and part of the National Institutes of Health (NIH), announced today the launch of an unprecedented pilot study to explore the feasibility of large-scale sequencing of human DNA. This initiative, which is budgeted at over $18 million in the first year and $60 million over three years, marks the transition to the third and most technologically challenging phase of the HGP - determining the sequence of the 3 billion subunits, or bases, of human DNA. This initiative will involve six U.S. research centers and is projected to produce the sequence of about 3 percent of human DNA in the first two years.
According to Dr. Francis Collins, NHGRI director, these studies will help to streamline and cut the cost of DNA sequencing. If the pilot study shows such large-scale sequencing can be done rapidly, accurately and cost-effectively, the HGP will be poised to scale up its efforts to sequence the entire human genome on time, by the year 2005, which is the project's dominant goal. "I'm extremely optimistic that in three years we'll be in a position to go after the complete human sequence in earnest," said Dr. Collins.
The six groups participating in the pilot projects will strive for an accuracy level approaching one error per 10,000 bases, or 99.99 percent accuracy, in all regions of the genome. But, according to NHGRI's assistant director for program coordination, Dr. Mark Guyer, "The pilot project will help us determine whether it will be necessary to strive for 99.99 percent accuracy when we scale up, and whether it can be done cost effectively with the sequencing strategies emerging during the next three years."
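The accuracy target can be checked with simple arithmetic. The short Python sketch below uses only the figures quoted in this announcement (3 billion bases, one error per 10,000 bases); the function and constant names are illustrative and are not part of any NHGRI software.

```python
# Figures taken from this announcement; all names are illustrative assumptions.
GENOME_BASES = 3_000_000_000   # "3 billion subunits, or bases"
BASES_PER_ERROR = 10_000       # target: one error per 10,000 bases


def accuracy_percent(bases_per_error: int) -> float:
    """Percentage of bases read correctly at one error per `bases_per_error`."""
    return 100.0 * (bases_per_error - 1) / bases_per_error


def expected_errors(total_bases: int, bases_per_error: int) -> int:
    """Number of erroneous bases expected across `total_bases` at that rate."""
    return total_bases // bases_per_error


if __name__ == "__main__":
    print(f"accuracy: {accuracy_percent(BASES_PER_ERROR):.2f} percent")
    print(f"expected errors genome-wide: {expected_errors(GENOME_BASES, BASES_PER_ERROR):,}")
```

Under these assumptions, even 99.99 percent accuracy would leave roughly 300,000 incorrect bases in a complete 3-billion-base sequence, which illustrates why the appropriate accuracy level is one of the questions the pilot is meant to answer.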
Given the value of this information in a host of research settings, especially for finding genes linked to disease, sequence data from the studies will be uploaded rapidly into the public computer databases GenBank and Genome Sequence DataBase, and to World Wide Web pages at the project sites.
NHGRI is encouraging the grantees to release preliminary DNA sequence information to the research community within a few days or weeks of its discovery. This is much faster than research information is usually released, but the tremendous value of this data for disease research justifies this aggressive policy. The "finished" sequence, with all data double- and triple-checked, is to be placed in public databases soon after that.
NHGRI is discouraging pilot project scientists from seeking patents on the raw genomic sequence. The scientists are free to apply for patents if they have done additional biological experiments that reveal convincing evidence for utility of the sequence - a standard criterion for patenting. Patent protection encourages companies to invest the large sums of money needed to develop diagnostic and therapeutic products. However, according to NHGRI, patent applications on large blocks of primary human genomic DNA sequence could have a chilling effect on the development of future inventions of useful products. This policy responds to recommendations made by several advisory groups that helped formulate the structure and goals of the HGP, and those made at a recent meeting of scientists planning for large-scale human DNA sequencing.
The immediate challenge for the pilot study is to refine strategies needed to determine the order of the 3 billion bases in the genome, analyze that information, and present it to the rest of the biomedical research community. The amount of information involved is huge by conventional standards: if you were to print out all the information in the human genome, letter by letter, it would fill a stack of books as tall as a 12-story building.
When the HGP was launched in 1990, experts set out a series of goals that were to be met. The first, construction of a human genetic linkage map, was completed in 1994, a year ahead of schedule. The genetic map is used to study how diseases are inherited in families, and the goal was completed when the map contained enough landmarks to be equal to one marker every mile along a road leading from New York to San Francisco.
The physical map will probably be complete next year, also a year ahead of schedule. The physical map is used for locating the genes that are involved in disease and normal human development. When it is done, the spacing of markers will be equivalent to having a milepost every tenth of a mile along the road from New York to San Francisco.
To prepare for DNA sequencing, NHGRI has been supporting research on developing instruments, computer programs and molecular biology methods. In particular, just within the past year, NHGRI has committed approximately $25 million in two special initiatives to develop specific technologies that will decrease the cost and increase the throughput of large-scale DNA sequencing.
Projects to sequence the DNA of a bacterium, yeast, a worm and a fly have also been supported. These projects are an opportunity for scientists to "practice" sequencing on genomes that are much smaller than that of the human, but bigger than anything that had been sequenced before. They also help scientists understand human DNA, because the "model organism" DNA contains information very similar to part of the information in the human genome.
The pilot projects begin with lessons learned in model organism and smaller-scale human DNA sequencing, but also emphasize the continued need for innovation and labor-saving strategies. Indeed, the pilot projects will test a number of approaches that could have a profound impact on large-scale sequencing not only for the HGP, but for the rest of science and industry.
These innovations include full automation using robotic arms and advanced biochemistry to purify and make ready tens of thousands of DNA samples per day; computer systems to track hundreds of thousands of samples through the laboratory and collect and analyze millions of "letters" of DNA sequence; software to automate decision-making processes; and strategies for converting available DNA maps, which are of high quality but only moderate detail, into a form that will feed identified DNA fragments into the sequencing process.
The principal investigators and information on the individual projects, including the dollar amounts for the first year of the grants, are listed below:
Mark D. Adams, Ph.D., The Institute for Genomic Research, Rockville, MD ($3.2 million):
The TIGR group has experience in managing the sequencing of large numbers of small human DNA fragments and large blocks of microbial genomic DNA. It will now apply this experience to sequencing human DNA on the short arm of chromosome 16. Technology development will focus on efficient creation and use of DNA "libraries" to make the transition from physical maps to DNA sequence. In addition, software will be developed for sample tracking and data management, and to automate the assembly of sequence data.
Richard A. Gibbs, Ph.D., Baylor College of Medicine, Houston, TX ($1.3 million):
Human chromosome X contains some regions with a high density of genes and other regions with much lower gene density. This group will scale up from its current investigational level of sequencing on this chromosome, to learn more about the structure of these different regions. Dr. Gibbs's group is fine-tuning DNA purification and reaction chemistry methods, and developing a series of new fluorescence energy-transfer dyes with the potential to improve the accuracy of automated reading of DNA bases. They will test a novel strategy for reducing the number of sequencing reactions needed to complete a region of DNA to high accuracy.
Eric S. Lander, Ph.D., Whitehead Institute for Biomedical Research, Cambridge, MA ($4.1 million):
In the process of playing a major role in assembling the physical map of the human genome, this group built an extremely efficient, high-throughput automated mapping machine and software for automated data analysis. They also set a very high standard for rapid release of genome information to the research community. Lessons learned in this process form the foundation of their new effort to develop an exportable robotic system that will be operated by a relatively modest team of 25 people, and have the capacity to sequence human DNA rapidly, accurately and cost-effectively. The entire sequencing process, including sample preparation, sample tracking and computer analysis of the data, will be embodied in the automated system. This group, focusing at first on chromosomes 9 and 17, will also develop automation to convert the current physical map to the map required for sequencing.
Richard M. Myers, Ph.D., Stanford University, Stanford, CA ($2.5 million):
This group has extensive experience in large-scale physical map construction. They will test a "directed" strategy that requires more up-front mapping but less-complex subsequent computation to sequence regions of chromosomes 4 and 21. In collaboration with industrial partners, enzymes will be developed to improve the up-front mapping steps, and "DNA chips" will be tested for verifying the DNA sequence.
Maynard V. Olson, Ph.D., University of Washington, Seattle, WA ($1 million):
This group has identified critical technologies for incorporation into the DNA sequencing production line. They will reduce to practice a high-resolution mapping method in the context of sequencing, in order to decrease the number of sequencing reactions needed to assemble highly accurate sequence, and to provide a skeleton of information against which to check their results. These methods will be applied to sequencing in regions of chromosome 7. This activity will occur in parallel with a complementary project to develop additional technology and sequence regions of other chromosomes, supported by the Department of Energy, Office of Health and Environmental Research.
Robert H. Waterston, M.D., Ph.D., Washington University, St. Louis, MO ($6.7 million):
The world's largest contribution to sequencing of a single genome has been made by this group, in collaboration with John Sulston's laboratory at the Sanger Centre in Cambridge, Great Britain. Together they have completed over 37 million bases of the genome of the nematode worm C. elegans. Rapid public release of data is their hallmark. Leveraging this experience in genomic DNA sequencing, Waterston's group will test management structures for large-scale genomic sequencing, focusing initially on regions of human chromosomes 22 and X. They will continue to collaborate with the Sanger Centre to identify bottlenecks in the sequencing process, and will use a combination of automated devices, improved methods and computer software to increase data production and decrease cost while maintaining high accuracy.
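As a cross-check, the six first-year award amounts listed above can be added up and compared with the first-year budget figure of "over $18 million" given earlier in this announcement. The Python sketch below is only an illustration; the dictionary and function names are assumptions, and the amounts are the rounded figures quoted in the list.

```python
from decimal import Decimal

# First-year awards in millions of dollars, as quoted in the list above.
# Decimal avoids binary floating-point rounding when summing tenths.
FIRST_YEAR_AWARDS_MILLIONS = {
    "The Institute for Genomic Research": Decimal("3.2"),
    "Baylor College of Medicine": Decimal("1.3"),
    "Whitehead Institute for Biomedical Research": Decimal("4.1"),
    "Stanford University": Decimal("2.5"),
    "University of Washington": Decimal("1"),
    "Washington University": Decimal("6.7"),
}


def total_awards_millions(awards: dict) -> Decimal:
    """Sum the quoted award amounts (in millions of dollars)."""
    return sum(awards.values(), Decimal("0"))


if __name__ == "__main__":
    total = total_awards_millions(FIRST_YEAR_AWARDS_MILLIONS)
    print(f"total first-year awards: ${total} million")
```

The rounded amounts sum to $18.8 million, which is consistent with the stated first-year budget of over $18 million.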