Last updated: May 30, 2017
Sequencing and Re-Sequencing the Biome!
Workshop Summary
National Human Genome Research Institute
Eric Lander and Bob Austin, Chairs
Jane Peterson and Jeff Schloss, Organizers
July 23, 2002
This workshop examined the scientific need for additional genomic sequence data, the current cost drivers for high-throughput sequencing, potential new technologies that could lead to a revolutionary breakthrough in sequencing technology in the next five to ten years, and what the National Human Genome Research Institute (NHGRI) should do to ensure that sequencing technology and high-throughput sequencing continue to move forward. Starting at about $10 per bp in 1990, sequencing costs have fallen along a straight line on a logarithmic scale and in 2002 are about $0.09 per finished bp.
With this success in mind, the workshop attendees discussed the following:
- Can we keep the reduction of sequencing costs on this curve or improve it?
- What will be the scientific value of each additional genome that is sequenced?
- What is the best route to collecting additional sequence data most rapidly and cost-effectively?
- Would medical care improve if we could fully sequence each person's genome for $1,000? If so, how can we quickly get to that point?
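The "straight line on a log scale" trend can be made concrete with a short calculation. The sketch below uses only the two figures quoted above ($10 per bp in 1990, about $0.09 per finished bp in 2002) and assumes a constant exponential decline between them. The function names are illustrative, and the two endpoints are not strictly comparable, since only the 2002 figure is stated as a finished-base cost.

```python
import math

# Figures quoted in the summary.
COST_1990 = 10.00  # dollars per bp in 1990
COST_2002 = 0.09   # dollars per finished bp in 2002
YEARS = 2002 - 1990


def decline_rate(start_cost, end_cost, years):
    """Slope of the log-linear trend, in natural-log units per year."""
    return math.log(start_cost / end_cost) / years


def halving_time(rate):
    """Years needed for cost to fall by half at a constant decline rate."""
    return math.log(2) / rate


def projected_cost(start_cost, rate, years_ahead):
    """Cost after years_ahead years if the trend continues unchanged (an assumption)."""
    return start_cost * math.exp(-rate * years_ahead)


rate = decline_rate(COST_1990, COST_2002, YEARS)
print(f"halving time: {halving_time(rate):.2f} years")
print(f"2007 cost if the trend holds: ${projected_cost(COST_2002, rate, 5):.4f} per bp")
```

Under these assumptions the cost halves roughly every 1.8 years, and simply staying on the curve for five more years would bring the cost to a little over one cent per bp. That is about one log of improvement, which gives a baseline for judging the more aggressive goals discussed in the summary.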
The need for more sequence data: DNA sequence is now a reagent that facilitates computational predictions that drive selective experimentation. We need to sequence: (1) novel genomes (non-mammalian genomes for studying proteins and mammalian genomes for studying regulatory elements); (2) cDNAs (to obtain definitive protein sequences); and (3) pools enriched for regulatory and other functional sequences. Comparative genomics will give us many insights into the non-coding regions of genomes. We also need to re-sequence certain genomes to obtain information about genetic variation.
Major current cost factors in sequencing: Large-scale sequencing as currently practiced in the sequencing centers is a very complex process, requiring substantial infrastructure. The cost of the current process is, very roughly, $1 per read in fixed costs and $1 per read in variable costs. Several elements of the sequencing enterprise could be improved to reduce costs: increasing the market size to spread R&D costs, maximizing yield per run (number of reads, read length, quality), system integration, process improvements and supply chain management. The introduction of radical new technologies would change all of these parameters.
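To illustrate why yield per run matters, the sketch below converts the rough per-read costs quoted above into a cost per finished base. Only the $1 fixed and $1 variable figures come from the summary. The read length, pass rate and coverage values are hypothetical placeholders chosen for illustration, and the model ignores finishing and other downstream costs, so the result is not calibrated to the $0.09 per finished bp figure.

```python
# Rough per-read costs quoted in the summary.
FIXED_COST_PER_READ = 1.00     # dollars
VARIABLE_COST_PER_READ = 1.00  # dollars


def cost_per_finished_bp(read_length, pass_rate, coverage,
                         fixed=FIXED_COST_PER_READ,
                         variable=VARIABLE_COST_PER_READ):
    """Dollars per finished base under a simple redundancy model.

    read_length: usable bases per passing read (hypothetical input)
    pass_rate:   fraction of attempted reads that pass quality checks (hypothetical)
    coverage:    average number of reads covering each finished base (hypothetical)
    """
    if read_length <= 0 or coverage <= 0:
        raise ValueError("read_length and coverage must be positive")
    if not 0 < pass_rate <= 1:
        raise ValueError("pass_rate must be in (0, 1]")
    cost_per_attempted_read = fixed + variable
    usable_bases_per_attempt = read_length * pass_rate
    return cost_per_attempted_read * coverage / usable_bases_per_attempt


# Hypothetical baseline versus a run with doubled read length.
baseline = cost_per_finished_bp(read_length=500, pass_rate=0.8, coverage=8)
longer_reads = cost_per_finished_bp(read_length=1000, pass_rate=0.8, coverage=8)
print(f"baseline:     ${baseline:.3f} per finished bp")
print(f"longer reads: ${longer_reads:.3f} per finished bp")
```

In this simple model, cost per base falls in direct proportion to read length and pass rate, which is one way to see why those yield parameters appear in the list of cost levers.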
There is a driver that could expand the sequencing market over the next 20 to 30 years, comparable to the transition in the computer industry from the mainframe to the PC. It is envisioned that in 20 to 30 years every lab will be doing high-throughput sequencing and that sequencing will be widely used in healthcare. This will allow development costs to be spread over a larger market.
What we can learn from past experience: The change in technology from slab gels to capillary systems removed one of the largest cost factors: lane tracking. Issues currently being worked on include enhancing data quality, increasing production capacity per unit of investment and minimizing reagent consumption. Continuing improvements of this type can be expected over the next two to three improvement cycles, spanning the next 4 to 5 years. Beyond improvements in the current process, technology drivers include removal of Sanger sequencing as the key technology, achievement of true multiplexing of samples, single molecule fragmentation, label-less detection of single molecules and detection of molecules in mixtures.
To drive technology improvement, NHGRI needs more aggressive cost reduction goals than the ones articulated in the HGP's last five-year plan. Support for technology development has been reasonable, but the results of advanced technology development have been very disappointing, as judged by the absence of fundamentally new technologies on the market today. Better mechanisms are needed to transfer technologies from academia to industry and back to the sequencing laboratory. Venture capitalists are not interested in developing technologies for a $50M market, which only exceptional DNA technology companies can expect to exceed. Private financing for de novo DNA sequencing technology is very limited at present, and investors look for a return on investment within three years.
Prospects for technologies for resequencing: Four challenges need to be met for resequencing diploid systems using hybridization arrays: (1) more probes per array to allow examination of large portions of the genome in the fewest experiments; (2) detection of alleles present in populations at low frequencies, which will require screening large numbers of individuals, sensitive detection in pools, and preparing haploid samples to allow haplotype determination; (3) sample preparation that uses region-specific amplification to reduce complexity without requiring sequence-specific reagents for each region to be amplified; and (4) detection of heterozygotes in complex samples, which requires reduced target complexity and high data redundancy. A general challenge of array-based methods is that arrays reveal only changes from the expected sequence.
New technologies that might revolutionize DNA sequencing in the next 5 to 10 years: Achieving truly inexpensive and distributed sequencing capability would completely change the nature of biomedical research. The issues are speed, integration and accuracy, each of which has hidden costs. To address the large market that is possible, the sequencing instrument must cost $1,000 and work at the femtoliter scale. Several revolutionary sequencing methods are currently in commercial R&D. These include several that read single (or few) molecules with the possibility of high levels of multiplexing. Most of these offer the potential to yield large numbers of base pairs very rapidly from very small volume reactions in highly integrated systems.
NHGRI's role in promoting the development of sequencing technologies and in facilitating the hand-off of technologies from the academic to the private sector:
Hurdles: Academic laboratories will not be able to optimize the performance of new systems to the point of production utility; this requires industry involvement. Industry, however, operates on a short time horizon, so one role of government can be to reduce risk. The amount of money needed to do so is greater than the government is usually willing to invest in individual projects. Achieving large, complex goals therefore requires unconventional mechanisms.
The ATP model suggests a program of academics, private companies and NGOs to promote technology development. Another approach is the one the Department of Defense takes, in which an agency that wants or needs a system pays for the entire process from development to deployment. It was also suggested that NHGRI should pilot some sequencing projects through contracts with commercial entities, similar to what was done with the earlier pilot sequencing program in the academic sector. Subsequently, the successful companies could be scaled up to provide the sequencing capacity needed. There was considerable discussion as to whether this model would provide the incentives needed to get companies to invest the substantial sums needed to revolutionize the technology.
Summary:
More sequence is needed:
- To develop the ability to find the functional elements and evolutionary innovations in genomes. A central component of reading meaning in genomes is to be able to read at the level of cDNAs. This part of the sequencing program must be integrated with research in computational analysis and development of large-scale functional assays to attach meaning to the sequence data.
- To actually find functional elements in 'key' organisms and identify prominent evolutionary innovations by surveying enough organismal diversity. Deciding how much to do depends on cost and on how much new data is obtained with each additional organism sequenced (i.e., When have we reached the point of diminishing returns?).
- To re-sequence in humans to find the inherited basis of disease and in other organisms to understand the basis for phenotypic variation. This will require creating large databases across populations, as well as technologies for resequencing all of the variation in specific individuals for use in their healthcare.
Stimulating the development of sequencing and re-sequencing technologies:
Continued funding is needed to develop new ideas and seed projects. These are high risk and moderate cost. Multiple creative ideas need to be supported for long enough to determine whether they can work. Even this early stage of research could benefit from stimulating 'invention factories' that sit at the interface between universities and industry and gather teams with the relevant expertise and capabilities. Traditional NIH grant review mechanisms may not be adequate to serve the needs of such a focused program.
Much of the discussion revolved around the question of whether there would be a sizeable market for sequencing cheap enough that a human genome could be sequenced for $1,000. There was a good deal of excitement, and agreement that the health care market would be interested in correlating genotype to phenotype and would be large enough to attract the investment needed. It was thought that a two-log decrease in cost in the next five years was possible and would generate a market for diagnostics and therapeutics. Already established principles of nanofabrication and integration should lead to this initial cost decrease. The key technologies, at least for these decreases in cost, do not have to be invented, just integrated. There was agreement that there is no physical reason that we can't achieve a $1,000 genome (which requires an additional 2 to 3 logs of improvement), but it will take many new ideas paired with powerful engineering to achieve this goal. It was proposed that engineering/biology teams need to be developed to work on this and that NHGRI would ask for grants with an engineering component and a technology path to the $1,000 genome. The groups would have to meet (and be reviewed against) rigorous milestones as the technology develops.
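The "logs of improvement" arithmetic behind the $1,000 genome can be checked with a rough calculation. The sketch below starts from the 2002 figure of about $0.09 per finished bp and assumes a genome size of 3 billion bp, which is an approximate haploid human genome size and is not stated in this summary. It also assumes that resequencing costs the same per base as finished sequence, which may overstate the gap.

```python
import math

COST_PER_BP_2002 = 0.09      # dollars per finished bp, from the summary
GENOME_SIZE_BP = 3e9         # assumption: approximate haploid human genome size
TARGET_GENOME_COST = 1000.0  # dollars, the goal discussed at the workshop


def genome_cost(cost_per_bp, genome_size_bp=GENOME_SIZE_BP):
    """Cost of one genome at a flat per-base price (simplifying assumption)."""
    return cost_per_bp * genome_size_bp


def logs_of_improvement(current_cost, target_cost):
    """Number of factor-of-ten reductions needed to go from current to target."""
    return math.log10(current_cost / target_cost)


cost_2002 = genome_cost(COST_PER_BP_2002)
after_two_logs = cost_2002 / 10 ** 2
remaining_logs = logs_of_improvement(after_two_logs, TARGET_GENOME_COST)

print(f"genome cost at 2002 prices:       ${cost_2002:,.0f}")
print(f"after the two-log decrease:       ${after_two_logs:,.0f}")
print(f"further logs needed for $1,000:   {remaining_logs:.1f}")
```

Under these assumptions a genome costs on the order of $270 million at 2002 prices and a few million dollars after the anticipated two-log decrease, leaving a little over three further logs to reach $1,000. That is near the upper end of the "additional 2 to 3 logs" quoted above, and the difference presumably reflects the simplifications in this estimate.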