Last updated: March 07, 2012
Workshop on the Future of the Large-Scale Sequencing Program
June 13, 2005
Executive Summary
The National Human Genome Research Institute convened a workshop to obtain opinions from the scientific community on the current status and potential future directions of the NHGRI large-scale sequencing program. Participants were asked to consider the scientific, technological, and strategic opportunities in evaluating NHGRI's future investment in sequencing, and to specifically address several general questions and challenges:
- Given what has already been accomplished (very high-quality assembled genome sequences of the human and major model organisms, draft sequence assemblies of genomes representing many of the nodes of the metazoan lineage, and concerted application of comparative sequencing to annotate mammalian genomes), what are the best future opportunities for large-scale sequencing? What is the proper balance of these types of projects going forward? Should other kinds of large-scale sequencing projects be considered? What is the continuing priority of large-scale sequencing as a source of genomic data compared with other types of genomic data?
- Disruptive technologies appear to be promising enough that a significant reduction in the cost of DNA sequencing could occur within the next three years. What are the realistic prospects for the introduction of such a disruptive technology? How should it be anticipated and encouraged? How would it affect sequencing costs and capacity? How would it affect the types of scientific questions that can be addressed? How should the possibility of future significant cost reductions affect the decisions about the types of sequencing projects that should be initiated in the next two to three years?
- How should NHGRI evaluate the ongoing value of its investment in a large-scale sequencing program? How should it assess the contribution that continued sequencing will make to scientific research overall and genomic research in particular? How should it ensure that the genomic sequencing program will continue to yield the greatest return for biomedical research?
Participants included members of user communities, sequencing center personnel, sequencing advisors, members of the various working groups that select new sequencing targets for the NHGRI program, developers of new sequencing technologies, scientific advisors to the large-scale sequencing program, and members of the National Advisory Council on Human Genome Research.
There was a strong consensus that the genomic sequence information already obtained is extraordinarily valuable, and that the larger scientific community has only begun to make use of it. Participants sensed that, with new technologies on the horizon that might make sequencing 10-fold less expensive, more ambitious scientific challenges could be addressed, and that sequence information would continue to transform the way biomedical science is done. Indeed, participants thought that sequence information is still undervalued by much of the community, even by some of those who depend on it. There was a broad consensus that the program should continue at a level of investment not very different from today's. However, there was also broad consensus that the next solicitation for sequencing proposals should differ significantly from the program's previous one, that the target selection process should be revised to be driven more by important scientific problems, and that NHGRI needs to find ways to ensure that the broader community can make better use of genomic information and become true stakeholders rather than mere recipients of the data.
Most of the discussion fell into several broad, closely interrelated themes, each of which included specific recommendations.
- Disseminate knowledge about how to use sequence and develop true stakeholders. NHGRI and other agencies that fund sequence production have generally relied on simply releasing the data into the public domain. Because the current program centralizes all of the effort, much of the expertise about how best to use sequence information actually resides in the sequencing centers. This was a reasonable initial strategy, but it will not serve science as well in the future. Instead, NHGRI should look for ways to actively broaden the constituency for genome sequence and develop a community of true stakeholders who will find new uses for genomic sequence. The new solicitation should give centers an incentive to disseminate knowledge through collaborations, education, and other means. Past performance in this regard could be a review criterion.
- NHGRI should take advantage of the knowledge about how to use sequence information that resides in the centers. The centers themselves often have some of the most compelling ideas for using sequencing capacity. They should therefore be allowed a more active role in selecting projects, especially through collaborations on specific problems. In the next solicitation, NHGRI should consider allowing center-initiated projects among the inputs to the target selection process.
- Technology will change rapidly, but unpredictably. New technologies seem poised to decrease the cost of sequencing rapidly, perhaps by as much as 10-fold. In all likelihood, the new read types will first be used as an adjunct to more traditional reads in assemblies or for resequencing. But as read lengths improve and paired-end sequencing becomes possible with them, the new technologies will gradually replace Sanger sequencing for whole genome shotgun assemblies. Sequencing centers will still have to spend considerable effort adapting even commercially available instruments to their production environments. The timing of all of this is uncertain, but the transition is beginning now and could play a significant role within the next three years. NHGRI should consider carefully how much to decrease its investment in sequencing: if cuts to the program are too large or made too quickly, they will stifle the transition to new technology.
- Target selection should be based increasingly on big, compelling scientific questions and less on review of individual organism targets. As capacity increases, the current target selection process will not scale. The current program solicits proposals for sequencing targets and evaluates them one, or a few, at a time. Instead, NHGRI should organize its large-scale sequencing capacity around the most compelling scientific questions. Several examples were mentioned: the Human Cancer Genome Project; a similar effort aimed at another major system or disease; identifying the basis for 50-100 Mendelian disorders; identifying all the differences between apes and humans within the hominid lineage; and several others. NHGRI needs only a few of these at a time, and they would not preclude other important but less ambitious projects running concurrently. It was suggested that NHGRI organize a scientific publication inviting several prominent scientists to describe how they would use sequence information if it became much more readily available.
- Matching the throughput of the sequencing pipelines to their inputs and outputs will require resources. As capacity increases, NHGRI will need to pay attention to the community's ability to use the data. In addition to the points raised under the other themes, this will require new computational infrastructure. With more capacity, obtaining samples for new sequencing projects will also require significant resources and coordination.
- The program must remain flexible in several respects. Production of whole genome shotgun sequence will remain a useful and important part of the program, but other sequencing products will increasingly be more relevant to solving certain critical biomedical problems. The next solicitation should therefore seek centers with a range of capabilities, including whole genome shotgun production, directed sequencing reads, ESTs and cDNAs, and other products. This will enable the program to respond to the most important challenges as they are identified over the next several years. The program must also remain flexible enough to adopt new technologies. One of the program's strengths is that it comprises a portfolio of centers with different capabilities.
- Sequencing is not merely a commodity (even if Q20 base pairs may be). Production of reads is an important core function of sequencing centers, but it is not sufficient, for a number of reasons. For example, genomes are not all alike: some are more challenging to sequence than others. Methods continue to evolve, and centers must be able to adopt new technologies rapidly. Most significantly, centers are repositories of knowledge about how to use sequence information and are thus intellectual resources for the scientific community. The centers are evolving toward being true genome centers. There was some discussion of allowing the centers to venture beyond sequencing-related activities. One suggestion was to organize the centers as a component of a program along the lines of the MIT Media Lab, which would construct a model for a fully integrated set of projects addressing a major problem, for example, how rational health care could be delivered. In any event, the performance of future centers should be measured by a wider set of parameters than Q20 production.