
Gene Sweepstakes (GeneSweep)

The friendly wager about the human gene count during the Human Genome Project
By Aiden Ledbetter and Zachary M. Utz, M.A.

In 2000, NBC News Meet the Press moderator Andrea Mitchell referred to the ongoing Human Genome Project as a “breakthrough compared to the moon landing and the invention of the wheel.” However, at that time — almost three years before the project would be complete — whether researchers could successfully generate the first human genome sequence was a bit of a gamble.

Overview

The Human Genome Project involved hundreds of interdisciplinary scientists across continents working together for over a decade. Along the way, they experienced countless setbacks, struggles and disappointments, and there was no guarantee that it would all be worth the effort. Some people, both in the scientific community and among the public, believed that there was little value in churning out the seemingly endless string of letters that represents a human genome sequence and that this project would not benefit human health. With a worldwide price tag of roughly $3 billion, the Human Genome Project was a risky bet!

However, that was not the only wager that Human Genome Project scientists would end up making in their quest.

In May 2000, at a bar located within the Cold Spring Harbor Laboratory in New York, some of the project’s scientists started an organized and friendly wager on something seemingly less consequential but scientifically alluring — predicting the total number of genes encoded in the human genome. 

The Gene Sweepstakes — or GeneSweep as it became popularly known — was a three-year-long, sweepstakes-style contest organized by British bioinformatician Ewan Birney, Ph.D., of the European Bioinformatics Institute. Scientists participated in the contest by betting on the total number of protein-coding genes that would be identified in the human genome sequence generated by the Human Genome Project.

By the time the Human Genome Project was completed in April 2003, over 460 people had submitted bets, with some predictions being fairly close to today’s estimate of ~20,000 genes and others predicting the presence of over 200,000 genes! 

Key questions lingered throughout this contest and afterwards, demonstrating that the competition was not as trivial as originally thought. Was this, in fact, real science? How straightforward was it to identify human genes? Why did bets vary so widely? And who won?

The Ensembl Project

Before there was GeneSweep, there was the Ensembl Project, an online database to help researchers explore the locations of genes across the immense human genome sequence. Dr. Birney and his colleagues began developing the Ensembl Project in 1999 in anticipation of the completion of the Human Genome Project. 

As genes were identified, the Ensembl Project researchers would document the location of these genes in the human genome sequence. This process was increasingly automated using new software tools, which quickly became much more efficient than manual approaches. The Ensembl Project website thus became a key place to get the most up-to-date information about the collection of identified human genes. It also became a key place to get updates on the GeneSweep vote counts, as shown below.

 

Screenshot of Ensembl webpage circa May 2000

 

Caption: An Ensembl Project webpage (on ensembl.org) explaining the origins of GeneSweep, along with a table and bar graph summarizing the bets as of May 2000 (from the NHGRI History of Genomics Archive).

Transcript: I was this sort of young, slightly rule breaking – not rule breaking – but anyway very extrovert kid who had seen some of the possibilities of how we could analyze the human genome. I worked very closely with Michelle Clamp and Tim Hubbard to annotate the human genome. So I kind of wanted to get our name out there, get us out there, Ensembl. I did it almost for PR reasons, as well as for fun, as it were. I definitely got a big kick out of going around that evening, that evening, the very first evening. I had a little plastic beer thing, asked people to put the money, and I would talk. There were many Nobel Laureates who bought a number that evening, that day. So it was a great way to meet a whole bunch of people.

2000 Cold Spring Harbor Laboratory Meeting

Flyer from 2000 Cold Spring Harbor Meeting


Caption: A flyer from the 2000 Cold Spring Harbor Meeting on Genome Sequencing and Biology (from the NHGRI History of Genomics Archive).

At the annual Cold Spring Harbor Laboratory Meeting on Genome Sequencing and Biology in May 2000, a French researcher named Hugues Roest-Crollius, Ph.D., gave a talk. At the same meeting, Human Genome Project researchers announced that they were nearing completion of a “working draft” sequence of the human genome. Dr. Roest-Crollius was working for a research center named Genoscope, and his research group had recently sequenced the genome of a small fish called Tetraodon. A member of the pufferfish family, Tetraodon was studied because it was known to have a smaller-than-usual genome for a vertebrate species. From the early analyses of the Tetraodon genome sequence, Dr. Roest-Crollius’ group found evidence that the total number of Tetraodon genes might be significantly smaller than what most genome researchers had expected. In a 2024 interview, Dr. Roest-Crollius recounted the events at the 2000 Cold Spring Harbor Laboratory meeting.

Transcript: I can’t remember exactly when he raised this old notebook which was going to be the book in which people were going to record the numbers they were thinking about. I think it was like, I think my talk ended the session so it was either mid-morning or end of the morning and he must have been the one next up after the break or after the lunch break. And so by then I was still coming up with the reality of what just happened which is that we had given, I had given this talk and everybody was a bit shaken and everyone was completely like discussing the possibility that the number of genes was going to be so low and if so “what was the alternative, you know, how could we explain?” Just all the discussions you could imagine with people realizing that the number of genes was not going to [be] 80,000 but more like 30,000. And so people disbelieving it and people believing it and people, you know… anyway this was all going on and I think that was the most interesting part of it.

Last updated: November 26, 2024