
Gene Sweepstakes (GeneSweep)

The friendly wager about the human gene count during the Human Genome Project
By Aiden Ledbetter and Zachary M. Utz, M.A.

In 2000, NBC News Meet the Press moderator Andrea Mitchell referred to the ongoing Human Genome Project as a “breakthrough compared to the moon landing and the invention of the wheel.” However, at that time — almost three years before the project would be complete — whether researchers could successfully generate the first human genome sequence was a bit of a gamble.

Overview

The Human Genome Project involved hundreds of interdisciplinary scientists across continents working together for over a decade. Along the way, they experienced countless setbacks, struggles and disappointments, and there was no guarantee that it would all be worth the effort. Some people, both in the scientific community and among the public, believed that there was little value in churning out the seemingly endless string of letters that represents a human genome sequence and that this project would not benefit human health. With a worldwide price tag of roughly $3 billion, the Human Genome Project was a risky bet!

However, that was not the only wager that Human Genome Project scientists would end up making in their quest.

In May 2000, at a bar located within the Cold Spring Harbor Laboratory in New York, some of the project’s scientists started an organized and friendly wager on something seemingly less consequential but scientifically alluring — predicting the total number of genes encoded in the human genome. 

The Gene Sweepstakes — or GeneSweep as it became popularly known — was a three-year-long, sweepstakes-style contest organized by British bioinformatician Ewan Birney, Ph.D., of the European Bioinformatics Institute. Scientists participated in the contest by betting on the total number of protein-coding genes that would be identified in the human genome sequence generated by the Human Genome Project.

By the time the Human Genome Project was completed in April 2003, over 460 people had submitted bets, with some predictions being fairly close to today’s estimate of ~20,000 genes and others predicting the presence of over 200,000 genes! 

Key questions lingered throughout this contest and afterwards, demonstrating that the competition was not as trivial as originally thought. Was this, in fact, real science? How straightforward was it to identify human genes? Why did bets vary so widely? And who won?

The Ensembl Project

Before there was GeneSweep, there was the Ensembl Project, an online database to help researchers explore the locations of genes across the immense human genome sequence. Dr. Birney and his colleagues began developing the Ensembl Project in 1999 in anticipation of the completion of the Human Genome Project. 

 

As genes were identified, the Ensembl Project researchers would document the location of these genes in the human genome sequence. This process was increasingly automated using new software tools, which quickly became much more efficient than manual approaches. The Ensembl Project website thus became a key place to get the most up-to-date information about the collection of identified human genes. It also became a key place to get updates on the GeneSweep vote counts, as shown below.

 

Screenshot of Ensembl webpage circa May 2000

 

Caption: An Ensembl Project webpage (on ensembl.org) explaining the origins of GeneSweep, along with a table and bar graph summarizing the bets as of May 2000. (From NHGRI History of Genomics Archive).

Transcript: I was this sort of young, slightly rule breaking – not rule breaking – but anyway very extrovert kid who had seen some of the possibilities of how we could analyze the human genome. I worked very closely with Michelle Clamp and Tim Hubbard to annotate the human genome. So I kind of wanted to get our name out there, get us out there, Ensembl. I did it almost for PR reasons, as well as for fun, as it were. I definitely got a big kick out of going around that evening, that evening, the very first evening. I had a little plastic beer thing, asked people to put the money, and I would talk. There were many Nobel Laureates who bought a number that evening, that day. So it was a great way to meet a whole bunch of people.

2000 Cold Spring Harbor Laboratory Meeting

Flyer from 2000 Cold Spring Harbor Meeting


Caption: A flyer from the 2000 Cold Spring Harbor Meeting on Genome Sequencing and Biology. (From NHGRI History of Genomics Archive).

 

In May 2000, at the annual Cold Spring Harbor Laboratory Meeting on Genome Sequencing and Biology, Human Genome Project researchers announced that they were nearing completion of a “working draft” sequence of the human genome. At the same meeting, a French researcher named Hugues Roest-Crollius, Ph.D., gave a talk. He was working for a research center named Genoscope, and his research group had recently sequenced the genome of a small fish called Tetraodon. A member of the pufferfish family, Tetraodon was studied because it was known to have a smaller-than-usual genome for a vertebrate species. From the early analyses of the Tetraodon genome sequence, Dr. Roest-Crollius’ group found evidence that the total number of Tetraodon genes might be significantly smaller than what most genome researchers had expected. In a 2024 interview, Dr. Roest-Crollius recounted the events at the 2000 Cold Spring Harbor Laboratory meeting.

Transcript: I can’t remember exactly when he raised this old notebook which was going to be the book in which people were going to record the numbers they were thinking about. I think it was like, I think my talk ended the session so it was either mid-morning or end of the morning and he must have been the one next up after the break or after the lunch break. And so by then I was still coming up with the reality of what just happened which is that we had given, I had given this talk and everybody was a bit shaken and everyone was completely like discussing the possibility that the number of genes was going to be so low and if so “what was the alternative, you know, how could we explain?” Just all the discussions you could imagine with people realizing that the number of genes was not going to [be] 80,000 but more like 30,000. And so people disbelieving it and people believing it and people, you know… anyway this was all going on and I think that was the most interesting part of it.

The robust discussion generated by Dr. Roest-Crollius’ 2000 talk indeed gave Dr. Birney an idea: Why not have researchers place bets on their predictions for the number of human genes? 

 

Dr. Birney recounted this moment in a 2017 interview for the National Human Genome Research Institute’s Oral History Collection: “I was this young Brit... precocious, doing this, and then I came round with this [betting] book and persuaded effectively everybody in the meeting to put a dollar and put a number in.” 

 

Shortly thereafter, the 29th Meeting of the National Advisory Council for Human Genome Research raised further awareness of GeneSweep. Specifically, the director’s report given by Francis Collins, M.D., Ph.D., the NHGRI director at the time, included a section dedicated to the “Gene Sweepstake,” pointing out that the basis of GeneSweep (i.e., predicting the total number of human genes) was “one of the hotly debated topics” at the meeting.

 

At that point, the GeneSweep page on the Ensembl website was already live, with 228 votes recorded. GeneSweep was off and running!

 

May 2000 NACHGR Agenda and Votes

 

Caption: Table of Contents for the Director’s Report given at the May 2000 meeting of the National Advisory Council for Human Genome Research (left) and the GeneSweep page document located behind the associated Tab I (right). (From NHGRI History of Genomics Archive).

The Rules and “Footnotes”

Every contest needs rules, and GeneSweep was no exception. The rules began as notes that Dr. Birney wrote down in the official GeneSweep betting book, which has since become one of the more well-known historical items from the Human Genome Project. (The book is currently on loan to the LWL – Museum für Naturkunde in Münster, Germany, for a temporary exhibition through September 2026.) Dr. Birney also posted those same rules on the Ensembl website.

 

Handwritten rules of the Gene Sweepstakes (2000-2003)

 

Caption: The rules for GeneSweep, written by Dr. Ewan Birney in the official GeneSweep betting book at Cold Spring Harbor Laboratory (image provided by Dr. Lee Rowen).

 

The rules dictated the amount of money a person could bet, the limit of one bet per year per person, and the details for how scientists could share their estimation methods. Alongside the rules, Dr. Birney added “footnotes” meant to address the nuances of betting on the total number of something that, in many ways, was scientifically unclear — that is, what exactly is a gene?

 

At a basic level, a gene is considered to be a region of the genome that codes for a specific protein or a segment of a protein. The resulting proteins, in turn, are used to build the cells and tissues that make up a human. However, there are many complex steps to create a protein from a gene, and there are also many parts of the genome that are similar to genes but might not exactly fit the simple definition of a gene. So in the early 2000s, there was debate amongst GeneSweep participants as to what to actually count as a gene.

 

The footnotes attempted to define at least what GeneSweep would count as a gene, so as to create a level playing field. However, the list of footnotes kept growing.

 

Dr. Birney later stated, “People would ask me some corner case, and so then, for that corner case, I put a little footnote... There was a lot of footnotes.”

 

 

Ensembl footnotes

 

Caption: The GeneSweep footnotes to the official rules, as depicted on the Ensembl Project website in May 2000 (From NHGRI History of Genomics Archive).

Last updated: November 26, 2024