It would be really unfortunate to exist for nearly four billion years without anybody noticing your presence. But that’s exactly what happened to one lonely molecule called deoxyribonucleic acid (DNA). The molecule duplicated, mutated and evolved without anyone giving it a second thought. Even complex, multicellular Homo sapiens carried on for thousands of years completely ignorant of their DNA, although each of their trillions of cells carries around two metres of the genetic material.
DNA had to wait until 1869 for its first physical encounter with a human. The lucky man was Friedrich Miescher, a Swiss physician who isolated DNA, in the form of a substance he called “nuclein”, from pus-soaked hospital bandages.
Scientists at the time were not fully convinced DNA was worth getting excited about. Most still believed proteins were the carriers of genetic information. That false notion began to change in the 1940s with the work of Oswald Avery and his colleagues, and in 1952 the matter was finally laid to rest by an elegantly simple experiment from Alfred Hershey and Martha Chase, which demonstrated once and for all that DNA is the genetic material.
Just one year later, in 1953, Francis Crick and James Watson, aided by an X-ray crystallography hint from Rosalind Franklin, introduced the world to the double-helical structure of DNA. With the structure in the bag, scientists began their search for DNA function. The answer came from Marshall Nirenberg, who in 1961 showed that different combinations of bases code for specific amino acids, the building blocks of proteins.
With the realisation that DNA was the blueprint for life came the curiosity to “read” the plan it held within. RNA was a sensible molecule to start sequencing, as these nucleic acids are single-stranded and often considerably shorter than DNA. Indeed, in 1965, Robert Holley and his co-workers became the first people to read the bases of a nucleic acid when they sequenced a yeast transfer RNA (tRNA) using base-specific RNases. In 1970, Ray Wu was the first to decipher a short stretch of DNA, using a technique called primer extension. Two years later, Walter Fiers read the first ever complete gene sequence, that of the coat protein of bacteriophage MS2 (an RNA virus). In 1975, Frederick Sanger introduced his first DNA sequencing method, the “plus and minus” technique, which used polyacrylamide gels to separate the products of primed synthesis in order of increasing chain length. In 1977, Walter Gilbert and Allan Maxam published a way of sequencing DNA that used chemicals to cut it at specific bases, and that same year Sanger refined Ray Wu’s primer extension technique into the chain-terminator method, also called dideoxy sequencing or simply Sanger sequencing as we know it. The technique went on to dominate the sequencing world for the next 30 years.
Sanger used his newly developed method to sequence the first ever genome in 1977: that of the bacteriophage φX174, which went on to become one of the most popular DNA positive controls in labs around the world. A few years later, in 1982, researchers documented the first point mutation linked to human cancer: a single DNA base change in the HRAS gene that could affect the onset of bladder cancer by altering the structure of its protein product.
Meanwhile, improvements to the Sanger sequencing method kept coming. In 1984, Fritz Pohl developed the first non-radioactive sequencing technology platform, the GATC 1500. In 1986, Leroy Hood, in collaboration with Applied Biosystems, developed the first semi-automated DNA sequencing machine, where sequencing data could be collected directly by a computer. The following year, Applied Biosystems launched the first automated DNA sequencing machine, selling at $300,000 apiece. Nearly 10 years later, ABI would become the first commercial provider to use capillary electrophoresis rather than a slab gel, establishing truly automated DNA sequencing.
Meanwhile, in 1990, the ambitious Human Genome Project began, at an astronomical cost of $75 per DNA base. In 1995, Haemophilus influenzae became the first bacterium to have its genome sequenced, using the “shotgun” sequencing technique. The considerably longer and more complex genome of the yeast Saccharomyces cerevisiae followed in 1996.
1996 was not just the year of the yeast; it was also the year when next-generation sequencing (NGS) first came to be. It was during this year that Mostafa Ronaghi introduced a new DNA reading technique called pyrosequencing, based on a sequencing-by-synthesis method. Two years later, Shankar Balasubramanian and David Klenerman founded Solexa, the precursor to Illumina. The two men combined efforts to develop a new sequencing-by-synthesis technique based on fluorescent dyes. 1998 was also the year that the first animal genome was successfully sequenced: that of the microscopic worm Caenorhabditis elegans. One year later, an international collaboration published the first human chromosome sequence, introducing the scientific community to chromosome 22.
The beginning of the 21st century was certainly an exciting time for DNA. Genomics success stories were pouring in from every corner of the world. In 2000, Arabidopsis thaliana became the first plant and Drosophila melanogaster the first insect to have their respective genomes sequenced. The first year of the new millennium also saw the much-awaited first draft of the human genome sequence, a combined effort attributed to project leaders Francis Collins of the U.S. National Institutes of Health and Craig Venter, founder of Celera. In 2001, the draft human genome sequence, based on samples from 12 anonymous volunteers, was officially published. In 2002, the complete genome sequence of the mouse followed, showing around 90% similarity to that of humans. In 2003, the human genome sequence of around 3 billion base pairs was finalised, although a few gaps still exist to this day.
The next-generation sequencers were not sitting idly during the human genome sequencing craze. In 2005, Jonathan Rothberg and colleagues used pyrosequencing to develop the 454 system, the first next-generation sequencing platform to come on the market. Meanwhile, Solexa researchers used their own sequencing-by-synthesis technique to read the whole genome of the φX174 virus. In 2007, Illumina took over Solexa in a $600 million buy-out, going on to provide the most widely used next-generation sequencing technology in the world. That same year, a new competitor to 454 and Illumina was released in the form of the SOLiD system, which was based on sequencing by ligation. A few years later, in 2011, Life Technologies released another competing sequencer, the Ion Torrent, which used a form of sequencing-by-synthesis based on detecting the hydrogen ions released as new DNA is made.
Next-generation sequencing was becoming more and more accepted in the scientific community. In 2008, the International Cancer Genome Consortium was launched with the goal of using NGS to analyse thousands of tumour samples and profile cancer-related mutations. This was a tremendous year for cancer research, as scientists also managed to decode the whole DNA sequence of a cancer for the first time. To achieve this, they used NGS to read the genetic code of leukaemia cells isolated from a 50-year-old patient. Also in 2008, James Watson became the first person to have his whole genome read using NGS.
2009 was the first time third-generation sequencing technology came into the spotlight, with the release of the Helicos sequencer. It used single-molecule fluorescent sequencing to read DNA, but the technology quickly fell out of favour due to high error rates. Single-molecule sequencing proved more successful in the hands of Pacific Biosciences, who launched their first single-molecule real-time sequencing platform in 2011.
The latest sequencing technology to hit the mainstream was nanopore sequencing, in which DNA is passed through a tiny pore in a membrane and the order of bases is determined from changes in the electrical current across the pore. Oxford Nanopore Technologies became the first company to commercialise this new form of sequencing, announcing its first nanopore sequencer in 2012.
DNA sequencing is now as popular as ever, with scientists reading its composition and using the information for countless applications. As good-quality sequencing data becomes cheaper and easier to generate, the need for bioinformatics analysis will only grow, and a truly multidisciplinary approach will be needed to interpret and make use of the vast amounts of data being generated. Nowhere is this truer than in the field of personalised medicine. For newly emerging applications like liquid biopsies for non-invasive cancer detection, DNA analysis holds the great promise of personalised, effective and painless care.
Odds are it won’t take another four billion years to get there.