We fight against tuberculosis

16.03.2018 | Detlef Janssen

Even today, tuberculosis (TB) is a widespread disease that, in fact, causes more deaths than any other infectious disease worldwide. It has been estimated that approximately one-third of the world’s population is infected with TB, with the highest burden in developing countries. One of the major problems is the emergence of multidrug-resistant tuberculosis (MDR-TB), which challenges all existing treatment strategies.

In order to tackle this problem, an ambitious publicly funded project called TB-SeqDisK has been started. The aim of the project is to develop a diagnostic process chain to detect Mycobacterium tuberculosis, the bacterium that causes TB, and to determine its resistance to antibiotics.

The first step of diagnosis is fast and reliable detection of TB, which is of utmost importance to prevent transmission to and infection of other people. The second step is to determine whether the bacteria are resistant to any antibiotics, since successful curative treatment depends on the administration of effective antibiotics. The GATC Biotech Research and Development team is involved in developing a next-generation sequencing-based approach to detect antibiotic resistance in M. tuberculosis.

At the end of January 2018, the first workshop of the TB-SeqDisK research consortium was held at the GATC headquarters in Constance, Germany. During this workshop, timetables and project specifications were agreed upon and initial results were discussed. Stay tuned for more information and updates on the project!

GATC Biotech is looking for partners with whom innovative and future-oriented publicly funded projects can be realized. For cooperation and further information, please contact us at:



World Health Organization (2017). Global tuberculosis report 2017. Licence: CC BY-NC-SA 3.0 IGO.

The biology of Christmas

14.12.2017 | Detlef Janssen

Christmas is a long-anticipated and very special time of the year that is associated with seasonal changes in human emotions and behaviour. The biochemical and molecular biological mechanisms underlying these changes are, however, not yet well established. Nevertheless, they seem to be linked to the inner clock, also called biological rhythm, which is a central regulatory mechanism in humans, and in eukaryotes in general, that prepares the organism for recurring events (1). It has been hypothesised that a circannual rhythm, i.e. a rhythm with annual periodicity, evolved in humans for the recurring event of Christmas. It is hormonally regulated and associated with activity in different brain areas, such as the parietal lobules. These brain areas have also been hypothesised to be the seat of the “Christmas spirit” (2,3).

It is speculated that the Christmas circannual rhythm is a four-phase mechanism of (a) organisation, (b) hormone release and positive feedback, (c) behavioural climax, and (d) subsequent negative feedback. During the organisation phase, a cluster of neurons in the hypothalamus, labelled the “Christmas vivifying centre”, is active. This cluster of neurons seems to be dormant for approximately 11 months of the year and becomes active around December. The activity of the “Christmas vivifying centre” is promoted by strong environmental cues, termed “zeitgeber”: visual cues such as candle lights and Christmas trees, auditory cues such as Christmas carols and the sound of bells, and olfactory cues such as mulled claret, gingerbread, and cinnamon. The accumulating perception of these “zeitgeber” leads to a very particular feeling of pleasantness and cosiness that is occasionally called the “Christmas feeling”. This is accompanied by an increased inclination for interpersonal bonding and an increased tendency to acquire goods, especially of the woollen kind. Preliminary hands-on investigations suggest that the “Christmas feeling” is seasonally restricted. The smell of cinnamon, for instance, is perceived as more pleasant during the Christmas season (4).

Molecular biological investigations found that the Christmas “zeitgeber” trigger hormonal responses, specifically the synthesis of Christmas-releasing factor (ChRF) (2). ChRF activates different hypothetical target genes, such as SANTA (SEASONAL ACTIVATED NIKOLAUS TRANSCRIPTION FACTOR A) and ELF (ENHANCED LEBKUCHEN FEEDING) (5), and triggers the release of other hormones such as oxytocin. It has been suggested that the hormonal and gene activation responses escalate through positive feedback mechanisms and climax on Christmas Eve or Christmas Day. This is expressed externally by the exchange of Christmas gifts (2).
Because of the massive systemic disruption of homeostasis, negative feedback mechanisms that had so far been masked increasingly surface at this point, resulting in various adverse effects during the negative feedback phase after Christmas. These are mainly, but not exclusively, remorse over indulging in Christmas food, discontent with the received Christmas presents, and decreased levels of the social bonding hormone oxytocin. This often results in the signing of gym contracts, also linked to the New Year’s resolution vicious cycle, the return of inappropriate and unwanted gifts, and increased divorce rates in January. The negative feedback phase, however, rapidly ceases, potentially due to the activation of “New Year’s Eve anticipation” mechanisms.

These are only hypotheses though, and causation has not been shown! Merry Christmas!


1. Harmer, S.L., Panda, S., Kay, S.A. (2001) Molecular bases of circadian rhythms. Annu. Rev. Cell Dev. Biol. 17:215-53.

2. Ludwig, M. (2011) Christmas: an event driven by our hormones? J. Neuroendocrinol. 23(12):1191-3.

3. Hougaard, A., Lindberg, U., Arngrim, N., Larsson, H.B., Olesen, J., Amin, F.M., Ashina, M., Haddock, B.T. (2015) Evidence of a Christmas spirit network in the brain: functional MRI study. BMJ. 351:h6266.

4. Seo, H.-S., Buschhüter, D., Hummel, T. (2009) Odor attributes change in relation to the time of the year. Cinnamon odor is more familiar and pleasant during Christmas season than summertime. Appetite. 53(2):222-5.

5. Ebertz, A., Pantke, C., Janssen, D. (2017) Seasonal mediated differential expression of Christmas associated genes. Unpublished data.  

Music from within the nucleus

05.12.2017 | Detlef Janssen

What would it sound like if the DNA code could be expressed as music? This question was asked in 1984 by Kenshi Hayashi and Nobuo Munakata, two biochemists at the National Cancer Research Institute, Tokyo. They initially proposed translating DNA sequences into music to ease the burden of handling extensive data sets at a time when computer-assisted analysis tools did not yet exist. Solmization syllables were assigned to the four nucleotides to create the first DNA-based music (1). Later, this concept was developed further, and Munakata produced songs based on various nucleotide and amino acid sequences that could be described as electronic music with a unique touch.

Susumu Ohno, an evolutionary biologist at the Beckman Research Institute of the City of Hope, Duarte, proposed in the 1980s that the principle of recurrence underlies both the coding sequences of DNA and music. According to his view of evolution, primordial nucleotide sequences existed first, and genomes consist of repeated genes whose coding sequences are truncated and base-substituted variants of these primordial oligomers. Since songs are also built on recurring musical sequences, Ohno concluded that DNA sequences could be translated into music. He assigned two consecutive positions in the octave scale to each nucleotide based on its molecular weight; adenine and guanine occupy the lower end, while cytosine and thymine occupy the upper end. When Ohno reversed this process and translated Chopin’s Nocturne Op. 55, No. 1 into an open reading frame of 160 codons, the result came as a big surprise: it very closely resembled the last exon of the large subunit of RNA polymerase II in mice (2,3,4).

David Deamer, a biochemist at the University of California, Santa Cruz, developed a method that uses the light absorption of the four nucleotide bases, measured by spectrophotometer, to translate DNA into music. The frequencies of the infrared light absorbed by the nucleotides are translated into a frequency range that is audible to humans, and the resulting data are then converted into musical tones with a synthesiser (5). On this basis, Deamer and the composer Susan Alexjander released the album ‘Sequencia’, with appealing songs that could be described as a mixture of Western and Indian classical music.
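To make the idea of moving infrared frequencies into the audible range concrete, here is a minimal Python sketch. It assumes the shift is done by octave transposition (repeated halving, which keeps each tone's pitch class), which is one plausible approach and not necessarily the procedure documented by Deamer and Alexjander; the wavenumber of 1,650 cm⁻¹ and the 20 kHz upper limit are illustrative values, not measurements from the cited work, and the function names are hypothetical.

```python
# Sketch: transpose an infrared absorption frequency down by whole octaves
# until it falls into the audible range. Octave transposition is an
# assumption for illustration, not necessarily the method used in (5).

SPEED_OF_LIGHT_CM_PER_S = 2.99792458e10  # c in cm/s, to convert wavenumbers


def wavenumber_to_hz(wavenumber_per_cm: float) -> float:
    """Convert an IR wavenumber (cm^-1) into a frequency in Hz (f = c * wavenumber)."""
    return SPEED_OF_LIGHT_CM_PER_S * wavenumber_per_cm


def transpose_to_audible(freq_hz: float, upper_hz: float = 20_000.0) -> tuple[float, int]:
    """Halve the frequency (one octave per step) until it is below upper_hz.

    Returns the audible frequency and the number of octaves shifted.
    """
    octaves = 0
    while freq_hz >= upper_hz:
        freq_hz /= 2.0
        octaves += 1
    return freq_hz, octaves


if __name__ == "__main__":
    ir_hz = wavenumber_to_hz(1650.0)  # hypothetical absorption band
    tone_hz, shift = transpose_to_audible(ir_hz)
    print(f"{ir_hz:.3e} Hz -> {tone_hz:.1f} Hz after {shift} octaves")
```

Because every step is an exact halving, each resulting tone keeps the pitch class of the original frequency; only its register changes.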

Several scientists and artists have since contributed to the field of translating DNA into music, such as Stuart Mitchell, founder of Your DNA Song Ltd, who offers to compose individual music for clients based on their own DNA sequencing data (as provided by 23andMe, for instance) in genres such as classical, jazz, rock or techno.

In the near future, large DNA data sets of the microbiome, obtained for instance with INVIEW MICROBIOME PROFILING 2.0, could possibly be translated into music and add an artistic component to research. It would certainly be interesting to compare, for example, the music based on the gut microbiome of a healthy individual with that of an individual suffering from inflammatory bowel disease, in whom abnormal gut microbiota compositions have been found (6).


1. Hayashi, K., and Munakata, N. (1984) Basically musical. Nature 310(5973):96.

2. Ohno, S., and Ohno, M. (1986) The all pervasive principle of repetitious recurrence governs not only coding sequence construction but also human endeavor in musical composition. Immunogenetics 24(2):71-8.

3. Ohno, S. (1988) On periodicities governing the construction of genes and proteins. Anim Genet. 19(4):305-16.

4. Ohno, S. (1987) Repetition as the Essence of Life on this Earth: Music and Genes. In Modern Trends in Human Leukemia VII. Haematology and Blood Transfusion / Hämatologie und Bluttransfusion (ed. by Neth, R., Gallo, R.C., Greaves, M.F., Kabisch, H.), pp 511-9. Springer, Berlin, Heidelberg

5. Alexjander, S., and Deamer, D. (1999) The infrared frequencies of DNA bases: science and art. IEEE Eng Med Biol Mag. 18(2):74-9.

6. Carding, S., Verbeke, K., Vipond, D.T., Corfe, B.M., Owen, L.J. (2015) Dysbiosis of the gut microbiota in disease. Microbial Ecology in Health and Disease, 26, 10.3402/mehd.v26.26191.

DNA helix

The use of whole exome sequencing (WES) for diagnostic purposes has increasingly gained credibility over the past years. Applying next-generation sequencing (NGS) to locate genetic variants in exons has proven particularly useful for improving our understanding of common disorders, cancers and monogenic or Mendelian diseases.

Whole exome sequencing has been especially beneficial for discovering the genetic causes underlying Mendelian phenotypes. An estimated 6,000 to 8,000 Mendelian diseases exist, including cystic fibrosis, sickle cell anemia and Huntington’s disease. These disorders are caused by mutations in a single gene, and although each is considered rare individually, collectively they occur at a rate of 40 to 82 per 1,000 live births. Approximately 7.9 million children are born annually with a birth defect of genetic or partially genetic cause. Most Mendelian disorders tend to run in families, although a substantial number are caused by de novo events.

As of early 2015, 2,937 genes responsible for 4,163 Mendelian phenotypes had been discovered through methods such as Sanger sequencing. Although applications of WES in clinical diagnostics are still relatively new, the technique has already been used to identify more than 150 genes in Mendelian disorders, and this number is steadily growing. The use of clinical WES is increasingly supported by large cohort studies. For example, recent large-scale clinical WES studies have shown that exome sequencing can lead to a successful molecular diagnosis in up to 25% of patient cases.

The usefulness of exome sequencing is evident in several areas of clinical care. Identifying the causative variant underlying a particular monogenic disorder allows for gene-specific prognosis based on cases reported in the literature. WES can facilitate genetic counseling by enabling more accurate estimates of recurrence risk in the family. The genetic information is valuable for guiding subsequent pregnancies or identifying other at-risk family members. Lastly, a targeted molecular therapy may be available for the specific genetic mutation, which can help improve patient outcomes.

For clinical researchers looking to try out the WES approach in their own studies, GATC Biotech offers a complete service package with INVIEW EXOME. The product offers highly efficient exome capture using proprietary protocols based on the leading Agilent SureSelect enrichment technology, high-fidelity NGS on the Illumina platform and professional BioIT analysis, including optional use of QIAGEN’s Ingenuity Variant Analysis tool.


Chong JX et al. (2015). The Genetic Basis of Mendelian Phenotypes: Discoveries, Challenges, and Opportunities. American Journal of Human Genetics 97(2):199-215.

Jamuar SS and Tan EC. (2015). Clinical application of next-generation sequencing for Mendelian diseases. Human Genomics 9(1):10.

Shen T et al. (2015). The long tail and rare disease research: the impact of next-generation sequencing for rare Mendelian disorders. Genetics Research 97:e15.

Smedley D and Robinson PN (2015). Phenotype-driven strategies for exome prioritization of human Mendelian disease genes. Genome Medicine 7(1):81.

Rabbani B et al. (2014). The promise of whole-exome sequencing in medical genetics. Journal of Human Genetics 59:5-15.

Still struggling with genome assemblies?

06.07.2017 | Detlef Janssen

Circos plots of the assembled genomes [5], comparing the number of contigs from long-read and short-read sequencing. Alignments of the HGAP assembly of the INVIEW DE NOVO GENOME 2.0 data (left) and the Velvet assembly (right) of the short-read Illumina data against the E. coli DH1 reference are shown. Regions of homology are highlighted by colored ribbons.

Sequencing and assembly of whole genomes, even small ones, was cost- and labor-intensive for a long time. The advent of next-generation sequencing eased and simplified this process dramatically.

However, a single-contig genome assembly is nearly impossible to achieve with sequencing based on short reads. The ultra-long reads obtained with PacBio’s SMRT sequencing technology, combined with optimised bioinformatics analysis, enable high-quality genome assemblies. Additionally, plasmids present in your DNA preparation, which often carry valuable information, will be assembled as well.
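A quick way to see how close an assembly is to the “single contig” ideal is to count its contigs and check how much of the assembly the largest one covers. The Python sketch below does this for FASTA text; the function names and the toy records are hypothetical, and it deliberately does not distinguish chromosomal from plasmid contigs, so a complete genome with one plasmid would still report two contigs.

```python
def contig_lengths(fasta_text: str) -> list[int]:
    """Return the sequence length of each record in a FASTA string."""
    lengths: list[int] = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            lengths.append(0)          # header line: start a new contig
        elif lengths:
            lengths[-1] += len(line)   # sequence lines may be wrapped
    return lengths


def contiguity_summary(lengths: list[int]) -> dict:
    """Summarise how fragmented an assembly is."""
    total = sum(lengths)
    largest = max(lengths)
    return {
        "contigs": len(lengths),
        "total_bp": total,
        "largest_bp": largest,
        "largest_fraction": largest / total,  # 1.0 means a single contig
    }


if __name__ == "__main__":
    toy = ">chromosome\nACGTACGT\nACGT\n>plasmid\nACGT\n"
    print(contiguity_summary(contig_lengths(toy)))
```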

Read the full story of how to proceed “towards a ‘single contig’ genome assembly with INVIEW DE NOVO GENOME 2.0”

Infographic: Evolution of Sanger sequencing - 1970 in comparison to 2017

Five years ago, Cologne welcomed a distinguished biotech company to the city. GATC Biotech had just centralised its Sanger sequencing laboratories from Constance, London and Düsseldorf in Cologne. The Cologne lab was equipped with technology spread over more than 470 m² and was staffed by 20 experienced employees eager to get busy processing Sanger sequencing samples.

The location of the laboratory, in close proximity to the Cologne airport, made overnight Sanger sequencing an accepted and expected standard for GATC customers all over Europe. The company established a precedent in speedy sequencing and high quality data and is proud to deliver the fastest results to this day.

In the last five years, the Cologne laboratory has nearly doubled in size. Currently, 27 employees work round the clock to deliver quick, reliable data to GATC’s Sanger sequencing customers. To honour the lab’s birthday, we took a brief walk down GATC’s Sanger sequencing memory lane with our Director of Sales & Customer Care, Mr. Jochen Schäfer.

1. What machines were used for Sanger sequencing in the 1990s?

In the early 1990s, we used our self-developed machines called “Direct Blotting System GATC 1500”. In 1995, the first 48-hour sequencing service was established with the ABI 373, the so-called “Plate Sequencer”. In 2000, we changed over to capillary sequencing with the ABI 3700 and introduced the world’s first 24-hour sequencing service. Since 2005, we have been using the ABI 3730 XL, and since 2006 we have been offering overnight sequencing with our NightXpress service.

2. What did the data look like back then?

In the 1990s, the read length was only 450 bases. In the following years, it grew to 650 bases, and now we achieve up to 1,100 bases.

3. How long did it take to process and deliver the data?

At the beginning, it took 48 hours to sequence the samples; by 2004 it took approximately 24 hours, and since 2007 we have offered the “NightXpress” service. Customers drop off their samples in the evening and receive their results the next morning between 8 a.m. and 11 a.m.

4. How was the data delivered?

At the beginning, data was delivered on floppy disks, in the late 1990s by e-mail, and today via the Internet. Customers receive their results directly in the Watchboxes of their online myGATC accounts.

5. What is the GATC “Watchbox” and how did this name come to be?

In general, a “watch box” is a small container where you can store your watches or, in our case, your sequencing data. Just as a watch lets you keep track of time, our Watchbox lets you keep detailed track of all your sequencing samples. At GATC, this analogy was adopted around the time the first LIMS was introduced in 1999. Customers can track all their samples with a Watchbox, which makes it an essential part of GATC’s processes.

6. How much did sequencing cost?

In 1990, the selling price was 20 to 25 DM (Deutsche Mark) per Sanger read. By 2004, the price had dropped to 12 to 15 EUR, and with the ABI 3730 we reduced the selling price to under 10 EUR. Today, a customer can purchase a barcode for a Sanger read starting at a couple of euros, depending on the sequencing service.

7. How has GATC Biotech innovated the Sanger sequencing field?

GATC Biotech holds many industry firsts. We had the first non-radioactive sequencer, the first 24-hour sequencing service, the first online ordering system for the life sciences and the first overnight sequencing service with results the morning after sample receipt. (See our latest video about Logistics@GATC: Böxle on the road)

In addition, we were one of the first to introduce a barcoding system in which each sample is identified by a unique barcode. This enabled full automation of the Sanger sequencing workflow, facilitated the ordering process and allowed customers to track their samples easily online.

8. How do you see the Sanger sequencing market changing?

It is moving faster and faster, because orders are increasing and applications are becoming more diverse. We celebrated our first million Watchboxes in February 2015 and the second million in December 2016, and we expect the third million within the next 12 months.

9. What products does GATC Biotech offer for Sanger sequencing today?

We offer a variety of Sanger sequencing products. Our LIGHTRUN service is our simplest and most convenient sequencing service for both tubes and plates. The service offers quick, reliable sequencing of DNA samples pre-mixed with primer.

Our SUPREMERUN service is ideal for more challenging templates. This product is also available for single samples or high-throughput Sanger sequencing. The DNA and primer are provided separately, and the customer can choose freely from our extensive list of universal primers.

Cell culture contaminated with Mycoplasma
Image by courtesy of Multiplexion

Mycoplasma detection is an essential task for any cell culture laboratory. Nearly all prestigious scientific journals require evidence that cell lines are free of mycoplasma contamination before publishing data obtained from immortalised cell lines. A 2013 survey in Australia and New Zealand found that about 75% of participating researchers perform mycoplasma testing. Interestingly, among those who tested, 18-20% detected contamination in at least one sample (Shannon et al. 2016).

The survey also found that 32% of researchers perform mycoplasma detection in their own laboratories, whereas others chose to use in-house services or external providers for mycoplasma testing (Shannon et al. 2016).

Here are three tips for how to make mycoplasma testing more efficient and less tedious:

1. Find a trustworthy test – Choose the test that is most convenient and dependable for you. Many researchers opt for the PCR method, as it is the quickest option and detects the majority of mycoplasma species. Sensitivity is increased further by using qPCR and standardised protocols, such as those suggested by the “World Health Organization International Standard to Harmonize Assays for Detection of Mycoplasma DNA” (Nübling et al., 2015).

2. Establish a strict mycoplasma testing program and follow it regularly – Importantly, test all actively growing cell lines at defined time intervals. Typically, scientists test monthly to quarterly, depending on the volume of cell culture work and on individual risk assessment. If you receive new, untested cell lines, quarantine them until they test negative for mycoplasma contamination. When possible, maintain cell cultures for only two to three months, then discard them and replace them with fresh vials from the same pre-tested working stocks.
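The testing rules in tip 2 can be expressed as a small scheduling check. The Python sketch below is hypothetical: the `CellLine` record is an illustrative structure rather than part of any real LIMS, and the 30-day test interval (the monthly end of the range above) and the 90-day culture limit (three months) are parameters you would adapt to your own risk assessment.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class CellLine:
    """Hypothetical record for one actively growing culture."""
    name: str
    in_culture_since: date              # thaw date of the current culture
    last_negative_test: Optional[date]  # None: not yet tested in this lab


def due_actions(line: CellLine, today: date,
                test_interval_days: int = 30,
                max_culture_days: int = 90) -> list[str]:
    """List the actions the testing program calls for on a given day."""
    actions: list[str] = []
    if line.last_negative_test is None:
        # New, untested lines stay in quarantine until they test negative.
        actions.append("quarantine and test")
    elif today - line.last_negative_test >= timedelta(days=test_interval_days):
        # Regular testing at a fixed interval (monthly by default here).
        actions.append("test")
    if today - line.in_culture_since >= timedelta(days=max_culture_days):
        # Cultures older than about three months are replaced from pre-tested stock.
        actions.append("discard and replace from pre-tested working stock")
    return actions


if __name__ == "__main__":
    line = CellLine("line_A", date(2017, 1, 1), date(2017, 2, 1))
    print(line.name, due_actions(line, date(2017, 3, 15)))
```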

3. Consider outsourcing – One way to make mycoplasma testing more pleasant is to have someone else do it for you. Outsourcing to a service provider saves resources and material costs in your own laboratory and gives you access to trained specialists with mycoplasma detection expertise. It also frees up staff time, so scientists can invest their time in more productive experimental work, and there is no need to train new staff in mycoplasma testing methods. Moreover, you gain objectivity, as an independent party is more likely to judge the test results impartially. Consider the many advantages services like MYCOPLASMACHECK can offer, including quality-certified testing with proper controls and no risk of cross-contamination, as well as results delivered in reports ready for journal submission.


Corral-Vazquez C. (2017). Cell lines authentication and mycoplasma detection as minimun quality control of cell lines in biobanking. Cell and Tissue Banking 18(2):271-280.

Shannon M et al. (2016). Is cell culture a risky business? Risk analysis based on scientist survey data. Int J Cancer 138(3):664-670.

Nübling CM et al. (2015). World Health Organization International Standard to Harmonize Assays for Detection of Mycoplasma DNA. Applied and Environmental Microbiology, 81(17):5694-702.

Davis L. (2015). The risky business of cell culture. Retrieved from (June 2, 2017): 

Infographic: Liquid biopsy market overview

As a non-invasive test for cancer research and diagnostics, liquid biopsy has already gained considerable traction. The market is hot, and business analysts anticipate further growth as liquid biopsy is increasingly adopted by healthcare providers. The clinical acceptance of liquid biopsies will likely be boosted by several advantages these tests have over traditional tissue biopsies, including lower total test cost, quicker turnaround times, the ability to capture tumour heterogeneity, the ability to monitor recurrence and their minimally invasive nature.

Currently, liquid biopsies explore three major classes of biomarkers. A report by Research and Markets identifies over 50 liquid biopsy tests currently offered on the market. Of these, 50% are based on detecting cancer biomarkers in circulating tumour DNA (ctDNA), roughly 37% are based on characterising circulating tumour cells (CTCs), and the remaining 13% draw conclusions from exosome analysis.

A Kalorama Information report shows that the genes most commonly analysed in cell-free DNA (cfDNA) currently include BRAF, EGFR, ESR1, KRAS, MET, PIK3CA, TP53, KIT and PDGFRA. The report points to a variety of clinical uses of liquid biopsies in oncology, including early detection, identification of mutations for targeted therapy, patient stratification, companion diagnostics, tracking of minimal residual disease, characterisation of molecular heterogeneity, monitoring of tumour dynamics and metastases, cancer prognosis and others.

Financially, liquid biopsies are expected to be a lucrative investment. Research and Markets predicts that the global liquid biopsy market will reach nearly $4.5 billion by 2020, with the cancer application segment expected to account for $2.5 billion. Research and Markets further predicts that four major cancer types (prostate, breast, colorectal and lung cancer) will be the main market drivers by 2030, accounting for over 70% of the total liquid biopsy market.

Convinced of the potential of liquid biopsy to transform patient care, GATC Biotech has established a unique service line for non-invasive analysis of cfDNA. GATCLIQUID offers three services for accurate tumour mutation profiling from blood. GATCLIQUID ONCOEXOME is a unique service for whole exome sequencing of cfDNA that provides an unbiased overview of all mutations in protein-coding regions. GATCLIQUID ONCOPANEL ALL-IN-ONE is a next-generation sequencing-based cancer panel that offers targeted screening of key cancer drivers. GATCLIQUID ONCOTARGET enables ultra-sensitive monitoring of the most important tumour mutation in a given case. Together, the services aim to improve cancer research and diagnostics today and in the years to come.


Kalorama Information. (2017). Cell-free DNA (cfDNA): Market Size and Share Analysis (Report No. KLI15188961).

Research and Markets. (2016). Liquid Biopsy Research Tools, Services and Diagnostics: Global Markets (Report No. 3632954).

Research and Markets. (2015). Non-Invasive Cancer Diagnostics Market, 2015-2030 (Report No. 3454294).

Figure 1: Example of an “index hopping” measurement using a lane with 6 ChIP libraries.
Figure 2: Example of an “index hopping” measurement using a lane with 8 RNA libraries.
Figure 3: Plots of the analysis of “index hopping” events.
Figure 4: Library preparation for HiSeq 4000 at GATC Biotech.


A recent publication by Sinha et al. from April 2017 stimulated a lively discussion about a phenomenon referred to as “index hopping”, “index swapping” or “barcode mis-assignment”. It occurs when multiplexed samples are sequenced on Illumina’s HiSeq 3000/4000/X Ten systems, which use Exclusion Amplification (ExAmp) chemistry. Sinha et al. observed that “up to 5-10% of sequencing reads are incorrectly assigned from a given sample to other samples in a multiplexed pool”. Illumina responded with a white paper describing the impact and best practices for minimising barcode mis-assignment and reported “index hopping” rates below 2% on patterned flow cells. “Index hopping” rates depended on the library preparation method, with the highest rates for PCR-free libraries and for libraries contaminated with free adapters and primers. While the underlying mechanism remains elusive, the overall consensus from Illumina’s white paper, as well as from bloggers at Enseqlopedia and the UC Davis Genome Center, is that clean sequencing libraries are essential for sequencing on the HiSeq 3000/4000/X Ten. Moreover, they declared that “for the majority of applications ‘index hopping’ between clean libraries will be minimal and will have minimal or no impact on the data analysis”.


Since we run a large number of HiSeq 4000 projects, we took the matter very seriously and started digging into our data to assess the level of “index hopping” at GATC Biotech. From two recent HiSeq 4000 sequencing runs, 5 lanes with 5 to 9 libraries per lane were selected, comprising different library types: strand-specific RNA libraries from different organisms (2 lanes), exome-enriched DNA libraries (WES) (1 lane), and ChIP libraries (2 lanes).

The number of reads with unexpected dual index combinations, i.e. combinations not matching those of the loaded libraries, was retrieved from the file ‘DemuxSummaryF1L[1-8].txt’, which is generated automatically for each lane during demultiplexing. For each possible dual index combination (both those present in the pool and all combinations not present in the pool), the number of reads was divided by the total number of reads in the lane to obtain percent index representation values. The results for one lane with 6 ChIP libraries with unique i7 and i5 indices are shown in Figure 1, and those for another lane with 8 RNA libraries in Figure 2. The observed levels of “index hopping” were substantially lower than those reported in the Illumina white paper. The background read distribution appears to be random, as every possible combination of indices was detected. We did not observe a significant correlation between library type and the level of “index hopping”. The three other analysed lanes, containing RNA, ChIP and WES libraries, showed similar levels of mis-assignment (data not shown). Across all “index hopping” events in the 5 lanes, the median value was 0.008% (Figure 3).

Summing up all “index hopping” events per lane gave a median cumulative “index hopping” frequency of 0.27% per lane. Applying the same measure to the example data presented in Illumina’s white paper (Figure 3 of Illumina’s white paper) gave a cumulative “index hopping” rate of 1.59%, nearly six times higher than what we observed at GATC Biotech.
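The two measures described above, the median over all individual unexpected index combinations and the median of the per-lane sums, can be sketched in Python as follows. The input format is an assumption: read counts per (i7, i5) pair are passed as a ready-made dictionary rather than parsed from ‘DemuxSummaryF1L[1-8].txt’, whose exact layout is not reproduced here, and the toy counts are invented for illustration.

```python
from statistics import median

IndexPair = tuple[str, str]  # (i7, i5)
Lane = tuple[dict[IndexPair, int], set[IndexPair]]  # (read counts, loaded pairs)


def hopping_percents(counts: dict[IndexPair, int], expected: set[IndexPair]) -> list[float]:
    """Percent of all lane reads for each dual-index combination NOT in the pool."""
    total = sum(counts.values())  # all reads of the lane, expected + unexpected
    return [100.0 * n / total for combo, n in counts.items() if combo not in expected]


def summarise(lanes: list[Lane]) -> tuple[float, float]:
    """Return (median per unexpected combination, median cumulative rate per lane)."""
    per_lane = [hopping_percents(counts, expected) for counts, expected in lanes]
    per_combination = [p for lane in per_lane for p in lane]  # pooled over lanes
    cumulative = [sum(lane) for lane in per_lane]             # one sum per lane
    return median(per_combination), median(cumulative)


if __name__ == "__main__":
    lane1 = ({("A", "a"): 500_000, ("B", "b"): 499_900, ("A", "b"): 40, ("B", "a"): 60},
             {("A", "a"), ("B", "b")})
    lane2 = ({("C", "c"): 600_000, ("D", "d"): 399_800, ("C", "d"): 100, ("D", "c"): 100},
             {("C", "c"), ("D", "d")})
    print(summarise([lane1, lane2]))
```

On real data the same two numbers correspond to the 0.008% and 0.27% reported above; the ratio to Illumina's example is simply 1.59 / 0.27, or about 5.9.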


The data presented was derived from ongoing customer projects and was randomly selected. We assume that library preparation has the greatest influence on the level of “index hopping” events. As our library preparation process yields clean, high-quality libraries (i.e. with no detectable primer or adapter dimers), we attribute our extremely low “index hopping” rates to it. At GATC Biotech, most steps of library preparation are automated using liquid handlers, and very strict purification steps are performed, which appear to reduce this effect to nearly negligible levels (Figure 4).

With our workflow, the median “index hopping” rate of 0.008% per index combination (measured relative to all reads of the lane) means that roughly 1 in 12,500 reads of a lane may be mis-assigned to a non-uniquely dual indexed library from a library sharing one of its index sequences. For example, if such a library was loaded at approximately 10% of the lane (e.g. 30 million of about 300 million read pairs) and another library on the lane shared one of its indices, about 24,000 read pairs (0.008% of 300 million) would be mis-assigned to it. This corresponds to 0.08% of the affected library’s reads, or about 1 mis-assigned read per 1,250 correctly assigned reads.
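The arithmetic behind this example can be checked with a few lines of Python. The only inputs are the figures in the paragraph (a rate of 0.008% of lane reads, 10% loading, 30 million read pairs for the library); the lane size of 300 million read pairs is inferred from them, and the function name is hypothetical.

```python
def misassigned_reads(lane_reads: float, library_share: float,
                      hop_percent_of_lane: float) -> tuple[float, float]:
    """Reads mis-assigned into one library and their share of that library.

    hop_percent_of_lane is expressed relative to ALL reads of the lane,
    like the percent index representation values described above.
    """
    library_reads = lane_reads * library_share
    misassigned = lane_reads * hop_percent_of_lane / 100.0
    return misassigned, misassigned / library_reads


if __name__ == "__main__":
    n, frac = misassigned_reads(300e6, 0.10, 0.008)
    print(f"{n:,.0f} read pairs = {frac:.2%} of the library "
          f"(1 per {1 / frac:,.0f} reads)")
```

The key point the calculation makes explicit is the change of reference: a rate relative to the whole lane becomes ten times larger when expressed relative to a library that holds only 10% of that lane.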

Does this level of mis-assigned reads influence data interpretation?
For many study types, such as whole genome sequencing, whole exome sequencing and bisulphite sequencing, no influence is expected.

This includes re-sequencing projects aiming to detect minor allele frequencies down to 1%, for which an average sequencing depth of 300x is usually recommended. This means that at least 3 unique reads carrying a specific mutation (1% of 300) are needed to call it. At 300x average coverage and an “index hopping” rate of 0.08%, about 0.24 mis-assigned reads are expected per position, so there is a <30% chance that even a single mis-assigned read carrying the mutation is detected, well below the threshold of 3 mutated reads. Moreover, this applies only if the mutation is present at 100% in the “contaminating library”; if its frequency is lower, the likelihood of carry-over is reduced even further. Therefore, rare mutation detection studies are very unlikely to be affected at GATC Biotech. If the “contaminating library” belongs to a different organism, most of the “index hopping” reads will not map, leaving the experimental data unaffected.
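The probabilities in this argument can be sketched with a simple binomial model. It assumes that each of the 300 reads covering a position independently has a 0.08% chance of originating from the contaminating library and carrying the mutation (the worst case of a mutation present at 100% there); this independence model is an assumption for illustration, not part of the original analysis.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))


if __name__ == "__main__":
    coverage = 300      # average depth recommended for 1% allele frequency
    hop_rate = 0.0008   # 0.08% of the affected library's reads
    print(f"expected mis-assigned reads: {coverage * hop_rate:.2f}")
    print(f"P(>=1 mis-assigned read):  {prob_at_least(1, coverage, hop_rate):.3f}")
    print(f"P(>=3 mis-assigned reads): {prob_at_least(3, coverage, hop_rate):.4f}")
```

Under these assumptions, the chance of even one mis-assigned mutant read is about 21%, consistent with the <30% stated above, while reaching the 3-read calling threshold is roughly a 0.2% event.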

Another concern is RNA-Seq on the HiSeq 4000, where gene expression levels can vary substantially between sample types or treatments. In most cases, however, the impact on an experiment is very low. For example, if treating a cell line with a compound upregulated a certain transcript 100-fold, e.g. from 10 FPKM to 1,000 FPKM, “index hopping” could increase the FPKM of the untreated control from 10 to 11 (~0.1% of 1,000 FPKM). The fold change would then drop only from 100 to about 91, so it would not be substantially different.
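A quick way to see why the fold change barely moves is to compute it with and without the extra reads. This Python sketch uses the numbers from the example above and makes a simplifying assumption, labelled in the code, that hopped reads add linearly to the control's FPKM.

```python
def fold_change(treated_fpkm: float, control_fpkm: float) -> float:
    """Ratio of expression in the treated sample to the untreated control."""
    return treated_fpkm / control_fpkm


treated_fpkm = 1_000.0
control_fpkm = 10.0
hop_rate = 0.001  # ~0.1%, the rounded figure used in the text

# Simplification: treat the hopped reads as adding hop_rate * treated FPKM
# directly to the control, ignoring any re-normalisation of FPKM values.
control_with_hopping = control_fpkm + hop_rate * treated_fpkm

print(fold_change(treated_fpkm, control_fpkm))                    # 100.0
print(round(fold_change(treated_fpkm, control_with_hopping), 1))  # 90.9
```

Because the contamination scales with the highly expressed sample, the effect is largest for transcripts that are strongly up- or downregulated, yet even a 100-fold change stays above 90-fold.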

Nevertheless, for single-cell RNA-Seq, where commonly up to 384 libraries are pooled on a single lane, it is recommended to use uniquely indexed libraries if very different cell types are analysed.

Overall, similar to the reports from Sinha et al. (1) and Illumina (2), we observed “index hopping” on the HiSeq 4000, but at significantly lower levels. GATC’s proprietary library preparation protocols and high degree of automation show that this effect can be reduced by preparing high-quality libraries and applying rigorous purification and size-selection steps.

In any case, we will continue to monitor “index hopping” regularly to ensure that our customers receive data of the highest quality.


1. Sinha R, et al. (2017). Index Switching Causes “Spreading-Of-Signal” Among Multiplexed Samples In Illumina HiSeq 4000 DNA Sequencing. bioRxiv. doi:

2. Illumina (2017). Effects of Index Misassignment on Multiplexing and Downstream Analysis [white paper].

3. (2017). Update on @illmina index-swapping [Blog post].

4. Froenicke L. (2017). Update on Barcode Mis-Assignment Issue [Blog post]. 

Happy DNA Day!

24.04.2017 | Detlef Janssen

Infographic: DNA fun facts

Today is none other than DNA Day! This special day is celebrated every year on April 25 to commemorate the publication of the double-helix structure of DNA in 1953, as well as the completion of the Human Genome Project in 2003.

DNA Day was first marked on April 25, 2003 in the United States. Annual DNA Day celebrations have since been organised by the National Human Genome Research Institute. The purpose of the event is to offer students, teachers and the public an opportunity to learn about the latest advances in genomic research.

GATC Biotech is proud to offer expertise in the DNA sequencing field, ranging from Sanger sequencing to whole genome sequencing to targeted sequencing. But besides technical knowledge, we’ve also found out a thing or two that can get anyone excited about DNA. Read some DNA fun facts below:

1. Half-man, half-microorganism
Not quite, but humans harbour as many as 145 genes that have jumped from bacteria, viruses or other single-celled organisms through the process of horizontal gene transfer. Most of these genes play established roles in metabolism, immune responses and other biochemical processes.

2. No T. rex resurrection
Scientists have estimated that DNA preserved in bone has a half-life of 521 years. Extrapolated to a temperature of -5°C, this decay would destroy every bond after a maximum of 6.8 million years, and DNA would cease to be readable much earlier, at roughly 1.5 million years. Bad luck for T. rex, as the last dinosaurs died out about 65 million years ago.

3. Are you a pumpkin head?
Humans and pumpkins share about 75% of their DNA. About 98% of our genetic make-up is identical to that of chimpanzees, and human-to-human genetic variation is only 0.5% to 1%.

4. Get out of jail free card
DNA-based evidence has exonerated more than 300 wrongly convicted prisoners in the U.S. since 1989. Twenty of these prisoners have been on death row.

5. DNA goes sugar-free
Xeno nucleic acid (XNA) is a synthetic alternative to DNA. XNA is created by replacing DNA’s sugar group with one of several artificially produced molecules. Six of these XNAs already exist, including glycol nucleic acid (GNA), threose nucleic acid (TNA) and peptide nucleic acid (PNA).

6. To Pluto and back? You’ve got it in you! 
If the DNA in all cells of the human body were uncoiled, it would stretch 16 billion kilometres. Depending on the positions of the two planets in their orbits, the distance from Earth to Pluto varies between 4 and 7.5 billion kilometres.

7. DNA in the cell’s power generator
Human mitochondrial DNA (mtDNA) encodes only 37 genes. Of these, 13 code for proteins of the electron transport chain, and the rest code for transfer RNAs (tRNAs) and ribosomal RNAs (rRNAs). In mammals, mtDNA is usually inherited from the mother, as the mitochondria of mammalian sperm are usually lost or destroyed during fertilization.

8. DNA and CSI
In forensics, DNA profiling is based on the polymerase chain reaction (PCR) and uses highly variable short tandem repeats (STRs). DNA analysts in North America look at 13 specific DNA loci, whereas those based in the UK use a 17-locus system. The probability that two unrelated individuals share the same 13-locus DNA profile is about one in a billion.

9. An octoploid coffee bean
Humans are diploid organisms with two sets of 23 chromosomes, or 46 in total. Some varieties of the coffee species Coffea arabica are octoploid, with eight sets of 11 chromosomes, or 88 in total.

10. All in a day’s work
It takes about 8 hours for a mammalian cell to copy its DNA completely. Human DNA is replicated at a rate of about 50 nucleotides per second per replication fork, starting from tens of thousands of origins of replication. In contrast, E. coli DNA is replicated at a rate of 1,000 nucleotides per second from one single origin of replication, and the process takes about 40 minutes.

11. Birds of a feather flock together
A controversial 2014 study of 2,000 Americans found that people tend to befriend those with DNA similar to their own. The authors analysed 500,000 markers from across the genome and concluded that friends share about 0.1% more DNA than strangers, a level of similarity expected for fourth cousins.

12. Should Anne of Green Gables join X-Men?
From a genetic point of view, red hair, freckles and blue eyes are the result of mutations. Red hair appears in people who carry two copies of a recessive allele on chromosome 16 that produces an altered version of the MC1R protein. The MC1R gene is also often implicated in the presence of freckles. A specific mutation in the HERC2 gene, which affects the expression of OCA2, is strongly linked to the appearance of blue eyes.
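Several of the numbers in the fun facts above lend themselves to back-of-the-envelope checks. The Python sketch below is illustrative only. The genome sizes (about 4.6 million base pairs for E. coli and about 6.4 billion for a diploid human cell) and the per-locus genotype frequency of 0.2 are assumptions that do not come from this post. The replication model assumes that all origins fire at once and that each one drives two forks at constant speed. The decay function uses only the quoted 521-year half-life, not the slower temperature-adjusted rate behind the -5°C figures.

```python
import math
from typing import Sequence


# Fact 2: first-order decay using the 521-year half-life quoted above.
def log10_fraction_intact(years: float, half_life_years: float = 521) -> float:
    """log10 of the fraction of bonds still intact; log space avoids underflow."""
    return -(years / half_life_years) * math.log10(2)


# Fact 6: Earth-Pluto round trips covered by the uncoiled DNA.
def pluto_round_trips(dna_km: float, earth_pluto_km: float) -> float:
    return dna_km / (2 * earth_pluto_km)


# Fact 8: product rule, assuming the loci are inherited independently.
def random_match_probability(genotype_freqs: Sequence[float]) -> float:
    return math.prod(genotype_freqs)


# Fact 10: time to copy a genome if all origins fire at once and each origin
# sends two forks in opposite directions (bidirectional replication).
def replication_time_s(genome_bp: float, fork_nt_per_s: float, origins: int) -> float:
    return genome_bp / (2 * origins * fork_nt_per_s)


def min_origins(genome_bp: float, fork_nt_per_s: float, time_s: float) -> int:
    """Smallest number of simultaneously active origins that finishes in time_s."""
    return math.ceil(genome_bp / (2 * fork_nt_per_s * time_s))


print(f"65 million years: 10^{log10_fraction_intact(65e6):.0f} of bonds intact")
print(f"Pluto round trips: {pluto_round_trips(16e9, 4e9):.2f} (closest), "
      f"{pluto_round_trips(16e9, 7.5e9):.2f} (farthest)")
# Hypothetical genotype frequency of 0.2 at each of 13 loci
print(f"random match: 1 in {1 / random_match_probability([0.2] * 13):,.0f}")
# Assumed genome sizes: ~4.6 Mb (E. coli), ~6.4 Gb (diploid human cell)
print(f"E. coli: {replication_time_s(4.6e6, 1_000, 1) / 60:.0f} minutes")
print(f"human origins needed for 8 h: {min_origins(6.4e9, 50, 8 * 3600):,}")
```

With these assumptions, E. coli finishes in about 38 minutes, close to the 40 minutes quoted in fact 10, and a human cell would need at least about 2,200 simultaneously active origins to finish in 8 hours. The uncoiled DNA covers two Earth-Pluto round trips at the closest distance and still slightly more than one at the farthest, and 13 loci at the hypothetical frequency give a match probability of about 1 in 1.2 billion.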