Overview – What is de novo genome sequencing?

De novo genome sequencing is used to resolve the primary genetic sequence of a specific prokaryotic or eukaryotic organism. Next-generation sequencing (NGS) is performed on uncharacterised genomes for which no prior knowledge of the nucleotide sequence exists and no reference sequence is available.

Applications – What are the advantages of de novo genome sequencing?

De novo genome sequencing is ideal for:

  • Sequencing of uncharacterised prokaryotic genomes, like bacterial genomes
  • Sequencing of unknown eukaryotic genomes, including plant and human genomes
  • Sequencing of known genomes with significant variation
  • Gap closure and finishing of complex genomes with relatively high amounts of similar or repetitive regions
  • Analysis of structural variants and complex rearrangements, including copy number variations, inversions and translocations
  • Simultaneous acquisition of epigenetic information and sequencing data

Workflow – de novo genome sequencing methods & technologies

The process of de novo genome sequencing involves sequencing small DNA fragments, assembling the reads into longer sequences (contigs) and finally ordering the contigs to obtain the entire genome sequence.
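The step from reads to contigs can be illustrated with a toy example. The sketch below is a simple greedy overlap merge, not the algorithm used in any production assembler or in the GATC Biotech workflow. It assumes error-free reads from a single strand and a genome without repeats; the function names and the min_len parameter are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest overlap until none remains."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that are fully contained in another read.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return contigs  # no overlaps left: these are the final contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    # Three overlapping 8-base reads from a made-up 16-base sequence.
    reads = ["CCTTGAAC", "ATGGCGTA", "CGTACCTT"]
    print(greedy_assemble(reads))
```

Real genomes contain repeats and real reads contain errors, which is why greedy merging alone is not sufficient in practice and why read length matters for resolving repetitive regions.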

Different de novo genome assembly methods are available. Often, a hybrid approach is used where short reads sequenced at higher depths are used to error-correct longer reads from a second library. This de novo genome sequencing strategy requires two libraries, two runs and two data sets. 

GATC Biotech analysis of PacBio reads with a non-hybrid assembly algorithm can generate the longest contigs with a minimum number of misassemblies. A hierarchical genome assembly process (HGAP) (Fig. 1) takes advantage of multiple alignments of all reads to obtain an accurate de novo genome sequence, where even extended repetitive regions are successfully resolved. The PacBio RS II platform also provides the exclusive opportunity to gain additional epigenetic information simultaneously within one sequencing run.

Figure 1. Overview of the hierarchical genome assembly process (HGAP). Many shorter, quality-filtered reads are used to error-correct the longest reads of insert (ROIs). These ROIs are then assembled, often into a single contig that can span the length of the genome. (Clark et al.)
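The error-correction idea in Figure 1 can be sketched in a few lines. This is only a toy illustration, not the HGAP implementation: it assumes the shorter reads have already been aligned to the seed read without gaps, so each one is given as a hypothetical (offset, sequence) pair, and it corrects substitution errors only by majority vote. The function names pick_seeds and correct_seed are made up for this example.

```python
from collections import Counter


def pick_seeds(reads: list[str], n: int) -> list[str]:
    """Select the n longest reads as seed reads."""
    return sorted(reads, key=len, reverse=True)[:n]


def correct_seed(seed: str, aligned: list[tuple[int, str]]) -> str:
    """Majority-vote consensus over a seed read and reads aligned to it without gaps.

    aligned holds (offset, read) pairs, where offset is the assumed start
    position of the read on the seed.
    """
    # The seed itself casts one vote per position.
    columns = [Counter({base: 1}) for base in seed]
    for offset, read in aligned:
        for i, base in enumerate(read):
            pos = offset + i
            if 0 <= pos < len(seed):
                columns[pos][base] += 1
    # Take the most frequent base in each column.
    return "".join(col.most_common(1)[0][0] for col in columns)


if __name__ == "__main__":
    seed = "ACGTTCGTAC"  # long read with one substitution error at position 4
    aligned = [(0, "ACGTAC"), (2, "GTACGT"), (4, "ACGTAC")]
    print(correct_seed(seed, aligned))
```

Actual long-read error correction also has to handle insertions and deletions and must compute the alignments itself; the sketch skips both to keep the voting step visible.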

Scientific expertise: de novo genome sequencing

Historically, GATC Biotech has been involved in several key de novo genome sequencing projects. In 1993, GATC Biotech participated in the sequencing of the first yeast chromosome. In 2006, GATC Biotech sequenced for the Potato Genome Sequencing Consortium (PGSC). 

GATC Biotech has now sequenced and assembled hundreds of genomes, perfecting protocols for prokaryotic and eukaryotic genomes and improving workflows for finishing genomes of more complex organisms. Our use of cutting-edge single-molecule real-time (SMRT) technology and proprietary genome assembly algorithms provides an accurate method for de novo genome analysis. GATC Biotech has sequenced thousands of genomes of bacteria, fungi, algae and other higher eukaryotes. Please contact us to see how you can benefit from our capabilities.


INVIEW DE NOVO GENOME 2.0 was applied for the KLEBSICURE Consortium, a cooperation project with GATC Biotech AG, the Max Planck Institute for Infection Biology (Berlin, Germany) and the Ludwik Hirszfeld Institute of Immunology and Experimental Therapy (Wroclaw, Poland) as consortium members and Arsanis Biosciences GmbH (Vienna, Austria) as consortium leader. The main objective of the project is the identification and characterisation of the pathogen Klebsiella pneumoniae, which causes severe infections. The unique combination of de novo sequencing and detection of base modifications was used to study the virulence of the pathogen, to guide the generation of monoclonal antibody therapeutics and to establish a test method for clinical diagnostics.


Find here a list of selected research articles supported by GATC Biotech's sequencing products, including articles on de novo genome sequencing.

Related products to de novo genome sequencing

Did you know that de novo genome sequencing can be accomplished not only quickly, but also cost-efficiently? Simply take advantage of our complete service package including expert library preparation, sequencing on the leading PacBio platform, professional BioIT analysis and a final comprehensive GATC Data Analysis report.

See how our INVIEW DE NOVO GENOME product can help advance your pioneering project or contact us for more information about de novo projects tailored to the size of your organism.

For cases where sequencing data for a specific organism already exist and direct comparison to a reference genome is possible, targeted sequencing or whole-genome resequencing might be the right service for you. Take advantage of our products to help you discover and validate single-base mutations, insertions, deletions and structural variants.

Further reading on de novo genome sequencing

Rhoads, A., Au, K.F. PacBio Sequencing and Its Applications. Genomics Proteomics Bioinformatics. doi: 10.1016/j.gpb.2015.08.002 (2015).

Baker, M. De novo genome assembly: what every biologist should know. Nature Methods 9, 333–337 (2012).