Basic biology

Base pair (bp)

    • Two complementary nucleotides that are paired in double-stranded DNA. Adenine (A) pairs with thymine (T), and guanine (G) pairs with cytosine (C). The base pair is also used as a unit of length for nucleotide sequences, e.g., 20 bp is a stretch of DNA 20 nucleotides long.

    • Kilobase (kb): One thousand base pairs.

    • Megabase (Mb): One million base pairs.

    • Gigabase (Gb): One billion base pairs.
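The pairing rule and the length units above can be illustrated with a short Python sketch. The function names (`reverse_complement`, `format_length`) are hypothetical and chosen only for illustration, and the sketch assumes unambiguous A/C/G/T input.

```python
# Illustrative sketch; the names below are hypothetical, not from any library.
# Watson-Crick pairing: A with T, G with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def format_length(n_bp: int) -> str:
    """Express a length in base pairs using the largest fitting unit."""
    for factor, unit in ((10**9, "Gb"), (10**6, "Mb"), (10**3, "kb")):
        if n_bp >= factor:
            return f"{n_bp / factor:g} {unit}"
    return f"{n_bp} bp"
```

For example, `reverse_complement("ATGC")` gives `"GCAT"`, and `format_length(1500)` gives `"1.5 kb"`.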


DNA (deoxyribonucleic acid)

  • A double-helical molecule carrying the genetic instructions for the development, functioning, growth and reproduction of all known organisms and many viruses.


Genome

  • The sum total of the genetic material of a cell or an organism.


Genomics

  • An interdisciplinary field of biology focusing on the structure, function, evolution, mapping, and editing of genomes.

Molecular biology

DNA extraction

  • A procedure by which DNA can be isolated from cells so that it can then be used in molecular biological experiments or forensic research.

Gel electrophoresis

  • A method for separation and analysis of macromolecules (DNA, RNA and proteins) and their fragments, based on their size and charge.

Polymerase chain reaction (PCR)

  • A procedure in which segments of DNA (including DNA copies of RNA) can be amplified using flanking oligonucleotides called primers and repeated cycles of replication by DNA polymerase.
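As a rough illustration of why repeated cycles amplify a target so strongly, the sketch below computes the expected copy number after a given number of cycles. It assumes an idealized reaction in which a fixed fraction of templates (`efficiency`, a parameter introduced here for illustration) is copied in every cycle; real reactions plateau as reagents are depleted.

```python
# Idealized model only: the function name and the efficiency parameter are
# assumptions made for this sketch, not part of any laboratory protocol.
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Expected copy number after `cycles` cycles of PCR.

    efficiency = 1.0 means every template is copied in every cycle (perfect
    doubling); lower values model incomplete amplification.
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles
```

Under perfect doubling, a single template molecule yields 2^30 (about one billion) copies after 30 cycles: `pcr_copies(1, 30)`.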

Quantitative PCR (qPCR)

  • A PCR-based laboratory technique that allows the accurate measurement of the amount of specific nucleic acids (DNA, or RNA after reverse transcription into DNA) in a sample.



Bioinformatics

  • An interdisciplinary field that develops methods and software tools for understanding biological data, in particular when the data sets are large and complex. It includes specific analysis "pipelines" that are used repeatedly, particularly in the field of genomics.

DNA barcoding

  • Amplification and sequencing of short DNA fragments (also called ‘amplicons’) that contain diagnostic sequences to distinguish taxa. Species identifications are then carried out by comparisons to reference databases.
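The comparison against a reference database can be sketched as a best-match search. The example below is a deliberately simplified assumption: it compares pre-aligned sequences of equal length by percent identity, whereas real workflows rely on alignment or search tools. The function names and the 97% cutoff are hypothetical choices for illustration.

```python
# Simplified sketch: names and the default threshold are assumptions.
def percent_identity(a: str, b: str) -> float:
    """Percentage of matching positions between two equal-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def identify(query: str, references: dict, min_identity: float = 97.0):
    """Return (taxon, identity) for the best reference match.

    The taxon is None when the best match falls below `min_identity`.
    """
    best_taxon, best_score = None, 0.0
    for taxon, ref_seq in references.items():
        score = percent_identity(query, ref_seq)
        if score > best_score:
            best_taxon, best_score = taxon, score
    if best_score < min_identity:
        return None, best_score
    return best_taxon, best_score
```

With a made-up toy database such as `{"taxon_A": "ACGTACGTAC", "taxon_B": "ACGTTTGTAC"}`, the query `"ACGTACGTAC"` is assigned to `taxon_A` at 100% identity, while a query that matches no reference closely enough is left unidentified.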

DNA sequencing

  • The process of determining the nucleotide order of a given DNA fragment.

Environmental DNA (eDNA)

  • DNA traces left by organisms in their environment, e.g., plant pollen in soil or fish scales in water. eDNA from, for example, water samples can be enriched and sequenced by metabarcoding, allowing characterization of whole communities via passive sampling.

Genome skimming

  • Metagenomic approach in which only multi-copy loci (usually chloroplast or mitochondrial genomes) are retained. Contiguous sequences of these regions are then assembled and used for phylogenetics and community analysis.

Library (preparation)

  • A 'library' is the final DNA product that is prepared for sequencing. Library preparation generally entails: (i) fragmentation, (ii) end-repair, (iii) phosphorylation of the 5′ ends, (iv) A-tailing of the 3′ ends to facilitate ligation to sequencing adapters, (v) ligation of adapters, and (vi) some number of PCR cycles to enrich for product that has adapters ligated to both ends.

Next-generation/high-throughput sequencing

  • DNA sequencing technology that permits rapid sequencing of large portions of the genome; so called because it vastly increases the throughput over classic Sanger sequencing.


Read

  • A discrete segment of sequence information generated by a sequencing instrument; read length refers to the number of nucleotides in the segment.


Metabarcoding

  • Sequencing of mixed DNA barcode amplicons from bulk community or pooled samples. The qualitative taxon composition of the bulk sample can then be characterized from the recovered barcode reads by comparison against a reference database. PCR amplification bias can, however, skew quantitative inferences of taxon abundances in the community.
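A minimal sketch of the summary step follows, assuming each read has already been assigned to a taxon (or to `None` when no reference matched). The function name and the `min_reads` filter are hypothetical. The list of detected taxa is the qualitative result; the read proportions are reported only for completeness and, because of PCR amplification bias, should not be read as true abundances.

```python
from collections import Counter


# Hypothetical helper for illustration; not taken from any pipeline.
def summarize_assignments(assignments, min_reads=1):
    """Summarize per-read taxon assignments from a bulk sample.

    Returns (detected, proportions): the sorted taxa supported by at least
    `min_reads` reads, and each detected taxon's share of the retained reads.
    """
    counts = Counter(t for t in assignments if t is not None)
    detected = sorted(t for t, n in counts.items() if n >= min_reads)
    total = sum(counts[t] for t in detected)
    # Read shares are skewed by amplification bias; treat with caution.
    proportions = {t: counts[t] / total for t in detected} if total else {}
    return detected, proportions
```

Raising `min_reads` is one simple way to discard taxa supported by only a handful of reads, which may stem from sequencing errors or contamination.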


Metagenomics

  • Shotgun sequencing of genomic DNA of a target taxon or bulk community. Longer contiguous sequences (contigs), particularly for multi-copy loci such as whole mitochondrial genomes, can be generated by assembling the resulting reads. By omitting PCR amplification, less biased quantitative assessments of communities are possible. The resulting long contigs also provide better phylogenetic resolution than short DNA barcodes alone.