discomark: nuclear marker discovery from orthologous sequences using draft genome data

Sereina Rutschmann, Harald Detering, Sabrina Simon, Jakob Fredslund, Michael T. Monaghan

The International Journal of Health Planning and Management

Published online on August 23, 2016

Abstract

High‐throughput sequencing has laid the foundation for fast and cost‐effective development of phylogenetic markers. Here we present the program discomark, which streamlines the development of nuclear DNA (nDNA) markers from whole‐genome (or whole‐transcriptome) sequencing data. Starting from input orthologous sequences, it combines local alignment, alignment trimming, reference mapping and primer design based on multiple sequence alignments to produce primer pairs. To demonstrate the suitability of discomark, we designed markers for two groups: one of closely related species and one of distantly related species. For the closely related members of the Cloeon dipterum s.l. species complex (Insecta, Ephemeroptera), the program discovered a total of 78 markers, of which we selected eight for amplification and Sanger sequencing. The resulting exon sequence alignments (2526 base pairs) were used to reconstruct a well‐supported phylogeny and to infer clearly structured haplotype networks. For the distantly related species, we designed primers for the insect order Ephemeroptera using available genomic data from four sequenced species, and developed primer pairs for 23 markers that are designed to amplify across several families. By providing a streamlined, automated approach to genome‐scale scans for phylogenetic markers, discomark will facilitate the development of new nDNA markers. The program is written in Python, released under a public licence (GNU GPL version 2), and available, together with a manual and an example data set, at: https://github.com/hdetering/discomark.
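The core idea behind designing primers from a multiple sequence alignment can be illustrated with a small sketch: trim gap-heavy columns, flag short windows that are conserved across all sequences, and pair a forward window with a downstream reverse window whose amplicon falls in a target size range. This is not discomark's code. The function names (`trim_gappy_columns`, `classify_columns`, `candidate_sites`, `primer_pairs`), the thresholds, and the rule "no gap columns and at most `max_variable` variable columns per window" are illustrative assumptions standing in for a full primer-design step, and primer sequences are simply taken from the first aligned sequence.

```python
"""Hypothetical sketch: conserved primer sites from a multiple sequence alignment.

Assumption: a simplified stand-in for the alignment-trimming and primer-design
steps described in the abstract; it is NOT discomark's implementation.
"""
from typing import List, Tuple

# Translation table for DNA complement (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def trim_gappy_columns(alignment: List[str], max_gap_frac: float = 0.5) -> List[str]:
    """Drop alignment columns whose fraction of '-' characters exceeds max_gap_frac."""
    n_seqs = len(alignment)
    keep = [
        i for i in range(len(alignment[0]))
        if sum(seq[i] == "-" for seq in alignment) / n_seqs <= max_gap_frac
    ]
    return ["".join(seq[i] for i in keep) for seq in alignment]


def classify_columns(alignment: List[str]) -> List[str]:
    """Label each column 'gap' (any '-'), 'variable' (>1 base) or 'conserved'."""
    labels = []
    for i in range(len(alignment[0])):
        column = {seq[i].upper() for seq in alignment}
        if "-" in column:
            labels.append("gap")
        elif len(column) > 1:
            labels.append("variable")
        else:
            labels.append("conserved")
    return labels


def candidate_sites(alignment: List[str], length: int = 20,
                    max_variable: int = 0) -> List[int]:
    """Start positions of gap-free windows with <= max_variable variable columns."""
    labels = classify_columns(alignment)
    starts = []
    for s in range(len(labels) - length + 1):
        window = labels[s:s + length]
        if "gap" not in window and window.count("variable") <= max_variable:
            starts.append(s)
    return starts


def primer_pairs(alignment: List[str], length: int = 20, min_amplicon: int = 100,
                 max_amplicon: int = 500,
                 max_variable: int = 0) -> List[Tuple[int, int, str, str]]:
    """Pair non-overlapping forward/reverse sites whose amplicon size is in range.

    Returns (forward_start, reverse_start, forward_primer, reverse_primer),
    with primer sequences taken from the first aligned sequence (an assumption).
    """
    starts = candidate_sites(alignment, length, max_variable)
    template = alignment[0]
    pairs = []
    for f in starts:
        for r in starts:
            amplicon = r + length - f  # columns spanned from forward to reverse end
            if r >= f + length and min_amplicon <= amplicon <= max_amplicon:
                fwd = template[f:f + length]
                rev = reverse_complement(template[r:r + length])
                pairs.append((f, r, fwd, rev))
    return pairs


if __name__ == "__main__":
    toy = ["AAGGTCTCCATT", "AAGGTGTACATT", "AAGG-CTCCATT"]
    for pair in primer_pairs(toy, length=4, min_amplicon=10, max_amplicon=12):
        print(pair)  # (0, 8, 'AAGG', 'AATG')
```

Keeping gap columns out of primer windows entirely, while tolerating a small number of variable columns, mirrors the trade-off between markers for closely related species (strict conservation is easy to find) and markers meant to amplify across several families (some degeneracy is usually needed).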