ddradseqtools: a software package for in silico simulation and testing of double‐digest RADseq experiments
The International Journal of Health Planning and Management
Published online on July 12, 2016
Abstract
Double‐digested RADseq (ddRADseq) is an NGS methodology that generates reads from thousands of loci targeted by restriction enzyme cut sites across multiple individuals. To be statistically sound and economically optimal, a ddRADseq experiment requires a preliminary design stage that considers the choice of enzymes, particular features of the genome of the focal species, possible modifications to the library construction protocol, the coverage needed to minimize missing data, and the potential sources of error that may affect coverage. We present ddradseqtools, a software package that supports ddRADseq experimental design by (i) generating in silico double‐digested fragments; (ii) constructing modified ddRADseq libraries using adapters with either one or two indexes and degenerate base regions (DBRs) to quantify PCR duplicates; and (iii) performing the initial steps of the bioinformatic preprocessing of reads. ddradseqtools generates single‐end (SE) or paired‐end (PE) reads that may carry SNPs and/or indels. The effects of allele dropout and PCR duplicates on coverage are also simulated. The resulting output files can be submitted to alignment and variant‐calling pipelines to allow fine‐tuning of their parameters. The software was validated with specific tests of its correct operation. The correspondence between in silico settings and parameters from in vitro ddRADseq experiments was assessed to provide guidelines for reliable use of the software. ddradseqtools is efficient in terms of execution time and can be run on computers with standard CPU and RAM configurations.
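To make step (i) concrete, the sketch below shows one simple way an in silico double digest can be computed. It is not ddradseqtools' implementation or interface; every name in it (`Enzyme`, `Fragment`, `double_digest`) is hypothetical. It assumes two enzymes with palindromic recognition sites, here EcoRI (G^AATTC) and MspI (C^CGG) chosen only as an illustrative pair. It scans the top strand only, measures fragment length between top-strand cut positions (ignoring overhangs), and keeps fragments whose two ends were cut by different enzymes and whose length falls within an assumed size-selection window.

```python
import re
from typing import List, NamedTuple


class Enzyme(NamedTuple):
    name: str
    site: str        # recognition sequence (assumed palindromic in this sketch)
    cut_offset: int  # top-strand cut position measured from the start of the site


class Fragment(NamedTuple):
    start: int   # top-strand cut position at the left end
    end: int     # top-strand cut position at the right end
    left: str    # enzyme that produced the left end
    right: str   # enzyme that produced the right end

    @property
    def length(self) -> int:
        # Length between cut positions; sticky-end overhangs are ignored
        return self.end - self.start


def cut_positions(seq: str, enzyme: Enzyme) -> List[int]:
    # Zero-width lookahead so that overlapping recognition sites are all found
    pattern = re.compile(f"(?={re.escape(enzyme.site)})")
    return [m.start() + enzyme.cut_offset for m in pattern.finditer(seq)]


def double_digest(seq: str, enz1: Enzyme, enz2: Enzyme,
                  min_len: int, max_len: int) -> List[Fragment]:
    seq = seq.upper()
    # Merge the cut positions of both enzymes along the sequence
    cuts = sorted(
        [(p, enz1.name) for p in cut_positions(seq, enz1)]
        + [(p, enz2.name) for p in cut_positions(seq, enz2)]
    )
    fragments = []
    # Adjacent cuts delimit a fragment; keep only mixed-end fragments
    # (one end per enzyme) that pass the hypothetical size window
    for (start, left), (end, right) in zip(cuts, cuts[1:]):
        if left != right and min_len <= end - start <= max_len:
            fragments.append(Fragment(start, end, left, right))
    return fragments


ECORI = Enzyme("EcoRI", "GAATTC", 1)
MSPI = Enzyme("MspI", "CCGG", 1)

# Toy sequence: EcoRI sites at 2 and 34, MspI sites at 18 and 27
demo = "AA" + "GAATTC" + "A" * 10 + "CCGG" + "T" * 5 + "CCGG" + "AAA" + "GAATTC" + "AA"

frags = double_digest(demo, ECORI, MSPI, min_len=5, max_len=20)
print([(f.start, f.end, f.length) for f in frags])  # → [(3, 19, 16), (28, 35, 7)]
```

In this toy sequence the fragment between the two MspI cuts is discarded because both of its ends come from the same enzyme; in a real ddRADseq library such fragments carry the same adapter at both ends and are generally not amplified. A fuller simulator would also scan the reverse strand for non‐palindromic sites, handle degenerate recognition sequences, and model overhangs.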