Minimum sample sizes for population genomics: an empirical study from an Amazonian plant species
The International Journal of Health Planning and Management
Published online on February 10, 2017
Abstract
High‐throughput DNA sequencing facilitates the analysis of large portions of the genome in nonmodel organisms, ensuring high accuracy of population genetic parameters. However, empirical studies evaluating the appropriate sample size for these kinds of studies are still scarce. In this study, we use double‐digest restriction‐associated DNA sequencing (ddRADseq) to recover thousands of single nucleotide polymorphisms (SNPs) for two physically isolated populations of Amphirrhox longifolia (Violaceae), a nonmodel plant species for which no reference genome is available. We used resampling techniques to construct simulated populations from random subsets of individuals and SNPs, in order to determine how many individuals and biallelic markers should be sampled for accurate estimates of intra‐ and interpopulation genetic diversity. We identified 3646 and 4900 polymorphic SNPs for the two populations of A. longifolia, respectively. Our simulations show that, overall, increasing the sample size beyond eight individuals has little impact on estimates of genetic diversity within A. longifolia populations when 1000 or more SNPs are used. Our results also show that even at a very small sample size (i.e., two individuals), accurate estimates of FST can be obtained with a large number of SNPs (≥1500). These results highlight the potential of high‐throughput genomic sequencing approaches to address questions related to evolutionary biology in nonmodel organisms. Furthermore, our findings provide insights into the optimization of sampling strategies in the era of population genomics.
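
The resampling idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the abstract does not name the software or the exact diversity statistic used, so the sketch assumes expected heterozygosity with a small-sample correction as the within-population measure, genotypes coded as alternate-allele counts (0, 1, 2), and no missing data. The function names and the toy genotype matrix are hypothetical and exist only for illustration.

```python
import random
from statistics import mean


def expected_heterozygosity(genotypes):
    """Mean expected heterozygosity across biallelic SNPs.

    genotypes: list of individuals, each a list of alternate-allele
    counts (0, 1 or 2) per SNP. Assumes diploids and no missing data.
    """
    n_chrom = 2 * len(genotypes)
    n_snps = len(genotypes[0])
    values = []
    for j in range(n_snps):
        p = sum(ind[j] for ind in genotypes) / n_chrom
        # 2pq with the n/(n-1) small-sample correction
        values.append(2 * p * (1 - p) * n_chrom / (n_chrom - 1))
    return mean(values)


def resample_he(genotypes, n_ind, n_snps, n_reps, rng):
    """Estimate heterozygosity on random subsets of individuals and SNPs.

    Each replicate draws n_ind individuals and n_snps SNPs without
    replacement and returns one estimate per replicate.
    """
    all_snps = range(len(genotypes[0]))
    estimates = []
    for _ in range(n_reps):
        inds = rng.sample(genotypes, n_ind)
        snps = rng.sample(all_snps, n_snps)
        subset = [[ind[j] for j in snps] for ind in inds]
        estimates.append(expected_heterozygosity(subset))
    return estimates


# Toy data (hypothetical): 20 individuals x 200 SNPs with random genotypes.
rng = random.Random(42)
toy = [[rng.choice((0, 1, 2)) for _ in range(200)] for _ in range(20)]

# Compare the spread of estimates across replicates for several sample sizes.
for n_ind in (2, 4, 8, 16):
    est = resample_he(toy, n_ind, n_snps=100, n_reps=50, rng=rng)
    print(n_ind, round(max(est) - min(est), 4))
```

Under these assumptions, the spread of the replicate estimates at each combination of individuals and SNPs gives a rough picture of how stable the statistic is at that sampling effort. A real analysis would need to handle missing genotypes and use the estimators reported in the full paper.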