MetaTOC stay on top of your field, easily

Pedigree reconstruction from SNP data: parentage assignment, sibship clustering and beyond

The International Journal of Health Planning and Management

Published online on

Abstract

Data on hundreds or thousands of single nucleotide polymorphisms (SNPs) provide detailed information about the relationships between individuals, but currently few tools can turn this information into a multigenerational pedigree. I present the r package sequoia, which assigns parents, clusters half‐siblings sharing an unsampled parent and assigns grandparents to half‐sibships. Assignments are made after consideration of the likelihoods of all possible first‐, second‐ and third‐degree relationships between the focal individuals, as well as the traditional alternative of being unrelated. This careful exploration of the local likelihood surface is implemented in a fast, heuristic hill‐climbing algorithm. Distinction between the various categories of second‐degree relatives is possible when likelihoods are calculated conditional on at least one parent of each focal individual. Performance was tested on simulated data sets with realistic genotyping error rate and missingness, based on three different large pedigrees (N = 1000–2000). This included a complex pedigree with overlapping generations, occasional close inbreeding and some unknown birth years. Parentage assignment was highly accurate down to about 100 independent SNPs (error rate <0.1%) and fast (<1 min) as most pairs can be excluded from being parent–offspring based on opposite homozygosity. For full pedigree reconstruction, 40% of parents were assumed nongenotyped. Reconstruction resulted in low error rates (<0.3%), high assignment rates (>99%) in limited computation time (typically <1 h) when at least 200 independent SNPs were used. In three empirical data sets, relatedness estimated from the inferred pedigree was strongly correlated to genomic relatedness.