Polymorphism‐aware protein databases – a prerequisite for an unbiased proteomic analysis of natural populations

Kathrin A. Otte, Christian Schlötterer

The International Journal of Health Planning and Management

Published online on March 30, 2017

Abstract

Recent technological advances have increased the throughput of proteomics, facilitating the characterization of molecular phenotypes at the population level and thus offering the potential to complement transcriptomic analyses. Reference protein databases are crucial for peptide identification and quantification, because only peptides present in the protein database can be identified. Consequently, a peptide carrying an amino acid variant that is absent from the database cannot be identified. Because most proteomic studies, even of natural populations, do not account for polymorphisms, we analysed the influence of variant peptides on quantitative proteomic analyses. We used transcriptomic and proteomic data of two Drosophila melanogaster genotypes and identified genotype‐specific variants from RNA‐seq data. We introduce a simple pipeline to include these variants in a polymorphism‐aware protein database and compare the results to those obtained with an unmodified reference database. With the polymorphism‐aware database, more peptides were identified, and the quantitative values also changed when peptide variants were included. We conclude that proteomic quantification is likely to be biased, in particular for small genes, when polymorphisms are ignored. Polymorphism‐aware databases may therefore be a key step towards improved proteomic data analyses, especially for the analysis of pooled individuals and the comparison of population samples.
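
The idea of a polymorphism‐aware database can be illustrated with a minimal sketch. It is not the authors' pipeline, whose details are not given in this abstract. It assumes that variants have already been called and translated into single amino acid substitutions, and the record layout, the function names `build_polymorphism_aware_db` and `to_fasta`, and the entry naming scheme are all hypothetical. Each variant protein is added as an extra entry next to the unmodified reference sequence, so that a search engine could match both the reference and the variant peptide.

```python
from typing import Dict, List, Tuple

# Hypothetical variant record (an assumption for this sketch):
# (protein_id, 1-based position, reference residue, alternate residue)
Variant = Tuple[str, int, str, str]


def build_polymorphism_aware_db(
    reference: Dict[str, str], variants: List[Variant]
) -> Dict[str, str]:
    """Return the reference entries plus one extra entry per valid substitution."""
    db = dict(reference)  # keep every unmodified reference sequence
    for protein_id, pos, ref_aa, alt_aa in variants:
        seq = reference.get(protein_id)
        # Skip variants for unknown proteins or positions outside the sequence.
        if seq is None or pos < 1 or pos > len(seq):
            continue
        # Skip variants whose reference residue disagrees with the sequence.
        if seq[pos - 1] != ref_aa:
            continue
        variant_seq = seq[: pos - 1] + alt_aa + seq[pos:]
        # Hypothetical naming scheme: <protein>_<ref><position><alt>
        db[f"{protein_id}_{ref_aa}{pos}{alt_aa}"] = variant_seq
    return db


def to_fasta(db: Dict[str, str]) -> str:
    """Serialize the database as FASTA text, one header line per entry."""
    return "".join(f">{name}\n{seq}\n" for name, seq in db.items())
```

Keeping the reference entry alongside each variant entry is a design choice of this sketch: it suits pooled or heterozygous samples, in which both alleles may be present, at the cost of a larger search space.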