A simple and efficient algorithm for authorship verification
Journal of the American Society for Information Science and Technology
Published online on January 11, 2016
Abstract
This paper describes and evaluates an unsupervised and effective authorship verification model called Spatium‐L1. As features, we suggest using the 200 most frequent terms of the disputed text (isolated words and punctuation symbols). Applying a simple distance measure and a set of impostors, we can determine whether or not the disputed text was written by the proposed author. Moreover, based on a simple rule we can define when there is enough evidence to propose an answer or when the attribution scheme is unable to make a decision with a high degree of certainty. Evaluations based on 6 test collections (PAN CLEF 2014 evaluation campaign) indicate that Spatium‐L1 usually appears in the top 3 best verification systems, and on an aggregate measure, presents the best performance. The suggested strategy can be adapted without any problem to different Indo‐European languages (such as English, Dutch, Spanish, and Greek) or genres (essay, novel, review, and newspaper article).