Authorship Analytics

Synopsis

Authorship analytics (also: authorship identification) is concerned with determining who wrote a text by means of stylometric methods and features. The basic form of forensic authorship analysis, and also the earliest form of computational authorship identification, is authorship attribution. Given are a text of unknown or disputed authorship and several texts whose authors are known. The known authors form a closed set of candidates (possibly up to thousands) that is assumed to include the questioned author. The task is to assign the unknown text to the most likely author from the candidate set.
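
To make the closed-set attribution task concrete, here is a minimal sketch in Python. It assumes character trigram frequencies as the stylometric feature and cosine similarity as the comparison measure; both are common choices but are assumptions made for illustration, not a method prescribed by this page. The function names (char_ngram_profile, cosine, attribute) and the author labels are hypothetical.

```python
from collections import Counter
from math import sqrt


def char_ngram_profile(text, n=3):
    """Return relative frequencies of character n-grams in `text`."""
    normalized = " ".join(text.lower().split())  # lowercase, collapse whitespace
    counts = Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}


def cosine(p, q):
    """Cosine similarity between two sparse frequency profiles."""
    dot = sum(value * q.get(gram, 0.0) for gram, value in p.items())
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def attribute(unknown_text, known_texts, n=3):
    """Assign `unknown_text` to the most similar candidate author.

    `known_texts` maps each candidate author to a list of their texts.
    Closed-set assumption: the true author is among the candidates.
    Returns the best-scoring author and the scores of all candidates.
    """
    unknown = char_ngram_profile(unknown_text, n)
    scores = {
        author: cosine(unknown, char_ngram_profile(" ".join(texts), n))
        for author, texts in known_texts.items()
    }
    return max(scores, key=scores.get), scores


# Toy data with hypothetical author labels.
known = {
    "author_a": ["The harbour lay quiet, and the gulls wheeled above the grey water."],
    "author_b": ["Quarterly revenue grew by double digits, driven by strong cloud demand."],
}
best, scores = attribute("Above the grey water the gulls wheeled, and the harbour lay quiet.", known)
```

Because the candidate set is closed, the sketch always returns some author, even when none of the candidates is a good match. Real systems use much longer texts and richer feature sets.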

Authorship verification must be clearly distinguished from attribution. Given are two texts, and the task is to determine whether or not both were written by the same author. This problem may look easier than the multiclass assignment problem of authorship attribution; however, the opposite is true: authorship verification is a true one-class classification problem, and every attribution problem can be reduced to one or more verification problems. In one-class classification, one is given a target class for which a certain number of examples exist. Objects outside the target class are called outliers, and the task is to tell outliers apart from target class members. In authorship verification, the target class consists of writing samples of a certain author A, and a piece of text written by any other author B is an outlier.
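
The one-class view of verification can be sketched as follows. This is a deliberately simple illustration, not the unmasking approach: it assumes character trigram profiles, cosine similarity, and a fixed acceptance threshold of 0.5. The threshold, the function names (profile, cosine, verify), and the sample texts are all assumptions chosen for the example. Only examples of the target class (author A) are available; anything scoring below the threshold is treated as an outlier.

```python
from collections import Counter
from math import sqrt


def profile(text, n=3):
    """Return relative frequencies of character n-grams in `text`."""
    normalized = " ".join(text.lower().split())  # lowercase, collapse whitespace
    counts = Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}


def cosine(p, q):
    """Cosine similarity between two sparse frequency profiles."""
    dot = sum(value * q.get(gram, 0.0) for gram, value in p.items())
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def verify(questioned_text, target_texts, threshold=0.5, n=3):
    """One-class decision: does `questioned_text` belong to the target class?

    `target_texts` are writing samples of the target author A. No samples of
    other authors are used; texts scoring below `threshold` count as outliers.
    The threshold value is an assumption and would need calibration in practice.
    """
    target = profile(" ".join(target_texts), n)
    score = cosine(profile(questioned_text, n), target)
    return score >= threshold, score


samples_of_a = ["The quick brown fox jumps over the lazy dog and rests by the river."]
same_author, score = verify("The lazy dog rests by the river while the fox jumps.", samples_of_a)
```

Under these assumptions, a closed-set attribution problem reduces to calling verify once per candidate author and keeping the candidate with the highest score. The difficulty of verification lies in choosing a threshold without any outlier examples to calibrate against.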

Other applications of authorship analytics include author profiling, the creation of author profiles from texts based on psycho-linguistic features, and author masking (also: authorship obfuscation), the obstruction of authorship identification by targeted text manipulation.

With their innovative unmasking algorithm, Koppel and Schler (2004) were the first to introduce an effective approach for tackling the difficult verification problem. A variant of this approach optimized for short texts (Bevendorff et al., 2019) is demonstrated in a web demo. [demo]

People

Publications