Authorship attribution, constructed languages, and the psycholinguistics of individual variation

Abstract

“Authorship attribution,” the problem of determining the author (or the author's attributes, such as gender, age, or native language) of a work of unknown authorship by examining its writing style, is an important problem in applied linguistics. The theory of authorship attribution is relatively straightforward: language is an underspecified system, and people can pick and choose among several different ways to describe the same thing. These choices, in turn, become habituated and can be identified as persistent patterns of an individual writer or group of writers.

One important psycholinguistic underpinning of this approach is the universal existence (in natural languages) of so-called “marker words” or “function words”: small, closed-class words that carry little semantic content but instead denote relationships between content words. Because these words are so lightly processed, writers and speakers can choose among many near-synonymous forms, and implicitly express their identity in doing so.

Do constructed languages have this same degree of near-synonymy? We present the results of a study of authorship attribution using an ad-hoc corpus of fan-written documents in various constructed languages, and show that even artificial languages constructed for artistic purposes, such as Klingon, Na'vi, and Elvish, permit this type of analysis. This indicates that even constructed languages tend to be psycholinguistically plausible.
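
As a rough illustration of the function-word idea described above, the following Python sketch compares texts by the relative frequencies of a few closed-class words and assigns an unknown text to the nearest candidate author. It is not the method used in the study: the word list `FUNCTION_WORDS`, the cosine distance, and the helper names `profile` and `attribute` are all assumptions chosen for brevity, and the tokenizer is a toy for English only (a language such as Klingon, where letter case is distinctive, would need different tokenization and its own word list).

```python
from collections import Counter
import math
import re

# Hypothetical English function-word list, chosen only for illustration.
# A constructed language would need its own closed-class inventory.
FUNCTION_WORDS = ["the", "of", "and", "to", "in", "on", "upon",
                  "while", "whilst", "though", "although"]


def profile(text):
    """Return the relative frequency of each function word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine_distance(a, b):
    """Cosine distance between two equal-length frequency vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty profile as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def attribute(unknown_text, known_texts):
    """Return the candidate author whose profile is closest to the unknown text.

    known_texts maps an author label to a sample of that author's writing.
    """
    target = profile(unknown_text)
    return min(
        known_texts,
        key=lambda author: cosine_distance(target, profile(known_texts[author])),
    )


# Toy data: author A prefers "upon"/"whilst", author B prefers "on"/"while".
known = {
    "A": "He sat upon the hill whilst the rain fell upon the town.",
    "B": "He sat on the hill while the rain fell on the town.",
}
unknown = "She waited on the bridge while the boats passed on the river."
print(attribute(unknown, known))
```

The two toy authors say the same thing but differ in which near-synonymous function words they habitually choose, which is the kind of signal the theory relies on. A real experiment would use far larger samples and a much longer feature list.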