Spelling variation in historical text corpora: The case of early medieval documentary Latin

Abstract

Spelling variation seems to go hand in hand with grammatical variation in certain historical texts. This article presents a method of quantifying spelling variation as a linguistic variable whose relation with relevant grammatical and contextual variables can be statistically measured. Based on the normalization of the non-standard word forms and the subsequent calculation of edit distance between the normalized and attested word forms, the method is applicable to morphologically tagged historical text corpora and is here tested on an early medieval documentary Latin corpus with notable spelling variation. To justify the proposed method, several methodological issues of both a philological and a technical nature are discussed. The latter part of the article illustrates the potential of the method by way of a case study on the relationship between spelling variation and the use of non-standard prepositions in documentary Latin and by examining the chronological variation of spelling in its historical context. There appears to be a statistically significant dependence between non-standard spelling and the use of non-standard prepositions. It is also argued that the diachronic spelling distribution may be indicative of a spelling reform, in addition to reflecting an already known administrative change relative to scribal practices. Thus, the proposed method of quantifying spelling variation offers interesting insights into the linguistic and historical reality underlying the text corpus.
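The core measurement described above, edit distance between an attested word form and its normalized counterpart, can be illustrated with a minimal sketch. It assumes plain Levenshtein distance with unit costs, which may differ from the exact distance measure and weighting used in the article. The word pairs are hypothetical examples chosen for illustration, not data from the corpus, and the names `levenshtein` and `pairs` are likewise assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Levenshtein distance with unit costs for insertion, deletion, substitution."""
    # Keep the shorter string in the inner loop to limit memory use.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


# Hypothetical (attested, normalized) pairs; not taken from the corpus.
pairs = [("abit", "habet"), ("eclesie", "ecclesiae"), ("terra", "terra")]

# One distance per token; a standard spelling yields 0.
distances = [levenshtein(attested, normalized) for attested, normalized in pairs]

# A simple per-token average as one possible summary of spelling variation.
mean_distance = sum(distances) / len(distances)
```

Treating each token's distance as a numeric variable is what would allow it to be related statistically to grammatical or contextual variables, such as the preposition used or the date of the document.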