Toward a computational history of universities: Evaluating text mining methods for interdisciplinarity detection from PhD dissertation abstracts

Abstract: For the first time, historians of higher education have at their disposal large data sets of primary sources that reflect the complete output of academic institutions. To analyze this unprecedented abundance of digital material, scholars can draw on a large suite of computational methods developed in the field of Natural Language Processing. However, when the intention is to move beyond exploratory studies and use the results of such analyses as quantitative evidence, historians need to take into account the reliability of these techniques.

Stylometric analysis of Early Modern period English plays

Abstract: Function word adjacency networks (WANs) are used to study the authorship of plays from the Early Modern English period. In these networks, nodes are function words, and directed edges between two nodes represent the relative frequency of directed co-appearance of the two words. A WAN is constructed for every analyzed play, and these networks are aggregated to generate author profile networks. We first study the similarity of writing styles between Early Modern English playwrights by comparing the profile WANs.
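The WAN construction and aggregation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the article's implementation: the tiny `FUNCTION_WORDS` list, the look-ahead window length `window`, the distance discount `alpha`, and the rule of summing raw counts across plays before normalizing into an author profile are all hypothetical choices made for the example.

```python
from collections import defaultdict

# Hypothetical, deliberately tiny function-word list (illustration only).
FUNCTION_WORDS = {"the", "and", "of", "to", "a", "in", "that", "with"}


def wan_counts(tokens, function_words=FUNCTION_WORDS, window=5, alpha=0.75):
    """Accumulate discounted directed co-appearance counts between function words.

    For each function word at position k, every function word found d tokens
    later (1 <= d <= window) adds alpha ** (d - 1) to the directed edge
    (tokens[k] -> tokens[k + d]). `window` and `alpha` are assumed values.
    """
    counts = defaultdict(float)
    for k, word in enumerate(tokens):
        if word not in function_words:
            continue
        for d in range(1, window + 1):
            if k + d >= len(tokens):
                break  # reached the end of the text
            nxt = tokens[k + d]
            if nxt in function_words:
                counts[(word, nxt)] += alpha ** (d - 1)
    return counts


def normalize(counts):
    """Convert raw edge weights to relative frequencies per source node."""
    totals = defaultdict(float)
    for (src, _), w in counts.items():
        totals[src] += w
    # Outgoing weights of every source node now sum to 1.
    return {(src, dst): w / totals[src] for (src, dst), w in counts.items()}


def author_profile(plays, **kwargs):
    """Aggregate several plays into one profile WAN.

    Assumed aggregation rule: sum raw counts over all plays, then normalize.
    """
    total = defaultdict(float)
    for tokens in plays:
        for edge, w in wan_counts(tokens, **kwargs).items():
            total[edge] += w
    return normalize(total)


if __name__ == "__main__":
    play = "the king of the land and the sea".split()
    wan = normalize(wan_counts(play, window=2, alpha=0.5))
    for edge, weight in sorted(wan.items()):
        print(edge, round(weight, 3))
```

With `window=2` and `alpha=0.5`, the toy line yields edges `the -> of` and `the -> and` (0.5 each) and `of -> the` and `and -> the` (1.0 each). Normalizing per source node is what makes the edge weights "relative frequencies", so profiles built from texts of different lengths remain comparable.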

Spelling variation in historical text corpora: The case of early medieval documentary Latin

Abstract: Spelling variation seems to go hand in hand with grammatical variation in certain historical texts. This article presents a method of quantifying spelling variation as a linguistic variable whose relation with relevant grammatical and contextual variables can be statistically measured.

Computer stylometry of C. S. Lewis’s The Dark Tower and related texts

Abstract: This article looks at the provenance of the unfinished novel The Dark Tower, generally attributed to C. S. Lewis. The manuscript was purportedly rescued from a bonfire shortly after Lewis’s death by his literary executor Walter Hooper, but the quality of the text is hardly vintage Lewis. Using computer stylometric programs made available by Eder et al.’s (2016: Stylometry with R: A package for computational text analysis.

Do language combinations affect translators’ stylistic visibility in translated texts?

Abstract: The question of translators’ stylistic visibility in translated texts has been a recurring theme in translation studies. Recently, state-of-the-art stylometric methods such as multivariate statistical analysis and machine learning techniques have enabled important progress on this problem. Nevertheless, studies have reached conflicting findings: some find evidence of translators’ stylistic presence, while others fail to do so.

Unsupervised identification of text reuse in early Chinese literature

Abstract: Text reuse in early Chinese transmitted texts is extensive and widespread, often reflecting complex textual histories involving repeated transcription, compilation, and editing spanning many centuries and involving the work of multiple authors and editors. In this study, a fully automated method of identifying and representing complex text reuse patterns is presented, and the results are evaluated by comparison with a manually compiled reference work.

Visual meta-data in qualitative analysis

Abstract: Dr Anne Luther is a researcher, curator, and software developer whose work examines the contemporary art market and data visualization in qualitative research. She received her PhD from Central Saint Martins College of Art and Design, London, and is currently a researcher at the Department for Modern Art History at the Institute of Art Studies and Historical Urban Studies at TU Berlin and at The Center for Data Arts at The New School in New York.

Mining and discovery of hidden relationships between software source codes and related textual documents

Abstract: Software documentation is normally produced informally. It is written in unstructured language, in forms such as user manuals, user requirements, design documentation, tutorials, support documentation, and so on. Recent studies show that 61% of software projects fail or face challenges because of increased costs and production time. Various factors may lead to this problem, and one of the major contributing factors is the lack of links between a software system's source code and its related documents.