Articles from Digital Scholarship in the Humanities (formerly LLC)

Vernacularization in Medieval Chinese: A quantitative study on classifiers, demonstratives, and copulae in the Chinese Buddhist Canon

Abstract: While studies on diachronic Chinese syntax have identified a number of linguistic changes in Medieval Chinese, they have mostly been underpinned by qualitative analyses. In the largest-scale quantitative analysis to date, this article investigates changes in the use of classifiers, demonstratives, and copulae. Our analysis, based on the Chinese Buddhist Canon, examines over 40 million characters in texts spanning a millennium.

How are ‘immigrant workers’ represented in Korean news reporting?—A text mining approach to critical discourse analysis

Abstract: The present study explores the usefulness of a text mining approach to investigating the representation of minorities in news reporting. The question is popular among scholars working in the realm of critical discourse analysis (CDA). Their typical approach is qualitative and involves dissecting a small number of texts at the microlinguistic level. Over the years, this approach has come under severe criticism for the lack of objective and reliable empirical evidence it produces for the sweeping claims it makes about the relationship between language and social structures.

Uncovering gender bias in newspaper coverage of Irish politicians using machine learning

Abstract: This article presents a text-analytic approach to analysing media content for evidence of gender bias. Irish newspaper content is examined using machine learning and natural language processing techniques. Systematic differences in the coverage of male and female politicians are uncovered, and these differences are analysed for evidence of gender bias. A corpus of newspaper coverage of politicians over a 15-year period was created. Features of the text were extracted and patterns differentiating coverage of male and female politicians were identified using machine learning.

Visualizing the knowledge domain of embodied language cognition: A bibliometric review

Abstract: In the present study, 2,180 papers related to embodied cognition in the framework of linguistics were reviewed using a bibliometric approach. The bibliographic records were collected from the Web of Science (Thomson Reuters) for the period 1992 to 2016 and comprised a core data set and an expanded data set, obtained by topic searching and citation expansion. Document co-citation analysis, citation burst detection, and betweenness centrality measurement were conducted to explore and determine the thematic patterns, emerging trends, and critical articles of the knowledge domain.
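As a rough illustration of the document co-citation step mentioned in this abstract, the sketch below counts how often two references appear together in the same citing paper's reference list. The function name `cocitation_counts` and the toy reference lists are hypothetical; the abstract does not describe the tooling the authors actually used.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count, for every pair of references, how many citing papers cite both.

    `reference_lists` is an iterable of reference lists, one per citing paper.
    Pairs are stored as alphabetically ordered tuples so (A, B) == (B, A).
    """
    counts = Counter()
    for refs in reference_lists:
        # De-duplicate within a paper so a pair is counted once per citing paper.
        for a, b in combinations(sorted(set(refs)), 2):
            counts[(a, b)] += 1
    return counts


# Toy data (invented for illustration): three citing papers.
toy_lists = [["A", "B", "C"], ["A", "B"], ["B", "C"]]
toy_counts = cocitation_counts(toy_lists)
```

The resulting pair counts can serve as edge weights of a co-citation network, on which measures such as betweenness centrality would then be computed.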

Evaluation of text representation schemes and distance measures for authorship linking

Abstract: Given n text excerpts, the authorship linking task is to link pairs of documents written by the same person. This problem is closely related to authorship attribution questions, and its solution can be used in the author clustering task. However, no training information is provided, so the solution must be unsupervised. To achieve this, various text representation strategies can be applied, such as characters, punctuation symbols, or letter n-grams as well as words, lemmas, Part-Of-Speech (POS) tags, and sequences of them.
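To make the unsupervised setting concrete, here is a minimal sketch of one possible pipeline: each excerpt is represented by relative frequencies of letter n-grams, and all document pairs are ranked by distance so that the closest pairs become candidate same-author links. The choice of character trigrams and the Manhattan (L1) distance is an assumption made for illustration, as are the function names; the abstract does not say which representation or distance measure performs best.

```python
from collections import Counter
from itertools import combinations


def char_ngram_profile(text, n=3):
    """Relative frequencies of overlapping character n-grams in `text`."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    total = len(grams)
    if total == 0:
        return {}
    return {gram: count / total for gram, count in Counter(grams).items()}


def manhattan(p, q):
    """Manhattan (L1) distance between two sparse frequency profiles."""
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


def rank_pairs(texts, n=3):
    """Return (distance, name_a, name_b) for every pair, closest pair first.

    `texts` maps a document name to its text. No training data is used.
    """
    profiles = {name: char_ngram_profile(t, n) for name, t in texts.items()}
    pairs = [
        (manhattan(profiles[a], profiles[b]), a, b)
        for a, b in combinations(sorted(profiles), 2)
    ]
    return sorted(pairs)


# Toy excerpts (invented for illustration).
toy_texts = {
    "a": "the cat sat on the mat",
    "b": "the cat sat on the mat",
    "c": "zzzz qqqq xxxx",
}
toy_ranking = rank_pairs(toy_texts)
```

In practice a threshold on the distance, or a cut-off on the ranked list, would decide which pairs are actually linked; how to set it is left open here.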

Research report on the adequacy of SciE-Lex as a lexicographic tool for the writing of biomedical papers in English

Abstract: The widespread use of English in science and scholarship has stressed the increasing need for reference tools which provide non-native, especially junior, researchers with useful information about the collocational patterns as well as conventionalized phraseology of non-technical words prototypical of specialized discourses.

From a distance ‘You might mistake her for a man’: A closer reading of gender and character action in Jane Eyre, The Law and the Lady, and A Brilliant Woman

Abstract: This research examines and contributes to recent work by Matthew Jockers and Gabi Kirilloff on the relationship between gender and action in the nineteenth-century novel. Jockers and Kirilloff use dependency parsing to extract verb and gendered pronoun pairs (‘he said’, ‘she walked’, etc.). They then build a classification model to predict the gender of a pronoun based on the verb being performed.

Is Starnone really the author behind Ferrante?

Abstract: Elena Ferrante is a pen name known worldwide, under which novels such as the bestseller My Brilliant Friend have been published. A recent study indicates that the true author behind these books is probably Domenico Starnone. This study aims to select a set of approved authorship methods and appropriate feature sets to prove, with as much certainty as possible, that this conclusion is correct. To achieve this, a corpus of contemporary Italian novels has been generated, containing 150 books written by forty authors (including seven by Ferrante).

Quantitative methods for the analysis of medieval calendars

Abstract: The article explores how quantitative approaches used in textual scholarship can be applied to the study of large numbers of medieval handwritten calendars. Calendars are exceedingly numerous among medieval manuscript sources but have been studied surprisingly little, in spite of the insights they offer into the values and ideals of the communities using and updating them. Moreover, the study of a large number of calendars helps, for instance, to trace patterns of cultural contacts.

A preordering model based on phrasal dependency tree

Abstract: Intelligent machine translation (MT) is becoming an important field of research and development as the need for translations grows. Currently, the word reordering problem is one of the most important issues of MT systems. To tackle this problem, we present a source-side reordering method using phrasal dependency trees, which depict dependency relations between contiguous non-syntactic phrases. Reordering elements are automatically learned from a reordered phrasal dependency tree bank and are utilized to produce a source reordering lattice.