Articles from Digital Scholarship in the Humanities (formerly Literary and Linguistic Computing, LLC)

Discourse lexicon induction for multiple languages and its use for gender profiling

Abstract
We propose a novel method for creating categorized discourse lexicons for multiple languages. We combine information from the Penn Discourse Treebank with statistical machine translation techniques applied to the Europarl corpus. Using gender profiling as an application, we evaluate our approach against two alternatives: features from a knowledge-based lexicon and a Rhetorical Structure Theory (RST) discourse parser. Our experiments are performed on corpora for three languages (English, Dutch, and German) in two genres (news and blogs).

Autonomous learning of productive vocabulary in the EFL context: An action research approach

Abstract
The present study exemplifies an action research-based approach to developing learner autonomy in the learning of productive vocabulary in an English as a foreign language (EFL) setting. We conducted two cycles of teaching interventions aimed at solving immediate learning problems. These interventions involved Evernote-aided learning together with word-guessing, gap-noticing, and phonetic-drilling activities. Vocabulary test results and interview data were analysed to measure and verify the outcomes of the interventions.

On comparing and clustering the alternatives of love in Saadi's lyric poems (Ghazals)

Abstract
Love is the most significant subject of the mystical path. This study examines every line of Saadi’s lyric poems. The words used as alternatives for love were classified into twelve categories. The chi-square goodness-of-fit test was applied separately to compare the frequencies of the categories and of the individual words used as alternatives for love. These alternatives were then grouped into three clusters (high, medium, and low frequency) using the K-means clustering method.
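The two statistical steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's procedure: the function names, the uniform expected distribution for the goodness-of-fit test, the deterministic centroid initialisation, and the word counts in the usage lines are all hypothetical. The chi-square statistic is computed by hand here; `scipy.stats.chisquare` would additionally return a p-value.

```python
def chi_square_gof(observed, expected=None):
    """Pearson chi-square goodness-of-fit statistic and degrees of freedom.

    If `expected` is None, every category is assumed equally likely
    (a uniform null hypothesis), which is an assumption of this sketch.
    """
    k = len(observed)
    total = sum(observed)
    if expected is None:
        expected = [total / k] * k
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, k - 1


def kmeans_1d(values, k=3, max_iter=100):
    """Lloyd's k-means on scalar values (k >= 2); returns (labels, centroids).

    Centroids start at evenly spaced positions in the sorted data, so the
    result is reproducible.
    """
    ordered = sorted(values)
    step = (len(ordered) - 1) / (k - 1)
    centroids = [float(ordered[round(i * step)]) for i in range(k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each value to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:
                centroids[j] = sum(members) / len(members)
    return labels, centroids


def frequency_bands(freqs):
    """Map each word to 'high', 'medium' or 'low' via 3-means on its frequency."""
    words = list(freqs)
    labels, centroids = kmeans_1d([freqs[w] for w in words], k=3)
    # Rank clusters by centroid so the largest-mean cluster becomes 'high'.
    order = sorted(range(3), key=lambda j: centroids[j], reverse=True)
    name = {cluster: band for cluster, band in zip(order, ["high", "medium", "low"])}
    return {w: name[lab] for w, lab in zip(words, labels)}


# Hypothetical word counts, not data from the study.
counts = {"word_a": 120, "word_b": 115, "word_c": 60, "word_d": 55,
          "word_e": 50, "word_f": 10, "word_g": 8, "word_h": 5}
stat, df = chi_square_gof(list(counts.values()))
bands = frequency_bands(counts)
```

The same `chi_square_gof` call could be applied to the per-category totals as well as to the individual words, mirroring the two separate tests the abstract mentions.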

Genre-based writing instruction blended with an online writing tutorial system for the development of academic writing

Abstract
Academic writing training in various forms has been developed to enhance university graduate students' writing knowledge and skills. However, few studies have offered a comparative analysis of how students learn to write the Introduction and Method sections, in terms of genre structure and language use, with technological support in humanities and social sciences contexts.

Vernacularization in Medieval Chinese: A quantitative study on classifiers, demonstratives, and copulae in the Chinese Buddhist Canon

Abstract
While studies on diachronic Chinese syntax have identified a number of linguistic changes in Medieval Chinese, they have mostly been underpinned by qualitative analyses. In the largest-scale quantitative analysis to date, this article investigates changes in the use of classifiers, demonstratives, and copulae. Our analysis, based on the Chinese Buddhist Canon, examines over 40 million characters in texts spanning a millennium.

How are ‘immigrant workers’ represented in Korean news reporting?—A text mining approach to critical discourse analysis

Abstract
The present study explores the usefulness of a text mining approach to investigating the representation of minorities in news reporting. This question is a popular one among scholars working in critical discourse analysis (CDA), whose typical approach is qualitative and involves dissecting a small number of texts at the microlinguistic level. Over the years, this approach has come under severe criticism for failing to produce objective and reliable empirical evidence for the sweeping claims it makes about the relationship between language and social structures.

Uncovering gender bias in newspaper coverage of Irish politicians using machine learning

Abstract
This article presents a text-analytic approach to examining media content for evidence of gender bias. Irish newspaper content is analysed using machine learning and natural language processing techniques. Systematic differences in the coverage of male and female politicians are uncovered and then examined for evidence of gender bias. A corpus of newspaper coverage of politicians over a 15-year period was compiled. Textual features were extracted, and machine learning was used to identify patterns that differentiate the coverage of male and female politicians.

Visualizing the knowledge domain of embodied language cognition: A bibliometric review

Abstract
In the present study, 2,180 papers on embodied cognition within linguistics were reviewed using a bibliometric approach. The bibliographic records, covering 1992 to 2016, were collected from the Web of Science (Thomson Reuters) and comprise a core data set obtained by topic searching and an expanded data set obtained by citation expansion. Document co-citation analysis, citation burst detection, and betweenness centrality measurement were used to identify the thematic patterns, emerging trends, and key articles of the knowledge domain.
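Betweenness centrality, one of the measures named above, can be illustrated on a toy co-citation network. This is a minimal sketch under assumptions, not the study's tooling: two references are linked whenever they appear together in the same citing paper's reference list (an unweighted, undirected graph), betweenness is computed with Brandes' algorithm, and the reference lists in the usage lines are hypothetical.

```python
from collections import deque
from itertools import combinations


def cocitation_graph(reference_lists):
    """Undirected adjacency sets: two references are linked if co-cited."""
    graph = {}
    for refs in reference_lists:
        unique = sorted(set(refs))
        for r in unique:
            graph.setdefault(r, set())
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def betweenness(graph):
    """Brandes' betweenness centrality for an unweighted, undirected graph."""
    bc = {v: 0.0 for v in graph}
    for s in graph:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack, preds = [], {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)
        dist = dict.fromkeys(graph, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = dict.fromkeys(graph, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: c / 2 for v, c in bc.items()}


# Hypothetical reference lists of three citing papers.
reference_lists = [["A", "B"], ["B", "C"], ["C", "D"]]
scores = betweenness(cocitation_graph(reference_lists))
```

In this chain A-B-C-D, the two middle references bridge the others and so receive the highest scores; a real bibliometric analysis would typically work on a much larger, weighted network.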

Evaluation of text representation schemes and distance measures for authorship linking

Abstract
Given n text excerpts, the authorship linking task is to link together the pairs of documents written by the same person. This problem is closely related to authorship attribution, and its solution can be used in the author clustering task. However, no training data are provided, so the solution must be unsupervised. To achieve this, various text representation strategies can be applied, such as characters, punctuation symbols, or letter n-grams, as well as words, lemmas, part-of-speech (POS) tags, and sequences thereof.
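One combination of text representation and distance measure can be sketched as follows. This is a minimal, hypothetical illustration rather than the evaluated system: each excerpt is represented by the relative frequencies of its overlapping character trigrams (spaces and punctuation included), every pair of excerpts is scored with cosine distance, and pairs are ranked so that the smallest distances are proposed first as same-author links. No training data are used, in keeping with the unsupervised setting; the choice of n = 3, of cosine distance, and the example texts are assumptions.

```python
import math
from collections import Counter
from itertools import combinations


def char_ngrams(text, n=3):
    """Relative frequencies of overlapping character n-grams (lower-cased)."""
    text = text.lower()
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values())
    return {g: c / total for g, c in grams.items()} if total else {}


def cosine_distance(p, q):
    """1 - cosine similarity between two sparse frequency profiles."""
    dot = sum(p[g] * q.get(g, 0.0) for g in p)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 1.0
    return 1.0 - dot / (norm_p * norm_q)


def rank_links(texts, n=3):
    """All document pairs as (distance, i, j), most likely same-author first."""
    profiles = [char_ngrams(t, n) for t in texts]
    pairs = [(cosine_distance(profiles[i], profiles[j]), i, j)
             for i, j in combinations(range(len(texts)), 2)]
    return sorted(pairs)


# Hypothetical excerpts: the first two share almost all of their trigrams.
docs = ["the cat sat on the mat",
        "The cat sat on the mat.",
        "zebras quickly jumped over lazy dogs"]
links = rank_links(docs)
```

Swapping `char_ngrams` for a word, lemma, or POS-tag profile, or `cosine_distance` for another measure, changes only one function, which is what makes comparing representation and distance combinations straightforward.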