Creating a richly annotated corpus of papyrological Greek: The possibilities of natural language processing approaches to a highly inflected historical language

Abstract: This article describes a first attempt to annotate the full Greek papyrus corpus automatically with linguistic information. It gives an overview of existing work on Ancient Greek and analyzes the typical problems encountered when applying natural language processing techniques to (1) a historical corpus of (2) a highly inflectional language (as opposed to the more analytic present-day English). It then offers solutions to these problems, testing several different approaches.

Towards an ontology-based iconography

Abstract: This article describes work undertaken at the Warburg Institute in London into the definition of machine-readable ontologies for the identification of iconographic subjects. Iconography, a descriptive discipline concerned with the identification of the content or subject of an image, is a core component of the wider discipline of iconology, the study of the meanings of images in their cultural or historical contexts.

Toward Kurdish language processing: Experiments in collecting and processing the AsoSoft text corpus

Abstract: In this article, we introduce AsoSoft, the first text corpus for the Central Kurdish (Sorani) branch of Kurdish. The Kurdish language, spoken by more than 30 million people, has various dialects. As one of the two main branches of Kurdish, Central Kurdish is the formal dialect of Kurdish literature. The AsoSoft text corpus contains 188 million tokens and has been collected mostly from Web sites, published books, and magazines. The corpus has been normalized and converted into Text Encoding Initiative (TEI) XML format.

Comparing multiple categories of feature selection methods for text classification

Abstract: Selecting effective features from data sets is a particularly important part of text classification, data mining, pattern recognition, and artificial intelligence. Feature selection (FS) can exclude features that are irrelevant to the classification task and reduce the dimensionality of data sets, which helps us better understand the data. FS improves the performance of machine learning techniques and reduces their computational requirements. A large number of FS methods have been proposed so far, but no single method has yet been shown to be the most effective in practice.

PaperMiner—a real-time spatiotemporal visualization for newspaper articles

Abstract: In 2005, the National Library of Australia (NLA) began a pilot project to selectively digitize back issues of major Australian newspapers, with the aim of providing free public access to over 60 million digitized newspaper articles dating from the first years of Australian colonization to the early 1960s. Trove, a faceted search engine maintained by the NLA, provides access to this very large collection.

Digital dance scholarship: Biomechanics and culturally situated dance analysis

Abstract: This article explores the intersection of biomechanics and culturally situated dance scholarship. We focus on ‘Sendratari Ramayana’, a 50-year-old dance form heavily influenced by classical Javanese dance traditions dating back to the 19th century. We used a full-body plug-in gait model to record differences in character typology, a key concern of Javanese dance scholarship. The results are presented through online visualizations and analyzed quantitatively.

The visual digital turn: Using neural networks to study historical images

Abstract: Digital humanities research has focused primarily on the analysis of texts. This emphasis stems from the availability of technology to study digitized text: optical character recognition allows researchers to search and analyze digitized texts by keyword. However, archives of digitized sources also contain large numbers of images. This article shows how convolutional neural networks (CNNs) can be used to categorize and analyze digitized historical visual sources.

Corpus Approaches to Contemporary British Speech: Sociolinguistic Studies of the Spoken BNC2014. Vaclav Brezina, Robbie Love and Karin Aijmer (eds.)

Corpus Approaches to Contemporary British Speech: Sociolinguistic Studies of the Spoken BNC2014. Vaclav Brezina, Robbie Love and Karin Aijmer (eds.). London/New York: Routledge, 2018, vii + 264 pp. ISBN: 978-1-138-28727-3. £115.00 (hardback).
