Ekaterina Lapshinova-Koltunski scite author profile

Ekaterina Lapshinova-Koltunski

5 Publications

95 Citation Statements Received

85 Citation Statements Given

Affiliations

University of Hildesheim, Saarland University

Publications

Information Density and Quality Estimation Features as Translationese Indicators for Human Translation Classification

Rubino

Lapshinova-Koltunski

Genabith

2016

This paper introduces information density and machine translation quality estimation inspired features to automatically detect and classify human translated texts. We investigate two settings: discriminating between translations and comparable originally authored texts, and distinguishing two levels of translation professionalism. Our framework is based on delexicalised sentence-level dense feature vector representations combined with a supervised machine learning approach. The results show state-of-the-art performance for mixed-domain translationese detection with information density and quality estimation based features, while results on translation expertise classification are mixed.

Cross-linguistic analysis of discourse variation across registers

Kunz

Lapshinova-Koltunski

2015

NJES

The present study deals with variation in discourse relations in different registers of English and German. Our previous analyses have been concerned with the systemic contrasts between English and German, cf. Kunz & Steiner (2013 a/b), Kunz & Lapshinova (to appear) and have addressed some cross-linguistic differences with regard to textual realizations of selected subtypes of cohesion. In our current work, our focus is on the empirical analysis of cross-linguistic variation between registers. In order to obtain a more comprehensive picture, we investigate three main types of cohesion in combination: co-reference, substitution and conjunction and their subtypes, cf. Halliday & Hasan (1976). We extract instantiations of cohesive devices from an English-German corpus of spoken and written registers. The data is analyzed with statistical procedures which show that subcorpora can be grouped along particular combinations of cohesive devices.

The linguistic construal of disciplinarity: A data‐mining approach using register features

Teich

Degaetano-Ortlieb

Fankhauser

et al. 2015

Journal of the Association for Information Science and Technology

We analyze the linguistic evolution of selected scientific disciplines over a 30-year time span (1970s to 2000s). Our focus is on four highly specialized disciplines at the boundaries of computer science that emerged during that time: computational linguistics, bioinformatics, digital construction, and microelectronics. Our analysis is driven by the question of whether these disciplines develop a distinctive language use, both individually and collectively, over the given time period. The data set is the English Scientific Text Corpus (SCITEX), which includes texts from the 1970s/1980s and early 2000s. Our theoretical basis is register theory. In terms of methods, we combine corpus-based methods of feature extraction (various aggregated features [part-of-speech based], n-grams, lexico-grammatical patterns) and automatic text classification. The results of our research are directly relevant to the study of linguistic variation and languages for specific purposes (LSP) and have implications for various natural language processing (NLP) tasks, for example, authorship attribution, text mining, or training NLP tools.

A Pronoun Test Suite Evaluation of the English–German MT Systems at WMT 2018

Guillou

Hardmeier

Lapshinova-Koltunski

et al. 2018

We evaluate the output of 16 English-to-German MT systems with respect to the translation of pronouns in the context of the WMT 2018 competition. We work with a test suite specifically designed to assess system quality in various fine-grained categories known to be problematic. The main evaluation scores come from a semi-automatic process, combining automatic reference matching with extensive manual annotation of uncertain cases. We find that current NMT systems are good at translating pronouns with intra-sentential reference, but the inter-sentential cases remain difficult. NMT systems are also good at the translation of event pronouns, unlike systems from the phrase-based SMT paradigm. No single system performs best at translating all types of anaphoric pronouns, suggesting unexplained random effects influencing the translation of pronouns with NMT.

Beyond Identity Coreference: Contrasting Indicators of Textual Coherence in English and German

Kunz

Lapshinova-Koltunski

Martínez

2016

This paper focuses on the interaction of chains of coreference identity with other types of relations, comparing English and German data sets in terms of language, mode (written vs. spoken) and register. We first describe the types of coreference and the chain features analysed as indicators of textual coherence and topic continuity. After sketching the feature categories under analysis and the methods used for statistical evaluation, we present the findings from our analysis and interpret them in terms of the contrasts mentioned above. We will also show that for some registers, coreference types other than identity are of great importance.
