Automatic term extraction is a productive field of research within natural language processing, but it still faces significant obstacles regarding datasets and evaluation, both of which require manual term annotation. This is an arduous task, made even more difficult by the lack of a clear distinction between terms and general language, which results in low inter-annotator agreement. There is a great need for well-documented, manually validated datasets, especially in the emerging field of multilingual term extraction from comparable corpora, which presents a unique set of new challenges. In this paper, a new approach is presented for both monolingual and multilingual term annotation in comparable corpora. The detailed guidelines with different term labels, the domain- and language-independent methodology, and the large volumes annotated in three different languages and four different domains make this a rich resource. The resulting datasets are not only suited to evaluation purposes but can also serve as a general source of information about terms and even as training data for supervised methods. Moreover, the gold standard for multilingual term extraction from comparable corpora contains information about term variants and translation equivalents, which allows an in-depth, nuanced evaluation.
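To make the kind of annotation described above concrete, the sketch below models a single annotated term carrying a label, variant forms, and translation equivalents. The label set, field names, and example values are illustrative assumptions, not the dataset's actual schema.

```python
# A minimal sketch of a record for one manually annotated term.
# The label inventory and the example are hypothetical.
from dataclasses import dataclass, field

TERM_LABELS = {"Specific Term", "Common Term", "OOD Term", "Named Entity"}


@dataclass
class AnnotatedTerm:
    surface: str        # term as it occurs in the corpus
    label: str          # one of TERM_LABELS (illustrative label set)
    language: str       # ISO 639-1 code, e.g. "en", "fr", "nl"
    domain: str         # e.g. "dressage"
    variants: list = field(default_factory=list)      # spelling/inflection variants
    translations: dict = field(default_factory=dict)  # language -> equivalent terms

    def __post_init__(self):
        if self.label not in TERM_LABELS:
            raise ValueError(f"unknown label: {self.label}")


term = AnnotatedTerm(
    surface="piaffe", label="Specific Term", language="en", domain="dressage",
    variants=["piaffes"], translations={"fr": ["piaffer"], "nl": ["piaffe"]},
)
print(term)
```

Keeping variants and translation equivalents on the record, rather than as separate lists, is one way to support the nuanced evaluation of multilingual extraction that the gold standard enables.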
Although translation revision plays a crucial role in the production of high-quality translations, research into translation revision competence (TRC) is relatively new and underdeveloped compared with research into translation competence (TC). This article addresses that gap by focusing on the validation of the TRC model developed by Robert, Remael and Ureel. Using questionnaires and revision tasks in a pretest-posttest experimental design, we investigated whether a course on revision and editing affected the degree of fairness and tolerance that participants showed when revising others' translations. Analyses of the results showed that the participants in the experimental group did not make fewer unnecessary changes after taking a course on revision and editing. In addition, the types and sizes of the unnecessary changes that they made were not influenced by taking the revision and editing course. However, when exposed to a revision task without clear instructions and context, participants who had taken the course on revision and editing were significantly less categorical when providing post-treatment answers, even though this behaviour was not reflected in their attitudes in the revision tasks. These findings invite further research into the attitudinal component of TRC.
Traditional approaches to automatic term extraction do not rely on machine learning (ML); they select the top-n ranked candidate terms, or candidate terms above a certain predefined cut-off point, based on a limited number of linguistic and statistical clues. However, supervised ML approaches are gaining interest. Relatively little is known about the impact of these supervised methodologies: evaluations are often limited to precision, and sometimes recall and F1-scores, without information about the nature of the extracted candidate terms. Therefore, the current paper presents a detailed analysis and comparison of a traditional, state-of-the-art system (TermoStat) and a new, supervised ML approach (HAMLET), using the results obtained for the same manually annotated Dutch corpus about dressage.
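As a concrete picture of the traditional pipeline the abstract contrasts with supervised ML, the sketch below ranks candidate terms by a single statistical clue and keeps the top-n. The "weirdness" ratio (relative frequency in a domain corpus versus a reference corpus), the plain n-gram candidate generation, and the toy corpora are simplifying assumptions; systems such as TermoStat combine richer linguistic filters and statistics.

```python
# A minimal sketch of top-n statistical term extraction (not TermoStat's
# actual method): generate n-gram candidates, score each by a "weirdness"
# ratio, and keep the highest-scoring candidates.
from collections import Counter


def ngrams(tokens, n):
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_terms(domain_tokens, reference_tokens, max_len=3, top_n=10):
    dom, ref = Counter(), Counter()
    for n in range(1, max_len + 1):
        dom.update(ngrams(domain_tokens, n))
        ref.update(ngrams(reference_tokens, n))
    dom_total, ref_total = sum(dom.values()), sum(ref.values())
    # Weirdness: domain relative frequency divided by the (add-one
    # smoothed) reference relative frequency; high values suggest terms.
    scores = {
        cand: (freq / dom_total) / ((ref[cand] + 1) / ref_total)
        for cand, freq in dom.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


domain = "the horse performs a collected trot before the extended trot".split()
reference = "the meeting starts before noon and ends in the afternoon".split()
print(extract_terms(domain, reference, top_n=5))
```

The cut-off (here `top_n`) is exactly the predefined threshold the abstract mentions; a supervised approach like HAMLET instead learns a decision boundary from annotated data.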
Translation revision (TR) is an important step in the translation workflow. However, translation revision competence (TRC) remains an ill-defined concept. This article addresses that gap by operationalizing TR and by presenting a theoretical TRC model. Subsequently, the article analyses and interprets the results of an empirical pilot study designed to test the presence of two TR subcompetences hypothesized by the TRC model, in an experimental group and a control group of 21 MA language students. The experimental group was given TR training whereas the control group was not. The two subcompetences that were tested using a pretest-posttest experimental design were declarative-procedural knowledge about TR and the procedural strategic revision subcompetence. Both groups of participants replied to questionnaires and performed controlled revision tasks, which were subjected to quantitative statistical analyses. This article provides a detailed analysis of the results and the causes of the limited progress. In addition, it discusses the lessons learnt for both TR training and further research.
When using computer-aided translation systems in a typical professional translation workflow, there are several stages at which there is room for improvement. The SCATE (Smart Computer-Aided Translation Environment) project investigated several of these aspects, both from a human-computer interaction point of view and from a purely technological one. This paper describes the SCATE research on improved fuzzy matching, parallel treebanks, the integration of translation memories with machine translation, quality estimation, terminology extraction from comparable texts, the use of speech recognition in the translation process, and human-computer interaction and interface design for the professional translation environment. For each of these topics, we describe the experiments performed and the conclusions drawn, providing an overview of the highlights of the entire SCATE project.
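As background for the fuzzy-matching topic mentioned above, the sketch below retrieves the closest translation-memory segment by normalized token-level edit distance. This is the generic baseline that projects like SCATE set out to improve, not SCATE's own matcher; the TM contents and the 0.7 similarity threshold are illustrative assumptions.

```python
# A minimal sketch of baseline fuzzy matching against a translation
# memory (TM): find the stored segment most similar to the new source
# sentence and return its translation if similarity clears a threshold.
def levenshtein(a, b):
    # Token-level edit distance via dynamic programming (two rows).
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, 1):
        curr = [i]
        for j, tok_b in enumerate(b, 1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def best_fuzzy_match(source, memory, threshold=0.7):
    src = source.split()
    best = None
    for segment, translation in memory:
        tokens = segment.split()
        sim = 1 - levenshtein(src, tokens) / max(len(src), len(tokens))
        if sim >= threshold and (best is None or sim > best[0]):
            best = (sim, segment, translation)
    return best


tm = [("the contract is valid for one year", "le contrat est valable un an")]
print(best_fuzzy_match("the contract is valid for two years", tm))
```

A match like this one (similarity ≈ 0.71) would be shown to the translator for post-editing; improved fuzzy matching aims to rank such near-matches more usefully than raw edit distance does.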