Abstract. This paper describes the framework of the StatCan Daily Translation Extraction System (SDTES), a computer system that maps and compares the web-based English and French versions of Statistics Canada (StatCan) news releases published in The Daily. The goal is to extract translations for translation memory systems, translation terminology building, cross-language information retrieval, and corpus-based machine translation systems. Three years of officially published statistical news releases at www.statcan.ca were collected to compose the StatCan Daily data bank. The English and French texts in this collection were first roughly aligned using the Gale-Church statistical algorithm. The boundary markers of text segments and paragraphs were then adjusted, and the Gale-Church algorithm was run a second time to obtain a finer-grained alignment of text segments. To detect misaligned areas of text and to prevent mismatched translation pairs from being selected, key textual and structural properties of the mapped texts were automatically identified and used as anchoring features for comparison and misalignment detection. Results show that SDTES is very efficient in extracting translations from Daily texts and very accurate in identifying mismatched translations. With its parameters tuned, the text-mapping component can be used to align officially published bilingual government website materials, and the text-comparing component can be applied in pre-publication translation quality control and in evaluating the output of statistical machine translation systems.
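To illustrate the length-based alignment step mentioned in the abstract, the following is a minimal sketch of a Gale-Church-style aligner, not the SDTES implementation. It takes segment lengths in characters and uses dynamic programming to find the cheapest sequence of alignment "beads" (1-1, 1-0, 0-1, 2-1, 1-2, 2-2). The function names (`match_cost`, `align`) are hypothetical, and the constants (bead priors, length ratio `C`, variance `S2`) are the values commonly quoted for the original algorithm; treat them as assumptions that would need tuning for English-French Daily texts.

```python
import math

# Assumed prior probability of each bead type: (source segments, target segments).
PRIORS = {(1, 1): 0.89, (1, 0): 0.0099, (0, 1): 0.0099,
          (2, 1): 0.089, (1, 2): 0.089, (2, 2): 0.011}
C = 1.0    # assumed expected target/source length ratio
S2 = 6.8   # assumed variance of the length difference per character


def match_cost(len_src, len_tgt):
    """Return -log of the two-tailed probability of the observed length difference."""
    if len_src == 0 and len_tgt == 0:
        return 0.0
    mean = (len_src + len_tgt / C) / 2.0
    delta = (len_tgt - len_src * C) / math.sqrt(mean * S2)
    # 2 * (1 - Phi(|delta|)) for a standard normal equals erfc(|delta| / sqrt(2)).
    prob = math.erfc(abs(delta) / math.sqrt(2.0))
    return -math.log(max(prob, 1e-300))


def align(src_lens, tgt_lens):
    """Return beads as ((src_start, src_end), (tgt_start, tgt_end)) index ranges."""
    n, m = len(src_lens), len(tgt_lens)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Forward dynamic programming: extend every reachable state by each bead type.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), prior in PRIORS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                step = match_cost(sum(src_lens[i:ni]), sum(tgt_lens[j:nj]))
                total = cost[i][j] + step - math.log(prior)
                if total < cost[ni][nj]:
                    cost[ni][nj] = total
                    back[ni][nj] = (i, j)
    # Trace back from the end of both texts to recover the chosen beads.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append(((pi, i), (pj, j)))
        i, j = pi, pj
    beads.reverse()
    return beads
```

Calling `align([20, 30], [52])` on two short source segments and one long target segment yields a single 2-1 bead, which is the kind of merge that a second, finer-grained pass over adjusted segment boundaries is meant to resolve. Because the method looks only at lengths, it cannot tell whether the paired content actually matches, which is why a separate anchor-based comparison step is needed to detect misalignments.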
This article studies argument realization by preposition structures. By examining the preposition structures that are marked as frame elements in FrameNet, the article attempts to provide corpus-based attestation of the hypothesized link between deep semantic arguments and their surface syntactic representations. Problems addressed in this article include how far argument realization by preposition structures can be predicted from the target lexical unit and the frame it evokes, and why some non-central prepositions are selected among the argument realization options. The investigation is primarily inspired by Fillmore's work in frame semantics. The source data for this study are derived from a preposition knowledge base that we recently built by extracting all the semantically annotated preposition structures in FrameNet. The analysis shows that, although various semantic-syntactic mappings are possible, most semantic arguments show a very strong tendency to be realized by central prepositions. This is a clear indication that some preposition structures are linked more closely to certain semantic arguments than to others. A similar experiment was conducted on the annotated PropBank corpus to corroborate the evidence found in FrameNet. The results of this study, together with the syntactic-semantic mapping lists of preposition structures, can provide raw linguistic data for the study of preposition semantics, lexicography, argument realization, word sense disambiguation, and natural language understanding.