2012
DOI: 10.1108/02640471211221395
Bilingual terminology extraction using multi‐level termhood

Abstract: Purpose - Terminology is the set of technical words or expressions used in specific contexts; it denotes the core concepts of a formal discipline and is commonly applied in fields such as machine translation, information retrieval, information extraction and text categorization. Bilingual terminology extraction plays an important role in bilingual dictionary compilation, bilingual ontology construction, machine translation and cross-language information retrieval. This paper aims to …

Cited by 7 publications (3 citation statements) · References 17 publications
“…Some of these approaches rely on the existence of a seed lexicon (Semmar, 2018; Tsvetkov and Wintner, 2010; Xu et al., 2015) or existing translation memories and phrase tables (Oliver, 2017), while in some cases no additional resources beyond the input corpus are required (Arcan et al., 2017; Bouamor et al., 2012; Garabík and Dimitrova, 2015; Naguib, 2016). Some approaches require parallel sentence-aligned data (Arcan et al., 2017; Bouamor et al., 2012; Garabík and Dimitrova, 2015; Semmar, 2018; Zhang and Wu, 2012), while others perform the extraction on comparable corpora (Hazem and Morin, 2016; Pinnis et al., 2012; Xu et al., 2015). The technique employed in Naguib (2016) used groups of aligned sentences (verses).…”
Section: Related Work
confidence: 99%
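The parallel-corpus approaches cited above typically score candidate translation pairs by how often a source word and a target word co-occur in aligned sentence pairs. A minimal sketch of this idea, using the Dice coefficient over a hypothetical toy English/French corpus (the data and word-level granularity are illustrative assumptions, not taken from any of the cited papers):

```python
from collections import Counter
from itertools import product

# Toy sentence-aligned English/French corpus (hypothetical data for illustration).
parallel = [
    ("terminology extraction is hard", "l extraction terminologique est difficile"),
    ("terminology denotes core concepts", "la terminologie denote des concepts"),
    ("extraction of terms", "extraction de termes"),
]

src_freq, tgt_freq, pair_freq = Counter(), Counter(), Counter()
for src, tgt in parallel:
    s_words, t_words = set(src.split()), set(tgt.split())
    src_freq.update(s_words)
    tgt_freq.update(t_words)
    # Count every source/target word pair that co-occurs in an aligned pair.
    pair_freq.update(product(s_words, t_words))

def dice(s, t):
    # Dice coefficient over sentence-level co-occurrence counts.
    return 2 * pair_freq[(s, t)] / (src_freq[s] + tgt_freq[t])

# Best target candidate for a given source word.
best = max(tgt_freq, key=lambda t: dice("extraction", t))
```

Real systems add tokenisation, multi-word candidates and stronger association measures (e.g. log-likelihood ratio), but the underlying co-occurrence principle is the same.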
“…In Irvine and Callison-Burch (2016), the authors performed two experiments: the first relied on the existence of a bilingual dictionary with no parallel texts, and the second required only a small amount of parallel data. Bilingual lexica were compiled for different language pairs: English/French (Bouamor et al., 2012; Hakami and Bollegala, 2017; Semmar, 2018), English/Spanish (Oliver, 2017), English/Arabic (Naguib, 2016), English/Italian and English/German (Arcan et al., 2017), English/Slovene (Vintar and Fišer, 2008), English/Croatian, Latvian, Lithuanian and Romanian (Pinnis et al., 2012), English/Chinese (Xu et al., 2015; Zhang and Wu, 2012), English/Hebrew (Tsvetkov and Wintner, 2010), Slovak/Bulgarian (Garabík and Dimitrova, 2015), Serbian/English (Krstev et al., 2018) and so on.…”
Section: Related Work
confidence: 99%
“…It is used for lexicon creation, acquisition of novel terms, text classification, text indexing, machine-assisted translation and other NLP tasks. Various approaches to multi-word term (MWT) extraction, linguistics-based, statistics-based or both, have been published in recent years (Cram and Daille, 2016; Sclano and Velardi, 2007; Verberne et al., 2016; Vivaldi and Rodríguez, 2007; Yin et al., 2016; Zhang and Wu, 2012). Most methods used for MWT extraction today are hybrid; that is, they typically combine statistical information, such as frequencies of n-grams and collocations, with linguistic information, such as syntactic patterns of expressions.…”
Section: Related Work
confidence: 99%
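The hybrid MWT extraction described above can be sketched by combining a linguistic filter (POS-tag patterns such as adjective+noun or noun+noun) with a statistical score (here, raw candidate frequency). A toy illustration with hand-tagged input; the sentences, tag set and patterns are assumptions for demonstration, not the method of any cited paper:

```python
from collections import Counter

# Toy pre-tagged corpus: lists of (token, POS) pairs. Tags are hand-assigned
# for illustration; a real system would use a POS tagger.
tagged = [
    [("bilingual", "ADJ"), ("terminology", "NOUN"), ("extraction", "NOUN"),
     ("uses", "VERB"), ("parallel", "ADJ"), ("corpora", "NOUN")],
    [("bilingual", "ADJ"), ("terminology", "NOUN"), ("helps", "VERB"),
     ("machine", "NOUN"), ("translation", "NOUN")],
    [("terminology", "NOUN"), ("extraction", "NOUN"), ("and", "CONJ"),
     ("machine", "NOUN"), ("translation", "NOUN")],
]

# Linguistic component: only bigrams matching these POS patterns are candidates.
PATTERNS = {("ADJ", "NOUN"), ("NOUN", "NOUN")}

candidates = Counter()
for sent in tagged:
    for (w1, t1), (w2, t2) in zip(sent, sent[1:]):
        if (t1, t2) in PATTERNS:
            # Statistical component: frequency of the surviving candidates.
            candidates[(w1, w2)] += 1

ranked = candidates.most_common()
```

In practice the frequency count is replaced or supplemented by association measures (mutual information, log-likelihood) and longer syntactic patterns, which is exactly the statistical/linguistic combination the quoted passage describes.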