Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004
DOI: 10.1145/1008992.1009021

Resource selection for domain-specific cross-lingual IR

Abstract: An under-explored question in cross-language information retrieval (CLIR) is to what degree the performance of CLIR methods depends on the availability of high-quality translation resources for particular domains. To address this issue, we evaluate several competitive CLIR methods, with different training corpora, on test documents in the medical domain. Our results show severe performance degradation when using a general-purpose training corpus or a commercial machine translation system (SYSTRAN), versus a do…

citations
Cited by 13 publications
(10 citation statements)
references
References 12 publications
“…For medical terminology, as well as for other sublanguages, non-specialized multilingual lexicons (based on WORD-NET) or commercial machine translation systems offer limited support only [6,16]. We optimize the lexical coverage by limiting the dictionary to semantically relevant subwords of the medical domain.…”
Section: Discussion (mentioning)
confidence: 99%
“…tasks, such as feature selection [Yang and Lui 1999; Dumais and Chen 2000] and bilingual term correspondence [Utsuro et al. 2003; Rogati and Yang 2004]. Let us take a look at the Reuters '96 hierarchy.…”
Section: Estimating Category Correspondences (mentioning)
confidence: 99%
“…In general, a parallel corpus will be most useful if it is used to implement cross-language retrieval of documents that are in a similar domain to the parallel corpus. Recent work shows that significant effectiveness can be obtained if the correct domain is selected [Rogati and Yang, 2004]. We discuss parallel corpus CLIR techniques in Section 4.3.2.1.…”
Section: Resources (mentioning)
confidence: 99%