Abstract: This paper proposes a new method for learning bilingual collocations from sentence-aligned parallel corpora. Our method comprises two steps: (1) extracting useful word chunks (n-grams) by word-level sorting and (2) constructing bilingual collocations by combining the word chunks acquired in stage (1). We apply the method to a very challenging text pair: a stock market bulletin in Japanese and its abstract in English. Domain-specific collocations are well captured even if they were not co…
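Step (1) of the abstract, extracting recurrent word chunks (n-grams) from a corpus, can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the function name `frequent_ngrams`, the thresholds, and the toy corpus are all assumptions; a Counter pass gives the same result as the sorted-suffix technique the paper names for a small sketch.

```python
from collections import Counter

def frequent_ngrams(sentences, max_n=4, min_freq=2):
    """Count word n-grams (up to max_n words) and keep those that
    recur at least min_freq times in the corpus. Recurring n-grams
    serve as the candidate word chunks of stage (1)."""
    counts = Counter()
    for sent in sentences:
        words = sent.split()
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                counts[tuple(words[i:i + n])] += 1
    return {ng: c for ng, c in counts.items() if c >= min_freq}

# Toy corpus (invented for illustration)
corpus = ["the stock market rose sharply",
          "the stock market fell sharply"]
chunks = frequent_ngrams(corpus)
# the chunk ("the", "stock", "market") recurs and is kept
```

In the paper's formulation the candidate chunks are found by sorting the corpus suffixes so that repeated prefixes become adjacent; the frequency-counting sketch above captures the same selection criterion (recurrence) without that optimization.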
“…There has been a growing interest in corpus-based approaches which retrieve collocations from large corpora (Nagao and Mori, 1994), (Kupiec, 1993), (Fung, 1995), (Kitamura and Matsumoto, 1996), (Smadja, 1993), (Smadja et al, 1996), (Haruno et al, 1996). Although these approaches achieved good results for the task considered, most of them aim to extract fixed collocations, mainly noun phrases, and require the information which is dependent on each language such as dictionaries and parts of speech.…”
In this paper, we describe a method for automatically retrieving collocations from large text corpora. This method retrieves collocations in the following stages: (1) extracting strings of characters as units of collocations; (2) extracting recurrent combinations of strings, in accordance with their word order in the corpus, as collocations. Through this method, a wide range of collocations, especially domain-specific collocations, is retrieved. The method is practical because it uses plain text without any language-dependent information such as lexical knowledge and parts of speech.
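Stage (2) above, combining recurrent strings while preserving their word order in the corpus, might be sketched like this. The helper name `recurrent_combinations`, the substring-containment test, and the threshold are hypothetical simplifications of the paper's procedure.

```python
from collections import Counter
from itertools import combinations

def recurrent_combinations(sentences, chunks, min_freq=2):
    """Count ordered pairs of known chunk strings that co-occur in a
    sentence, keeping pairs that recur at least min_freq times.
    `chunks` is assumed to come from a stage-(1) string extractor."""
    pair_counts = Counter()
    for sent in sentences:
        present = [c for c in chunks if c in sent]
        # preserve corpus word order: sort chunks by their position
        present.sort(key=sent.index)
        for a, b in combinations(present, 2):
            pair_counts[(a, b)] += 1
    return {p: c for p, c in pair_counts.items() if c >= min_freq}

# Toy data (invented for illustration)
sents = ["stock prices rose on the tokyo market",
         "stock prices fell on the tokyo market"]
pairs = recurrent_combinations(sents, ["stock prices", "tokyo market"])
```

Because the pair key is ordered by sentence position, ("stock prices", "tokyo market") and its reversal are counted as distinct combinations, which is what "in accordance with their word order" requires.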
“…This method retrieves fixed collocations with high accuracy but may ignore collocations of exceptional types. Haruno et al (1996) constructed collocations by iteratively combining pairs of strings of high mutual information. But the mutual information is estimated inadequately low when the cohesiveness of the two strings differs greatly.…”
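The weakness noted in the excerpt can be seen directly from the pointwise mutual information formula, PMI(x, y) = log2(p(x, y) / (p(x) p(y))). A minimal sketch (the counts below are invented for illustration):

```python
import math

def pmi(count_xy, count_x, count_y, n_total):
    """Pointwise mutual information of two strings estimated from
    corpus counts. When one string is far more frequent than the
    other (their cohesiveness differs greatly), the denominator
    p(x)*p(y) is inflated and the PMI estimate comes out low."""
    p_xy = count_xy / n_total
    p_x = count_x / n_total
    p_y = count_y / n_total
    return math.log2(p_xy / (p_x * p_y))

# Both pairs co-occur 10 times in 10,000 tokens, but the pair whose
# second string is 100x more frequent scores much lower.
balanced = pmi(10, 10, 10, 10000)      # rare string + rare string
skewed = pmi(10, 10, 1000, 10000)      # rare string + frequent string
```

This is why iteratively combining strings by mutual information alone can miss collocations whose components have very different frequencies.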
Section: Related Work
“…There has been growing interest in corpus-based approaches that retrieve collocations from large corpora (Nagao, Makoto, and Mori 1994;Ikehara, Shirai, and Uchino 1996;Kupiec 1993;Fung 1995;Kitamura and Matsumoto 1996;Smadja 1993;Smadja, McKeown, and Hatzivassiloglou 1996;and Haruno, Ikehara, and Yamazaki 1996). As collocations have a large variety of forms, these approaches focus on fixed collocations depending on their points of view.…”
In this paper, we describe a method for automatically retrieving collocations from large text corpora. This method comprises the following stages: (1) extracting strings of characters as units of collocations, and (2) extracting recurrent combinations of strings as collocations. Through this method, various types of domain-specific collocations can be retrieved simultaneously. This method is practical because it uses plain text with no language-dependent information, such as lexical knowledge and parts of speech. Experimental results using English and Japanese text corpora show that the method is equally applicable to both languages.
“…Since the number of Japanese articles is far greater than that of English articles, this rate with Japanese index terms becomes lower for the similarity lower bounds L_d ≤ 0.4. It is also very important to note that the results of this paper can be easily improved by employing more sophisticated techniques of estimating bilingual compound term correspondences from parallel corpora (e.g., [2]), especially in the performance of selecting appropriate monolingual compound terms in each language.…”
Abstract. To overcome the resource scarcity bottleneck in corpus-based translation knowledge acquisition research, this paper takes the approach of semi-automatically acquiring domain-specific translation knowledge from collections of bilingual news articles on WWW news sites. It presents the results of applying standard co-occurrence-frequency-based techniques for estimating bilingual term correspondences from parallel corpora to relevant article pairs automatically collected from WWW news sites. The experimental evaluation results are very encouraging: many useful bilingual term correspondences can be discovered efficiently, with little human intervention, from relevant article pairs on WWW news sites.
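One standard co-occurrence-frequency-based score of the kind the abstract refers to is the Dice coefficient, 2·f(s, t) / (f(s) + f(t)), counted over aligned article pairs. The sketch below is an assumption about the general technique, not this paper's exact estimator; the function name and the toy data are invented.

```python
from collections import Counter

def dice_scores(aligned_pairs, min_pair_freq=2):
    """Estimate bilingual term correspondences from aligned
    (source_terms, target_terms) article pairs via the Dice
    coefficient, keeping pairs that co-occur min_pair_freq times."""
    f_src, f_tgt, f_pair = Counter(), Counter(), Counter()
    for src_terms, tgt_terms in aligned_pairs:
        for s in set(src_terms):
            f_src[s] += 1
        for t in set(tgt_terms):
            f_tgt[t] += 1
        for s in set(src_terms):
            for t in set(tgt_terms):
                f_pair[(s, t)] += 1
    return {(s, t): 2 * c / (f_src[s] + f_tgt[t])
            for (s, t), c in f_pair.items() if c >= min_pair_freq}

# Toy aligned article pairs (invented for illustration)
aligned = [(["kabushiki"], ["stock"]),
           (["kabushiki"], ["stock"]),
           (["kawase"], ["exchange"])]
scores = dice_scores(aligned)
```

The frequency floor plays the role of the "little human intervention" filter: pairs seen only once are dropped before any candidate list is shown to a human.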