2011
DOI: 10.1177/0165551511404867

Double-pass clustering technique for multilingual document collections

Abstract: It is often necessary to automatically categorize multilingual document sets, which include documents written in a variety of languages, into topically homogeneous subsets, for example when applying an automatic summarization system to multilingual news articles. However, few studies on multilingual document clustering have been reported to date. In particular, it is not known whether clustering techniques are effective for medium- or large-scale multilingual document sets. For scalability, techniques should …
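The abstract is cut off before it describes the method, so the following is only a minimal sketch of one generic double-pass scheme, not the paper's algorithm. It assumes every document is already a term-weight vector in a shared term space (for example, after translation into one language). Pass 1 is single-pass "leader" clustering with a similarity threshold; pass 2 reassigns each document to its most similar final centroid. The function names and the `threshold` parameter are hypothetical.

```python
import math
from collections import defaultdict

def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    """Average of a non-empty list of sparse vectors."""
    total = defaultdict(float)
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}

def double_pass(docs, threshold):
    """Hypothetical two-pass clustering; returns one cluster label per document."""
    # Pass 1: single-pass (leader) clustering. Each document joins the most
    # similar existing cluster if similarity >= threshold, else starts a new one.
    centroids, members = [], []
    for i, d in enumerate(docs):
        sims = [cosine(d, c) for c in centroids]
        best = max(range(len(sims)), key=sims.__getitem__, default=None)
        if best is not None and sims[best] >= threshold:
            members[best].append(i)
            centroids[best] = centroid([docs[j] for j in members[best]])
        else:
            centroids.append(dict(d))
            members.append([i])
    # Pass 2: reassign every document to its most similar final centroid,
    # which partly offsets the order sensitivity of pass 1.
    return [max(range(len(centroids)), key=lambda k: cosine(d, centroids[k]))
            for d in docs]
```

For example, two finance-like vectors and two sports-like vectors with `threshold=0.3` fall into two clusters. Only one pass over the collection plus one reassignment pass is needed, which is the kind of cost profile a scalability requirement suggests; whether the paper uses this exact design is not stated in the text above.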

Cited by 4 publications (2 citation statements); references 21 publications.
“…Its fundamental purpose is to determine the similarity between two documents written in different languages. Kishida applied [a] double-pass algorithm to cluster multi-lingual documents – English, French, German and Italian news articles – for document translation [29]. Either the text-translation-based approach or the index-set-mapping approach could be used to perform this task.…”
Section: Literature Review (mentioning; confidence: 99%)
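The statement above names an index-set-mapping approach for comparing documents across languages. A minimal sketch of that general idea, under stated assumptions: each foreign-language index term is mapped into the English vocabulary through a bilingual dictionary, its weight is split evenly among the candidate translations, and the mapped vector is compared with an English vector by cosine similarity. The toy dictionary `FR_EN` and both function names are hypothetical, and the even-split rule is one simple choice among several.

```python
import math

# Hypothetical toy French-to-English dictionary, for illustration only.
FR_EN = {
    "banque": ["bank"],
    "taux": ["rate"],
    "match": ["match", "game"],
}

def map_index_terms(vec, dictionary):
    """Map a source-language term-weight vector into the target vocabulary.

    Each term's weight is split evenly across its dictionary translations;
    terms missing from the dictionary (names, cognates) are kept unchanged.
    """
    mapped = {}
    for term, w in vec.items():
        targets = dictionary.get(term, [term])
        for t in targets:
            mapped[t] = mapped.get(t, 0.0) + w / len(targets)
    return mapped

def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

fr_doc = {"banque": 1.0, "taux": 1.0, "euro": 1.0}
en_doc = {"bank": 1.0, "rate": 1.0, "euro": 1.0}
print(cosine(map_index_terms(fr_doc, FR_EN), en_doc))  # close to 1.0
```

Unlike full text translation, this maps only the index terms, so it is cheap, but ambiguous entries such as `match` dilute the weight across translations.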
“…[43] In the context of CLDC, popular approaches consist in exploring word co-occurrence statistics within parallel/comparable corpora [18,23,35,45]. Recent works improved clustering performance by aligning terms from different languages at topic-level [4,27,29,41]. Nonetheless, cross-lingual topic alignment still remains an open challenge.…”
Section: Introduction (mentioning; confidence: 99%)
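The statement above mentions word co-occurrence statistics within parallel or comparable corpora. As a minimal sketch of that idea, assuming a corpus of aligned document pairs given as token lists, the code below scores each source/target term pair with the Dice coefficient, 2·n(a,b) / (df(a) + df(b)), where df counts the aligned documents containing a term. The choice of Dice is an assumption for illustration; the citing text does not say which association measure the cited works use, and `dice_table` is a hypothetical name.

```python
from collections import Counter
from itertools import product

def dice_table(aligned_pairs):
    """Dice association between source and target terms over aligned documents.

    aligned_pairs: list of (source_tokens, target_tokens), one per aligned pair.
    Returns {(source_term, target_term): score} for pairs that co-occur.
    """
    src_df, tgt_df, joint = Counter(), Counter(), Counter()
    for src, tgt in aligned_pairs:
        s, t = set(src), set(tgt)
        src_df.update(s)            # document frequency on the source side
        tgt_df.update(t)            # document frequency on the target side
        joint.update(product(s, t)) # co-occurrence within the same aligned pair
    return {(a, b): 2 * n / (src_df[a] + tgt_df[b]) for (a, b), n in joint.items()}

pairs = [
    (["banque", "taux"], ["bank", "rate"]),
    (["banque"], ["bank"]),
    (["match"], ["game"]),
]
scores = dice_table(pairs)
print(scores[("banque", "bank")])  # 1.0
```

High-scoring pairs can then serve as term translations, for instance to build the kind of bilingual dictionary that index-set mapping relies on.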