Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2007
DOI: 10.1145/1281192.1281211

Cross-language information retrieval using PARAFAC2

Abstract: A standard approach to cross-language information retrieval (CLIR) uses Latent Semantic Analysis (LSA) in conjunction with a multilingual parallel aligned corpus. This approach has been shown to be successful in identifying similar documents across languages or, more precisely, in retrieving the document in one language that is most similar to a query in another language. However, the approach has severe drawbacks when applied to a related task, that of clustering documents 'language-independently', so that documents abo…

Cited by 76 publications (47 citation statements). References 12 publications.
“…In the case of a parallel corpus (i.e., a document collection where each document has an exact translation in the other language), each document is concatenated with its counterpart document in the other language to form an interlingua term by document matrix, from which interlingual topic components can form the lower dimensional representation of the documents. Such representations were used in a cross-language retrieval setting [4,15,6,18,7] and document clustering [17]. A method based on LSA, but only using a short set of manually gathered comparable documents was presented in [23].…”
Section: Related Research (mentioning, confidence: 99%)
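The concatenation step described in this citation statement (the LSA baseline that the abstract calls the standard approach, not the PARAFAC2 method the paper proposes) can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions: the toy English/French corpus, the use of raw term counts with no weighting, the fold-in variant d̂ = Σ_k⁻¹ U_kᵀ d, and the helper names `build_interlingual_space`, `embed`, `fold_in`, `cosine`, and `rank` are all hypothetical illustrations, not code from the paper.

```python
import numpy as np

# Minimal LSA-based CLIR sketch (hypothetical helpers; not the paper's PARAFAC2 method).


def build_interlingual_space(X_a, X_b, k):
    """Rank-k truncated SVD of the interlingual term-by-document matrix.

    X_a is (terms in language A) x (documents) and X_b is (terms in language B) x
    (documents); column j of X_b is assumed to translate column j of X_a, so stacking
    the rows concatenates each document with its counterpart.
    """
    X = np.vstack([X_a, X_b])
    U, s, _ = np.linalg.svd(X, full_matrices=False)  # singular values come back in descending order
    return U[:, :k], s[:k]


def embed(vec, n_a, n_b, language):
    """Place a monolingual term vector in the joint vocabulary, zero-filling the other language."""
    if language == "a":
        return np.concatenate([vec, np.zeros(n_b)])
    return np.concatenate([np.zeros(n_a), vec])


def fold_in(doc, U_k, s_k):
    """Assumed LSA fold-in variant: d_hat = Sigma_k^{-1} U_k^T d."""
    return (U_k.T @ doc) / s_k


def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def rank(query_a, docs_b, X_a, X_b, k):
    """Rank language-B documents against a language-A query; returns (order, scores)."""
    n_a, n_b = X_a.shape[0], X_b.shape[0]
    U_k, s_k = build_interlingual_space(X_a, X_b, k)
    q = fold_in(embed(query_a, n_a, n_b, "a"), U_k, s_k)
    scores = [cosine(q, fold_in(embed(d, n_a, n_b, "b"), U_k, s_k)) for d in docs_b]
    order = sorted(range(len(docs_b)), key=lambda i: -scores[i])
    return order, scores


# Hypothetical toy parallel corpus. English rows: cat, dog, stock, market;
# French rows: chat, chien, bourse, marché. Columns are aligned translation pairs.
X_en = np.array([[2, 1, 0, 0],
                 [1, 2, 0, 0],
                 [0, 0, 2, 1],
                 [0, 0, 1, 2]], dtype=float)
X_fr = X_en.copy()

query_en = np.array([1, 0, 0, 0], dtype=float)   # English query: "cat"
docs_fr = [np.array([0, 0, 1, 1], dtype=float),  # French document: "bourse marché"
           np.array([1, 1, 0, 0], dtype=float)]  # French document: "chat chien"

order, scores = rank(query_en, docs_fr, X_en, X_fr, k=2)
print(order)  # prints [1, 0]: the French pets document ranks first
```

Stacking the rows of the two matrices is what turns each column into a concatenated document pair. A monolingual query or document is then zero-filled in the other language's rows before projection, so that both languages are compared in the same k-dimensional space.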
“…In [22], the authors propose a sampling-based Tucker3 decomposition in order to perform content based network analysis and visualization. The list continues, including applications such as [10] [16] [3]. Apart from Data Mining, tensors have been and are still being applied in a multitude of fields such as Chemometrics [9] and Signal Processing [21].…”
Section: Related Work (mentioning, confidence: 99%)
“…Even the totality of English speakers is only 15%.⁴ Suffice it to say that ignoring 85% of the world's population is not a viable stratagem for success in archiving all digital data. Many of the items are not natively expressed in English.…”
Section: Future Directions (mentioning, confidence: 99%)