2016
DOI: 10.1613/jair.4780

News Across Languages - Cross-Lingual Document Similarity and Event Tracking

Abstract: In today's world, we follow news which is distributed globally. Significant events are reported by different sources and in different languages. In this work, we address the problem of tracking of events in a large multilingual stream. Within a recently developed system Event Registry we examine two aspects of this problem: how to compare articles in different languages and how to link collections of articles in different languages which refer to the same event. Taking a multilingual stream and clusters of art…

Cited by 33 publications (29 citation statements)
References 28 publications
“…The conclusions from our experiments are: (a) the weighting of the similarity metric features using SVM significantly outperforms unsupervised baselines such as CluStream ( Rupnik et al (2016), as explained in §5. "Size" denotes the number of documents in the collection, "Avg.…”
Section: Methods (mentioning)
confidence: 85%
“…However, these are not "truly" online crosslingual clustering systems since they only decide on the linking of already-built monolingual clusters. In particular, Rupnik et al (2016) compute distances of document pairs across clusters using nearest neighbors, which might not scale well in an online setting. As detailed before, we adapted the cluster-linking dataset from Rupnik et al (2016) to evaluate our online crosslingual clustering approach.…”
Section: Related Work (mentioning)
confidence: 99%
“…Another approach is based on the family of matrix decomposition [8], [24], [30]. This type of approach requires a parallel translation corpus and constructs a term-document matrix for each language.…”
Section: Related Work (mentioning)
confidence: 99%
“…The experimental results demonstrate that taking into account the semantic aspect of news increases performance and improves linking. In the news domain there is another work that compares different cross-lingual document similarity measures based on Wikipedia to establish link connections of articles in different languages [46]. Also this work is mainly based on the identification of named entities in the articles by annotating the entity with the corresponding Wikipedia page.…”
Section: Cross-lingual Data Linking (mentioning)
confidence: 99%