2018
DOI: 10.1017/s1351324918000232

Alignment of comparable documents: Comparison of similarity measures on French–English–Arabic data

Abstract: The objective of this article is to address the issue of the comparability of documents that are extracted from different sources and written in different languages. These documents are not necessarily translations of each other. This material is referred to as multilingual comparable corpora. These language resources are useful for multilingual natural language processing applications, especially for low-resourced language pairs. In this paper, we collect different data in Arabic, English, and French. Two co…

Citations: cited by 9 publications (5 citation statements)
References: 21 publications
“…Fig. 4 illustrates the data from Table V to VIII which contains many-to-many mappings of [1,6], [3,10], [5,7] and [4,6] French Documents and different English documents. It shows that the accuracy of Fuzzy-Wuzzy (Partial Ratio) technique is more than the accuracy of remaining presented techniques.…”
Section: Results (mentioning)
confidence: 99%
“…The basis for this estimation could be bilingual dictionaries or digital techniques, for example, latent semantic indexing (LSI) [16]. To find similar Arabic/English documents in two ways, LSI is used [1]. Monolingual: the first way is to translate the English article into Arab and then map it into space in the Arabic language LSI [10].…”
Section: Related Work (mentioning)
confidence: 99%
“…And, the privacy and security of these data sources should be ensured. The relevant research on provable construction methods of privacy-preserving comparable corpora have been carried out at home and abroad [8], with the following three main methods: that based on word frequency distribution [28], that based on feature distribution, and that based on cross-lingual retrieval; e.g., D. Langlos et al [29] used a cross-page semantic feature approach to obtain Arabic, English and French data and constructed a trilingual corpus, but the capacity of the corpus was comparably small, with only 305 comparable corpus pairs, and they ignored the privacy-preserving issue. The web-based construction method is a basic resource of comparable corpora, which mainly includes news websites; e.g., Yuan Wei [30] built a Chinese-Russian comparable news corpus by acquiring news corpora through the Xinhua website.…”
Section: Methods Of Privacy-preserving Multilingual Comparable Corpus... (mentioning)
confidence: 99%
“…Many previous studies have found the close relationship between IoT data sharing and data privacy protection. Currently, the focus on data privacy protection is mainly in specific areas, such as the data analysis of patient conditions in the medical field [37], industry information [4] and data protection in the railway transportation sector [29]. There is relatively less coverage of privacy protection concerning multilingual data based on IoT news.…”
Section: Applications Of Privacy-preserving Multilingual Comparable C... (mentioning)
confidence: 99%
“…In other words, we have to align two texts one is in Arabic and the second is in English. Several works, on multilingual comparability, have been proposed by the international community [10], [22], [9], [3], [13]. Overall, they concern documents harvested from social networks, Wikipedia, etc.…”
Section: Video Database Of Amis (mentioning)
confidence: 99%