2016
DOI: 10.11591/ijece.v6i5.pp2454-2461

An Approach of Semantic Similarity Measure between Documents Based on Big Data

Abstract: Semantic indexing and document similarity are an important information retrieval problem in Big Data with broad applications. In this paper, we investigate the MapReduce programming model as a specific framework for managing distributed processing over a large number of documents. Then we study the state of the art of different approaches for computing the similarity of documents. Finally, we propose our approach of semantic similarity measures using WordNet as an external network semantic resource. F…

Citations: cited by 9 publications (4 citation statements)
References: 6 publications (6 reference statements)
“…In Figure 2 to avoid similarity attack, similarity index of each group is calculated [19]. If some values are similar then such values will be replaced with other values.…”
Section: Research Methods 31 the Need And Importance Of The Problem (citation type: mentioning)
confidence: 99%
“…To address the problem of information retrieval in big data environments, the authors of [7] proposed a semantic similarity measure using WordNet and a MapReduce algorithm. They index the query and compare it to the index of each document.…”
Section: A Text Similarity (citation type: mentioning)
confidence: 99%
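The statement above describes indexing the query and comparing it to the index of each document. A minimal sketch of that idea follows, under loud assumptions: `TOY_CONCEPTS` is a hypothetical hand-written stand-in for a WordNet lookup, the map and reduce steps run in a single process rather than on a MapReduce cluster, and cosine similarity is an illustrative choice of measure, not necessarily the one the paper uses. All function names are invented for this example.

```python
from collections import Counter
from math import sqrt

# Hypothetical stand-in for a WordNet lookup: maps a word to a shared concept.
TOY_CONCEPTS = {
    "car": "vehicle", "automobile": "vehicle", "truck": "vehicle",
    "doctor": "physician", "physician": "physician",
}

def map_phase(doc_id, text):
    """Map step: emit ((doc_id, concept), 1) for every token in the text."""
    for token in text.lower().split():
        yield (doc_id, TOY_CONCEPTS.get(token, token)), 1

def reduce_phase(pairs):
    """Reduce step: sum the counts into one concept index per document."""
    index = {}
    for (doc_id, concept), count in pairs:
        index.setdefault(doc_id, Counter())[concept] += count
    return index

def cosine(a, b):
    """Cosine similarity between two concept-count indexes."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank(query, docs):
    """Index the query and each document, then sort documents by similarity."""
    pairs = [p for doc_id, text in docs.items() for p in map_phase(doc_id, text)]
    doc_index = reduce_phase(pairs)
    query_index = reduce_phase(map_phase("query", query)).get("query", Counter())
    scores = {d: cosine(query_index, idx) for d, idx in doc_index.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

docs = {"d1": "the automobile stopped", "d2": "the doctor arrived"}
ranking = rank("car", docs)
```

Here the query "car" matches document `d1` only through the shared concept "vehicle", which is the kind of match a purely lexical index would miss.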
“…Text mining in big data analytics is emerging as a powerful tool for harnessing the power of unstructured textual data by analyzing it to extract new knowledge and to identify significant patterns and correlations hidden in the data [1] [5]. Furthermore, quickly detecting similar documents becomes a fundamental problem as times go on [6]. This difficulty is closely related to the semantic aspect of these documents.…”
Section: Introduction (citation type: mentioning)
confidence: 99%