2015
DOI: 10.5539/mas.v9n5p246

Establishing Semantic Similarity of the Cluster Documents and Extracting Key Entities in the Problem of the Semantic Analysis of News Texts

Abstract: This paper addresses the problem of establishing semantic similarity among the documents of a news cluster and extracting key entities from an article's text. Existing methods and algorithms for fuzzy duplicate detection in texts are briefly reviewed and analysed, including TF-IDF and its modifications, Long Sent, Megashingles and Log Shingles, and Lex Rand. The essence of the shingles algorithm and its main stages are described in detail. Several options of the parallel implementation for the shingles algorith…

Cited by 3 publications
References 11 publications