2014 International Conference on Computational Intelligence and Communication Networks
DOI: 10.1109/cicn.2014.121

Towards Reliable Clustering of English Text Documents Using Correlation Coefficient

Abstract: This paper proposes a new approach for clustering English text documents, based on finding the pairwise correlation of documents in a given set of text documents. The correlation coefficient for each pair of documents is calculated from the ranks assigned to the words in the documents. The rank of each word occurring in a document is computed from the word's weight, calculated according to the conventional TF-IDF factor. The proposed method is found to be able to cluster a given set of tex…
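
The pipeline the abstract outlines (TF-IDF weights, then word ranks, then a pairwise correlation coefficient, then clusters) can be illustrated with a minimal Python sketch. The abstract does not say which correlation coefficient is used or how clusters are formed from the pairwise values, so the sketch makes two assumptions: it uses a Spearman-style coefficient (Pearson correlation of average ranks), and it links any two documents whose correlation reaches a threshold, taking connected groups as clusters. All function names and the threshold of 0.5 are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one weight vector per document over the shared sorted vocabulary.

    docs is a list of non-empty token lists. Weight = (term frequency / doc length)
    * log(N / document frequency), one common TF-IDF variant.
    """
    n = len(docs)
    vocab = sorted({w for d in docs for w in d})
    df = Counter(w for d in docs for w in set(d))
    vectors = []
    for d in docs:
        tf = Counter(d)
        vectors.append([tf[w] / len(d) * math.log(n / df[w]) for w in vocab])
    return vectors


def average_ranks(values):
    """Rank values so that 1 is the largest; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def rank_correlation(wa, wb):
    """Assumed measure: Spearman-style correlation of two weight vectors."""
    return pearson(average_ranks(wa), average_ranks(wb))


def cluster(docs, threshold=0.5):
    """Assumed grouping rule: link pairs with correlation >= threshold (union-find)."""
    weights = tfidf_weights(docs)
    parent = list(range(len(docs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if rank_correlation(weights[i], weights[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    sample = [
        "stock market stock price".split(),
        "stock market stock price".split(),
        "goal match goal team".split(),
        "goal match goal team".split(),
    ]
    print(cluster(sample))
```

Ranking over the shared vocabulary means that words absent from a document tie at the lowest rank, which is one way to make rank vectors of equal length comparable; the paper may handle absent words differently.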

Cited by 3 publications (4 citation statements)
References 29 publications (23 reference statements)
“…The algorithm is based on the search for the word stems (the part of a word that represents its unchangeable part, expressing its lexical meaning) that meet in the text (Bhaumik et al, 2014; Birjali et al, 2016). To implement the process of searching for a word stem in a given source word (word type), Porter’s stemming is used (Popovič & Willett, 1992).…”
Section: Results (mentioning)
confidence: 99%
“…The algorithm is based on searching for the word roots (the part of a word that represents its unchanging portion, expressing its lexical meaning) found in the text (Bhaumik et al, 2014; Birjali et al, 2016). To carry out the search for a root in a given source word (word type), Porter’s stemming technique is used (Popovič & Willett, 1992).…”
Section: Results (unclassified)
“…An analysis of the various document clustering methods by showing the feature selection methods, similarity measures and evaluation measures of document clustering is done [17]. The use of clustering documents for browsing large document collections is presented in [18], document clustering for fetching relevant English documents in [19]. Clustering is also used for sentiment analysis in predicting the mood as positive or negative [20].…”
Section: User Profile Based Single Source Clustering (mentioning)
confidence: 99%
“…Clustering enables the searching of documents efficiently, and a technique for clustering text documents for browsing large document collections is done [18]. The TF-IDF and the clustering approach together for clustering English text documents that are more relevant are performed in [19]. The clustering of documents using TF-IDF scores at word levels for classifying the sentiment of the document as positive or negative is done [20].…”
Section: Introduction (mentioning)
confidence: 99%