“…A comparison of K-means clustering with Euclidean and Manhattan distances [18] against K-medoids clustering [19] was carried out using WEKA and Java programming. The evaluation results show that K-medoids performs better than K-means clustering [49].…”
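To make the comparison concrete, here is a generic PAM-style K-medoids sketch in NumPy, parameterized by the distance metric in the spirit of the citation. This is a minimal sketch, not the WEKA implementation the citation refers to; the data points and the value of k are illustrative assumptions.

```python
# Generic K-medoids (PAM-style alternation) with a pluggable distance metric.
# Not the cited WEKA implementation; data and k are illustrative assumptions.
import numpy as np

def k_medoids(X, k, metric="euclidean", n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    p = 2 if metric == "euclidean" else 1              # L2 vs L1 (Manhattan)
    D = np.linalg.norm(X[:, None] - X[None, :], ord=p, axis=2)  # pairwise distances
    medoids = rng.choice(len(X), size=k, replace=False)
    for _ in range(n_iter):
        labels = D[:, medoids].argmin(axis=1)          # assign to nearest medoid
        # For each cluster, the new medoid minimizes total in-cluster distance.
        new = np.array([
            np.where(labels == c)[0][
                D[np.ix_(labels == c, labels == c)].sum(axis=1).argmin()
            ]
            for c in range(k)
        ])
        if np.array_equal(new, medoids):
            break
        medoids = new
    return medoids, D[:, medoids].argmin(axis=1)

X = np.array([[1.0, 2.0], [1.5, 1.8], [0.9, 2.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.3]])
for metric in ("euclidean", "manhattan"):
    medoids, labels = k_medoids(X, k=2, metric=metric)
    print(metric, "medoids:", medoids, "labels:", labels)
```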
The internet provides a vast number of news sources, and users must spend considerable time searching for news that matches their interests, since users prefer news that is relevant, desirable, and informative. Clustering news articles therefore has a strong impact on how well user preferences are served. Unsupervised learning techniques, namely K-means clustering and spectral clustering, are proposed to categorize news articles by extracting discriminant features, helping users find informative news without wasting time. Experiments are performed on the BBC news dataset of 2,225 articles. The TF-IDF feature extraction technique is used with K-means and spectral clustering to form coherent clusters that categorize the articles into five domains: sport, tech, entertainment, politics, and business. The clustering algorithms are evaluated using the adjusted Rand index, V-measure, homogeneity score, completeness score, and Fowlkes-Mallows score. The experimental results show that K-means clustering outperforms spectral clustering with TF-IDF features. To improve the results further, canopy-based centroid selection is combined with grid-search optimization of K-means, yielding K-Means using Grid Search based on Canopy (KMGC-Search). The experimental results show that the proposed approach is a viable method for categorizing news articles.
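As a concrete illustration of this pipeline (a minimal sketch, not the authors' code), the following snippet runs TF-IDF features through both K-means and spectral clustering and scores them with the same five external metrics; the toy texts and labels stand in for the 2,225-article BBC dataset and are purely illustrative assumptions.

```python
# Sketch of the TF-IDF + clustering evaluation loop (scikit-learn).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans, SpectralClustering
from sklearn import metrics

texts = [
    "the striker scored twice in the cup final",
    "the midfielder signed a new contract",
    "the chancellor announced a new budget plan",
    "parliament debated the election bill today",
    "the startup unveiled a faster mobile chip",
    "researchers released an open source compiler",
]
true_labels = [0, 0, 1, 1, 2, 2]  # sport, politics, tech (illustrative)

X = TfidfVectorizer(stop_words="english").fit_transform(texts)

for name, model in [
    ("k-means", KMeans(n_clusters=3, n_init=10, random_state=0)),
    ("spectral", SpectralClustering(n_clusters=3, random_state=0)),
]:
    pred = model.fit_predict(X)
    scores = (
        metrics.adjusted_rand_score(true_labels, pred),
        metrics.v_measure_score(true_labels, pred),
        metrics.homogeneity_score(true_labels, pred),
        metrics.completeness_score(true_labels, pred),
        metrics.fowlkes_mallows_score(true_labels, pred),
    )
    print(name, ["%.2f" % s for s in scores])
```

A parameter sweep in the spirit of KMGC-Search could wrap the K-means branch in a grid search over settings such as the initialization scheme and number of restarts; canopy-based centroid seeding is not built into scikit-learn and would require a custom initializer.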
“…In addition, the Naive Bayes technique has been utilized to develop an automated method for multi-document summarization (Ramanujam & Kaliappan, 2016). In contrast to Naive Bayes, other authors (Aliguliyev, 2009; Sivakumar & Soumya, 2015) used clustering methods to generate extractive summaries. Sivakumar and Soumya (2015) generated coordinates using an n-dimensional substructure and grouped the articles into categories according to their degree of semantic similarity.…”
Section: Literature Survey
“…In contrast to Naive Bayes, other authors (Aliguliyev, 2009; Sivakumar & Soumya, 2015) used clustering methods to generate extractive summaries. Sivakumar and Soumya (2015) generated coordinates using an n-dimensional substructure and grouped the articles into categories according to their degree of semantic similarity. Aliguliyev (2009) presented a method for clustering phrases.…”
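By way of illustration only (not the cited authors' exact methods), a minimal clustering-based extractive summarizer can be sketched in a few lines: embed sentences with TF-IDF, group them with K-means, and keep the sentence nearest each centroid. The example sentences and the cluster count are illustrative assumptions.

```python
# Sketch: extractive summarization via sentence clustering (scikit-learn).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin_min

sentences = [
    "The outbreak spread rapidly across the region.",
    "Hospitals reported a shortage of intensive-care beds.",
    "Vaccination campaigns began in early spring.",
    "Officials urged residents to follow public guidance.",
]

X = TfidfVectorizer(stop_words="english").fit_transform(sentences)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)  # 2 summary sentences

# One representative per cluster: the sentence closest to its centroid.
closest, _ = pairwise_distances_argmin_min(km.cluster_centers_, X)
print(" ".join(sentences[i] for i in sorted(set(closest))))
```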
The most recent and precise biological and healthcare knowledge is critical during outbreaks such as COVID. In today's connected world, everyone needs timely and appropriate medical information to prevent contagious diseases. Extracting important information from medical conversations and disseminating it to patients and doctors can help counter doctor fatigue and patient forgetfulness.
Problem: Automatic text summarization is essential for acquiring knowledge on any topic efficiently and productively. The material in health records is vital to understanding a disease and its manifestations. Producing comprehensive, standardized content has become an unavoidable and crucial problem in medical practice as a result of the massive amounts of fragmented data created across many sectors.
Approach: The purpose of this study is to employ NLP-based deep learning algorithms for text summarization that perform well on general linguistic summarization data, and then adapt them to biomedical domain-specific summarization. The paper presents an in-house approach for condensing ill-punctuated or unpunctuated conversation transcripts into more intelligible summaries, combining topic modelling and phrase selection with punctuation restoration. For autonomous generation of medical reports from biomedical transcripts, the study proposes an end-to-end summarization technique: a deep, densely stacked Long Short-Term Memory (LSTM) network followed by a Convolutional Neural Network (CNN).
Results: Extensive testing, examination, and comparison demonstrate that this summarizer works well for medical transcripts. The proposed approach achieved an average ROUGE score of 93.5% on single-document summarization. Comparison against earlier techniques further demonstrates the utility and accuracy of the new strategies. The results reveal that models trained on ordinary language yield comparable results on a biomedical test set, with one model performing even better than on the linguistic test set.
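The abstract names the architecture but not its configuration, so the following Keras sketch is only one plausible reading: stacked LSTM layers followed by a 1-D CNN block that scores a candidate sentence for inclusion in the summary. The vocabulary size, sequence length, layer widths, and the extractive (sentence-scoring) framing are all assumptions, not the paper's specification.

```python
# Hypothetical sketch: stacked LSTMs followed by a 1-D CNN, scoring one
# candidate sentence at a time; all sizes are illustrative assumptions.
from tensorflow.keras import layers, models

VOCAB_SIZE = 20000   # assumed vocabulary size
MAX_LEN = 50         # assumed max tokens per candidate sentence

model = models.Sequential([
    layers.Input(shape=(MAX_LEN,)),
    layers.Embedding(VOCAB_SIZE, 128),
    layers.LSTM(128, return_sequences=True),              # deep, stacked LSTM encoder
    layers.LSTM(64, return_sequences=True),
    layers.Conv1D(64, kernel_size=3, activation="relu"),  # CNN feature block
    layers.GlobalMaxPooling1D(),
    layers.Dense(1, activation="sigmoid"),                # P(sentence belongs in summary)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```

ROUGE scores such as the 93.5% reported could then be computed between the generated and reference summaries, for example with the rouge-score package.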
“…A. Sudha Ramkumar et al. [23] focused on text document clustering. During implementation, they reduced the processing space for large amounts of data by eliminating irrelevant data through dimensionality reduction.…”
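As a concrete sketch of that dimensionality-reduction step (not necessarily the cited authors' exact pipeline), TF-IDF vectors can be projected into a low-rank LSA space with truncated SVD before clustering, discarding the low-variance directions that mostly carry irrelevant data. The corpus and component count below are illustrative assumptions.

```python
# Sketch: reduce TF-IDF features with truncated SVD (LSA) before clustering.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.preprocessing import Normalizer
from sklearn.pipeline import make_pipeline
from sklearn.cluster import KMeans

docs = [
    "stock markets rallied after the rate decision",
    "the central bank held interest rates steady",
    "the team won the championship in overtime",
    "the goalkeeper saved a last-minute penalty",
]

X = TfidfVectorizer(stop_words="english").fit_transform(docs)

# Keep only a few latent dimensions; re-normalize so K-means works with
# cosine-like distances in the reduced space.
lsa = make_pipeline(TruncatedSVD(n_components=2, random_state=0), Normalizer())
X_reduced = lsa.fit_transform(X)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_reduced)
print(labels)
```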
Big data is a challenging field in data processing, since information is retrieved from various search engines across the internet. Many large organizations that use document clustering fail to arrange documents sequentially on their machines. Across the globe, advanced technology has brought high-speed internet access, but useful yet unorganized information in machine files complicates the retrieval process, and manual ordering of files has its own complications. In this paper, application software such as Apache Lucene and Hadoop is applied to text mining for indexing and parallel document clustering. Within organizations, the system identifies the structure of text data in computer files and its arrangement from files to folders, folders to subfolders, and up to higher-level folders. A deeper analysis of document clustering was performed by considering efficient algorithms such as LSI and SVD, which were compared with a newly proposed, updated model of Non-Negative Matrix Factorization. The parallel Hadoop implementation builds clusters of similar documents automatically: the MapReduce framework applies the K-means algorithm to all incoming documents, and the final clusters are organized into folders on the machines using Apache Lucene. The model was tested on the Newsgroup20 dataset of text documents. The paper thus demonstrates large-scale document processing through the parallel performance of MapReduce and Lucene, generating an automatic arrangement of documents that reduces computational time and speeds up document retrieval in any scenario.
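To ground the NMF variant in code: factoring the TF-IDF matrix as X ≈ WH lets each document be assigned to the cluster of its dominant W component. Below is a minimal single-machine scikit-learn sketch on the paper's Newsgroup20 data; the two categories and the feature cap are illustrative assumptions, and the paper's parallel MapReduce/Lucene machinery is out of scope here.

```python
# Sketch: NMF-based document clustering on a slice of 20 Newsgroups.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

data = fetch_20newsgroups(subset="train",
                          categories=["sci.space", "rec.sport.hockey"],
                          remove=("headers", "footers", "quotes"))
X = TfidfVectorizer(stop_words="english", max_features=5000).fit_transform(data.data)

nmf = NMF(n_components=2, init="nndsvd", random_state=0)
W = nmf.fit_transform(X)        # document-topic weights (X ≈ W @ H)
clusters = W.argmax(axis=1)     # dominant component = cluster id
print(clusters[:10])
```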