Towards Reliable Clustering of English Text Documents Using Correlation Coefficient

Bhaumik, Hrishikesh; Mukherjee, Anirban; Bhattacharyya, Siddhartha; Chattopadhyay, Manojit

doi:10.1109/cicn.2014.121

Cited by 3 publications

(4 citation statements)

References 29 publications

(23 reference statements)


“…The algorithm is based on the search for the word stems (the part of a word that represents its unchangeable part, expressing its lexical meaning) that meet in the text (Bhaumik et al, 2014; Birjali et al, 2016). To implement the process of searching for a word stem in a given source word (word type), Porter’s stemming is used (Popovič & Willett, 1992).…”

Section: Results (mentioning)

confidence: 99%

“…The algorithm is based on searching for the word roots (the part of a word that represents its unchanging portion, expressing its lexical meaning) found in the text (Bhaumik et al, 2014; Birjali et al, 2016). To carry out the search for a root in a given source word (word type), Porter’s stemming technique is used (Popovič & Willett, 1992).…”

Section: Results (unclassified)


Support for decision-making in checking the level of quality of student research works based on automated text analysis ( Asistencia para la toma de decisiones en la evaluación de la calidad de las investigaciones de los estudiantes basada en el análisis automático de textos )

Tarkhova, Tarkhov, Akhmetyanov et al. 2023

Culture and Education


The purpose of this study is to increase the efficiency of the analysis process and the objectivity of decision-making on the final assessment, showing the quality of student research works based on the use of an automated text analysis software product. The total number of evaluated research papers was more than 300 (average age of students: 21.6; SD = 34). During the experiment, the effectiveness of the software product Multifunctional Text Analyser (MTA) was tested on the specified sample. The text of the analysed research works can be assessed as qualitative if the complex indicator of the text fragments congruence of the work exceeds 70%. The current article can be used as a methodological and theoretical basis for implementing the principles of academic virtue in institutions of higher education.


“…An analysis of the various document clustering methods by showing the feature selection methods, similarity measures and evaluation measures of document clustering is done [17]. The use of clustering documents for browsing large document collections is presented in [18], document clustering for fetching relevant English documents in [19]. Clustering is also used for sentiment analysis in predicting the mood as positive or negative [20].…”

Section: User Profile Based Single Source Clustering (mentioning)

confidence: 99%

“…Clustering enables the searching of documents efficiently, and a technique for clustering text documents for browsing large document collections is done [18]. The TF-IDF and the clustering approach together for clustering English text documents that are more relevant are performed in [19]. The clustering of documents using TF-IDF scores at word levels for classifying the sentiment of the document as positive or negative is done [20].…”

Section: Introduction (mentioning)

confidence: 99%

Twigraph: Discovering and Visualizing Influential Words Between Twitter Profiles

Sundararaman, Srinivasan 2017

Lecture Notes in Computer Science


Abstract. The social media craze is on an ever increasing spree, and people are connected with each other like never before, but these vast connections are visually unexplored. We propose a methodology Twigraph to explore the connections between persons using their Twitter profiles. First, we propose a hybrid approach of recommending social media profiles, articles, and advertisements to a user. The profiles are recommended based on the similarity score between the user profile, and profile under evaluation. The similarity between a set of profiles is investigated by finding the top influential words thus causing a high similarity through an Influence Term Metric for each word. Then, we group profiles of various domains such as politics, sports, and entertainment based on the similarity score through a novel clustering algorithm. The connectivity between profiles is envisaged using word graphs that help in finding the words that connect a set of profiles and the profiles that are connected to a word. Finally, we analyze the top influential words over a set of profiles through clustering by finding the similarity of that profiles enabling to break down a Twitter profile with a lot of followers to fine level word connections using word graphs. The proposed method was implemented on datasets comprising 1.1 M Tweets obtained from Twitter. Experimental results show that the resultant influential words were highly representative of the relationship between two profiles or a set of profiles.


Topic Analysis of Climate-Change News

Chawathe 2020

2020 10th Annual Computing and Communication Workshop and Conference (CCWC)
