Feature selection, optimization and clustering strategies of text documents

Nikhath, A. Kousar; Subrahmanyam, K.

doi:10.11591/ijece.v9i2.pp1313-1320

Cited by 13 publications

(7 citation statements)

References 13 publications


“…In this study, we used the result of the calculation of cosine similarity [17] to represent the degree of message similarity. Cosine similarity [18][19] is the traditional method used to measure the degree of similarity between two vectors, obtained from the cosine angle multiplication. The Cosine similarity [8] can be calculated using term frequency and inverse document frequency (TF-IDF) formulas.…”

Section: The Similarity Of Chat Messages (mentioning)

confidence: 99%
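
The statement above describes cosine similarity computed over TF-IDF weights. The following is a minimal sketch of that idea in plain Python, not the citing authors' implementation: the smoothed IDF formula, the whitespace tokenization, the function names, and the sample chat messages are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per tokenized document."""
    n_docs = len(docs)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        # Smoothed IDF is an assumption; the quoted text does not give its formula.
        vectors.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either has zero length."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical chat messages, tokenized on whitespace.
messages = [
    "free gift click the link".split(),
    "free gift click this link now".split(),
    "great stream today thanks".split(),
]
vecs = tfidf_vectors(messages)
sim_spam = cosine_similarity(vecs[0], vecs[1])   # messages sharing most terms
sim_other = cosine_similarity(vecs[0], vecs[2])  # messages sharing no terms
```

In this sketch, two messages with no terms in common score exactly zero, while near-duplicate messages score close to one, which is the property a spam detector would rely on.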

Analysis of spammers’ behavior on a live streaming chat

Yousukkee

Wisitpongphan

2021

IJ-AI


Live streaming is becoming a popular channel for advertising and marketing. An advertising company can use this feature to broadcast and reach a large number of customers. YouTube is one of the streaming media with an extreme growth rate and a large number of viewers. Thus, it has become a primary target of spammers and attackers. Understanding the behavior of users on live chat may reduce the moderator’s time in identifying and preventing spammers from disturbing other users. In this paper, we analyzed YouTube live streaming comments in order to understand spammers’ behavior. Seven user’s behavior features and message characteristic features were comprehensively analyzed. According to our findings, features that performed best in terms of run time and classification efficiency is the relevant score together with the time spent in live chat and the number of messages per user. The accuracy is as high as 66.22 percent. In addition, the most suitable technique for real-time classification is a decision tree.


“…Feature selection approach try to find a subset of the original variables (also called attributes or features). In this process three different strategies can be used one is filter for information gain, wrapper is used for accuracy and embedded is used to add or remove while constructing the model based on the predicted errors [11]. In some data analysis cases such as classification or regression can be done in the reduced space more exactly than the original data space.…”

Section: Feature Selection (mentioning)

confidence: 99%
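
Of the three strategies named in the statement above, the filter approach based on information gain is the simplest to illustrate. The sketch below ranks discrete features by information gain and keeps the top k. It is an illustrative reading of "filter for information gain", not the cited method: the function names, the toy feature table, and the labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy obtained by splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the labels within each feature value.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_top_k(features, labels, k):
    """Filter-style selection: rank features by information gain, keep the best k."""
    ranked = sorted(
        features,
        key=lambda name: information_gain(features[name], labels),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy data: one column per feature, one entry per document.
labels = ["spam", "spam", "ham", "ham"]
features = {
    "has_link": [1, 1, 0, 0],  # perfectly separates the classes
    "is_long": [1, 0, 1, 0],   # carries no information about the class
}
best = select_top_k(features, labels, k=1)
```

A filter of this kind scores each feature independently of any model, which is what distinguishes it from the wrapper and embedded strategies mentioned in the statement.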

“…To measure the distance between two points Euclidean distance metric is took the major role, at the same time easily measure the data by using ruler for two and three dimensional spaces also. Sometimes Euclidean will also be selected in clustering [11].…”

Section: Euclidean Distance (mentioning)

confidence: 99%
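
The Euclidean distance mentioned in the statement above, and its use in clustering, can be sketched as follows. This is a minimal illustration under stated assumptions: the helper `nearest_centroid` and the sample points are hypothetical and only show how a distance metric typically drives cluster assignment.

```python
import math


def euclidean_distance(p, q):
    """Straight-line distance between two points of equal dimension."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def nearest_centroid(point, centroids):
    """Index of the centroid closest to the point (the usual assignment step in clustering)."""
    return min(
        range(len(centroids)),
        key=lambda i: euclidean_distance(point, centroids[i]),
    )


# Hypothetical 2-D example: the classic 3-4-5 right triangle.
distance = euclidean_distance((0, 0), (3, 4))
cluster = nearest_centroid((1, 1), [(0, 0), (5, 5)])
```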

Cluster Optimization for Boundary Points using Distributive Progressive Feature Selection Algorithm

Ramesh

2021

TURCOMAT


A group of different data objects is classified as similar objects is known as clusters. It is the process of finding homogeneous data items like patterns, documents etc. and then group the homogenous data items together; other groups may have dissimilar data items. Most of the clustering methods are either crisp or fuzzy and moreover member allocation to the respective clusters is strictly based on similarity measures and membership functions. Both of the methods have limitations in terms of membership. One strictly decides a sample must belong to single cluster and other anyway fuzzy i.e. probability. Finally, Quality and Purity like measure are applied to understand how well clusters are created. But there is a grey area in between i.e. ‘Boundary Points’ and ‘Moderately Far’ points from the cluster centre. We considered the cluster quality [18], processing time and relevant features identification as basis for our problem statement and implemented Zone based clustering by using map reducer concept. I have implemented the process to find far points from different clusters and generate a new cluster, repeat the above process until cluster quantity is stabilized. By using this process we can improve the cluster quality and processing time also.


“…Objects are similar inside the same cluster whereas dissimilar compared to objects descending from other clusters. Clustering, as a class of unsupervised classification method, has been widely applied in different domains, machine learning, image segmentation, pattern recognition, text mining and many other domains [1][2][3]. Great number of clustering algorithms lie in literature, the famous K-mean clustering [4], hierarchical clustering [5], k-medoids [6], and mean shift [7] have been considered in various problems.…”

Section: Introduction (mentioning)

confidence: 99%

Clustering using kernel entropy principal component analysis and variable kernel estimator

Fattahi

Sbai

2021

IJECE


Clustering as unsupervised learning method is the mission of dividing data objects into clusters with common characteristics. In the present paper, we introduce an enhanced technique of the existing EPCA data transformation method. Incorporating the kernel function into the EPCA, the input space can be mapped implicitly into a high-dimensional of feature space. Then, the Shannon’s entropy estimated via the inertia provided by the contribution of every mapped object in data is the key measure to determine the optimal extracted features space. Our proposed method performs very well the clustering algorithm of the fast search of clusters’ centers based on the local densities’ computing. Experimental results disclose that the approach is feasible and efficient on the performance query.
