Web Text Categorization Based on Statistical Merging Algorithm in Big Data Environment

Wang, Rujuan; Wang, Gang

doi:10.4018/ijaci.2019070102

Cited by 20 publications (10 citation statements)

References 14 publications


“…The proposed method achieved an accuracy of 92.40% but is computationally expensive. Accordingly, due to the effectiveness of the proposed approach, it is suggested to compare the proposed approach with other feature selection methods for benchmarking and other previous studies on dermoscopic images, such as in [28][29][30][31][32]. In the future, we intend to migrate this method to a mobile application.…”

Section: Results (mentioning, confidence: 99%)

Feature Selection of Non-Dermoscopic Skin Lesion Images for Nevus and Melanoma Classification

et al. 2020


(1) Background: In this research, we aimed to identify and validate a set of relevant features to distinguish between benign nevi and melanoma lesions. (2) Methods: Two datasets with 70 melanomas and 100 nevi were investigated. The first one contained raw images. The second dataset contained images preprocessed for noise removal and uneven illumination reduction. Further, the images belonging to both datasets were segmented, followed by extracting features considered in terms of form/shape and color such as asymmetry, eccentricity, circularity, asymmetry of color distribution, quadrant asymmetry, fast Fourier transform (FFT) normalization amplitude, and 6th and 7th Hu’s moments. The FFT normalization amplitude is an atypical feature that is computed as a Fourier transform descriptor and focuses on geometric signatures of skin lesions using the frequency domain information. The receiver operating characteristic (ROC) curve and area under the curve (AUC) were employed to ascertain the relevance of the selected features and their capability to differentiate between nevi and melanoma. (3) Results: The ROC curves and AUC were employed for all experiments and selected features. A comparison in terms of the accuracy and AUC was performed, and an evaluation of the performance of the analyzed features was carried out. (4) Conclusions: The asymmetry index and eccentricity, together with F6 Hu’s invariant moment, were fairly competent in providing a good separation between malignant melanoma and benign lesions. Also, the FFT normalization amplitude feature should be exploited due to showing potential in classification.
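The abstract uses the area under the ROC curve to judge how well each single feature separates melanoma from nevi. For one scalar feature, that area equals the probability that a randomly chosen melanoma has a higher feature value than a randomly chosen nevus, with ties counted as one half (the Mann-Whitney formulation). This means it can be computed without tracing the curve. The sketch below assumes that higher values indicate melanoma. The function name `feature_auc` and the sample values are hypothetical and do not come from the study.

```python
from typing import Sequence


def feature_auc(melanoma: Sequence[float], nevi: Sequence[float]) -> float:
    """AUC of one feature, computed as P(melanoma value > nevus value).

    Ties count as 0.5. This equals the area under the ROC curve that you get
    by sweeping a decision threshold over the feature value, assuming that
    larger values point toward melanoma.
    """
    if not melanoma or not nevi:
        raise ValueError("both classes need at least one sample")
    wins = 0.0
    for m in melanoma:          # compare every melanoma/nevus pair
        for n in nevi:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(melanoma) * len(nevi))


# Hypothetical asymmetry-index values, not data from the study.
print(feature_auc([0.42, 0.35, 0.51, 0.28], [0.12, 0.30, 0.18, 0.25]))  # 0.9375
```

If a feature's AUC falls below 0.5, it separates the classes in the opposite direction. A direction-agnostic relevance score is therefore max(AUC, 1 - AUC). The pairwise loop costs O(n·m). For the 70 melanomas and 100 nevi in the abstract, that amounts to only 7,000 comparisons.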



“…The supervised approach (Turney, 2002) transforms the keyphrase extraction work into a classification or regression problem (Wang & Wang, 2019). It employs the learned model to identify if a candidate phrase in a text is a keyphrase by training it on the labeled training set.…”
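The supervised formulation quoted above has two steps: train a model on labeled candidate phrases, then classify each new candidate as keyphrase or not. It can be sketched as follows. Every detail here is an assumption made for illustration:

- the tiny stopword list;
- the unigram/bigram candidate generator;
- the two features, TF-IDF and the relative position of first occurrence.

The features are loosely modeled on KEA, one of the algorithms evaluated in the citing paper below. KEA itself discretizes its features for a Naive Bayes learner, whereas this sketch fits a Gaussian Naive Bayes for brevity. None of the names (`candidates`, `features`, `TinyGaussianNB`) come from the cited papers.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for the sketch.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "on", "for", "to", "is", "are", "with", "by"}


def candidates(text):
    """Return (phrase, index of first word) pairs for stopword-free unigrams
    and bigrams, together with the document length in words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    out = []
    for i, w in enumerate(words):
        if w in STOPWORDS:
            continue
        out.append((w, i))
        if i + 1 < len(words) and words[i + 1] not in STOPWORDS:
            out.append((w + " " + words[i + 1], i))
    return out, len(words)


def features(text, doc_freq, n_docs):
    """Map each candidate to (TF-IDF, relative position of first occurrence)."""
    cands, n_words = candidates(text)
    tf = Counter(p for p, _ in cands)
    first = {}
    for p, pos in cands:
        first.setdefault(p, pos)  # keep only the earliest position
    return {
        p: (c / n_words * math.log((n_docs + 1) / (doc_freq.get(p, 0) + 1)),
            first[p] / n_words)
        for p, c in tf.items()
    }


class TinyGaussianNB:
    """Two-class Gaussian Naive Bayes: label 1 = keyphrase, 0 = not.
    Training data must contain at least one example of each class."""

    def fit(self, X, y):
        self.params = {}
        for label in (0, 1):
            rows = [x for x, t in zip(X, y) if t == label]
            cols = list(zip(*rows))
            means = [sum(c) / len(c) for c in cols]
            # Small constant keeps the variance strictly positive.
            variances = [sum((v - m) ** 2 for v in c) / len(c) + 1e-9
                         for c, m in zip(cols, means)]
            self.params[label] = (math.log(len(rows) / len(X)), means, variances)
        return self

    def _log_posterior(self, x, label):
        log_prior, means, variances = self.params[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances))

    def predict(self, x):
        return max((0, 1), key=lambda label: self._log_posterior(x, label))


if __name__ == "__main__":
    # Hypothetical labeled vectors: (TF-IDF, first position); 1 = keyphrase.
    X = [(0.50, 0.00), (0.40, 0.10), (0.01, 0.80), (0.02, 0.90)]
    y = [1, 1, 0, 0]
    model = TinyGaussianNB().fit(X, y)
    doc = "Keyphrase extraction selects salient phrases from a document."
    feats = features(doc, doc_freq={"document": 9}, n_docs=10)
    print([p for p, x in feats.items() if model.predict(x) == 1])
```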

Section: Methods (mentioning, confidence: 99%)

Evaluating keyphrase extraction algorithms for finding similar news articles using lexical similarity calculation and semantic relatedness measurement by word embedding

Sarwar, Noor, Miah

2022

PeerJ Computer Science


A textual data processing task that involves the automatic extraction of relevant and salient keyphrases from a document that expresses all the important concepts of the document is called keyphrase extraction. Due to technological advancements, the amount of textual information on the Internet is rapidly increasing as a lot of textual information is processed online in various domains such as offices, news portals, or for research purposes. Given the exponential increase of news articles on the Internet, manually searching for similar news articles by reading the entire news content that matches the user’s interests has become a time-consuming and tedious task. Therefore, automatically finding similar news articles can be a significant task in text processing. In this context, keyphrase extraction algorithms can extract information from news articles. However, selecting the most appropriate algorithm is also a problem. Therefore, this study analyzes various supervised and unsupervised keyphrase extraction algorithms, namely KEA, KP-Miner, YAKE, MultipartiteRank, TopicRank, and TeKET, which are used to extract keyphrases from news articles. The extracted keyphrases are used to compute lexical and semantic similarity to find similar news articles. The lexical similarity is calculated using the Cosine and Jaccard similarity techniques. In addition, semantic similarity is calculated using a word embedding technique called Word2Vec in combination with the Cosine similarity measure. The experimental results show that the KP-Miner keyphrase extraction algorithm, together with the Cosine similarity calculation using Word2Vec (Cosine-Word2Vec), outperforms the other combinations of keyphrase extraction algorithms and similarity calculation techniques to find similar news articles. 
The similar articles identified using KP-Miner and the Cosine similarity measure with Word2Vec appear to be relevant to a particular news article and thus show satisfactory performance with a Normalized Discounted Cumulative Gain (NDCG) value of 0.97. This study proposes a method for finding similar news articles that can be used in conjunction with other methods already in use.
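The abstract compares keyphrase lists with Cosine and Jaccard similarity and scores the retrieved articles with NDCG. Below is a minimal sketch of those three measures. It assumes each extracted keyphrase is treated as a single token, since the abstract does not say how phrases were tokenized. It leaves out the Word2Vec variant, which would need a trained embedding model. The function names and sample keyphrases are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, Optional, Sequence


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """|A ∩ B| / |A ∪ B| over the two sets of keyphrases."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cosine(a: Iterable[str], b: Iterable[str]) -> float:
    """Cosine between keyphrase-count vectors (each keyphrase is one dimension)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(count * cb[k] for k, count in ca.items())  # missing keys count as 0
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def ndcg(relevances: Sequence[float], k: Optional[int] = None) -> float:
    """NDCG@k, where `relevances` are graded judgments listed in the order
    the system ranked the articles."""
    cutoff = len(relevances) if k is None else k

    def dcg(scores: Sequence[float]) -> float:
        # Position i (0-based) is discounted by log2(i + 2).
        return sum(r / math.log2(i + 2) for i, r in enumerate(scores[:cutoff]))

    ideal = dcg(sorted(relevances, reverse=True))  # best possible ordering
    return dcg(relevances) / ideal if ideal else 0.0


query = ["vaccine rollout", "covid", "health ministry"]
candidate = ["covid", "vaccine rollout", "border closure"]
print(jaccard(query, candidate))  # 0.5
print(cosine(query, candidate))   # ≈ 0.667
print(ndcg([0, 1, 1]))            # ≈ 0.693
```

When each keyphrase occurs once per list, count-based cosine and set-based Jaccard rank pairs similarly. They diverge when an extractor emits repeated phrases, because only the cosine counts repeats.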


“…Authors in [30] found numerous aspects in the data collected by victims, such as negative feelings, isolation, and repeated pattern of fear terms. The authors in [31] used behavioral trends on Facebook to predict depression following violence incidence by self-reported victims through their Facebook status. They performed t-tests to distinguish the victims' behaviors in first violence incidence and in the repeated ones.…”

Section: Predicting Violence-induced Stress Incidents Through Questio... (mentioning, confidence: 99%)

“…These data were used in the learning random forest classifiers to identify violence victims from non-victims. Authors in [29][30][31][32] collected data from Twitter to predict violence incidence from tweeter. Authors in [33], employed analysis dynamics in physician rating websites during the early wave of the COVID-19 pandemic.…”

Section: Predicting Violence-induced Stress Incidents Through Questio... (mentioning, confidence: 99%)

Predicting Violence-Induced Stress in an Arabic Social Media Forum

AlArfaj, Hakami, Mahmoud

2023

Intelligent Automation & Soft Computing

