Techniques, Applications, and Issues in Mining Large-Scale Text Databases

Avasthi, Sandhya; Chauhan, Ritu; Acharjya, D. P.

doi:10.1007/978-981-15-5421-6_39

Cited by 22 publications

(6 citation statements)

References 21 publications


“…• Text filtering: This step consists of removing undesired data from the collected datasets, such as duplicate and corrupted information, hyperlinks, and foreign language text, if required. While the removal of duplicate or corrupted data and hyperlinks in text data can be trivial, language detection is a more complex task to perform at scale [153]. To aid text filtering applications and reduce the requirement of manual language labeling, language filtering of text data can be performed using automated tools such as Google's Compact Language Detector [154], langid.py [155] or similar open-source software.…”

Section: B Aspect Extraction Techniques (mentioning)

confidence: 99%

Sentiment Analysis of Public Social Media as a Tool for Health-Related Topics

et al. 2022
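
The text filtering step described in the citation statement above (dropping duplicates, corrupted entries, hyperlinks, and optionally foreign-language text) can be sketched roughly as follows. This is a minimal illustration and not the cited authors' pipeline: the names `strip_hyperlinks` and `filter_texts` are hypothetical, the corruption check (presence of the Unicode replacement character) is a crude assumption, and language detection is left as a pluggable callable so that a tool such as langid.py or Compact Language Detector could be wrapped and passed in.

```python
import re
from typing import Callable, Iterable, List, Optional

# Simple URL pattern; an assumption for illustration, not an exhaustive matcher.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")


def strip_hyperlinks(text: str) -> str:
    """Remove URLs and collapse the whitespace left behind."""
    without_urls = URL_PATTERN.sub(" ", text)
    return " ".join(without_urls.split())


def filter_texts(
    texts: Iterable[str],
    detect_language: Optional[Callable[[str], str]] = None,
    keep_language: str = "en",
) -> List[str]:
    """Return cleaned texts with links, corrupted items, and duplicates removed.

    detect_language is a hypothetical hook: any callable that maps a string
    to a language code. If it is None, no language filtering is applied.
    """
    seen = set()
    kept = []
    for raw in texts:
        cleaned = strip_hyperlinks(raw)
        if not cleaned:
            continue  # nothing left after removing links
        if "\ufffd" in cleaned:
            continue  # crude corruption check: undecodable bytes were replaced
        key = cleaned.casefold()
        if key in seen:
            continue  # case-insensitive exact duplicate
        seen.add(key)
        if detect_language is not None and detect_language(cleaned) != keep_language:
            continue  # foreign-language text
        kept.append(cleaned)
    return kept
```

Passing the detector in as an argument keeps the sketch free of third-party dependencies and makes the language step optional, which matches the "if required" wording of the statement.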


“…These social skills or social intelligence also enhances commitment and learning in an individual specially in job sectors (Torabi, 2021; Mohadesi, 2021). It helps in extraction of large-scale text data and social computing as the process of extracting data from large text corpus is difficult (Avasthi et al, 2021; Wang et al, 2007).…”

Section: Literature Review (mentioning)

confidence: 99%

The Relevance of Social Intelligence for Effective Optimization of Retirement and Successful Ageing

Sanwal and Sareen, 2021, Ageing Int

“…Multilingual text is an other open challenges. The proposed work focuses on all these challenges (Avasthi et al, 2020)…”

Section: Literature Review (mentioning)

confidence: 99%

Predicting Marathi News Class Using Semantic Entity-Driven Clustering Approach

Saini and Bafna, 2021, Journal of Cases on Information Technology

Document management is a need of the era, and managing documents in regional languages is a significant and untouched area. A Marathi corpus consisting of news is processed to form the Group Entity document matrix Marathi (GEDMM), the Vector space model for Marathi (VSMM) and the Hysynset Vector space model for Marathi (HSVSMM). GEDMM uses entity groups extracted using a Conditional random field (CRF). Frequent terms are used to construct VSMM using TF-IDF. HSVSMM uses synsets built from hypernyms-hyponyms and synonyms. GEDMM and HSVSMM apply dimension reduction by selecting significant feature groups. Hierarchical agglomerative clustering (HAC) is used and a dendrogram is produced to visualize the clusters. Performance is analysed using several parameters: entropy, purity, misclassification error and accuracy. The clusters produced using GEDMM show the minimum entropy and the highest purity. A random forest classifier is applied and the results are evaluated using misclassification error and accuracy.
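
The abstract above evaluates clusters by entropy and purity but does not spell out the formulas. As a rough sketch, assuming the commonly used definitions (purity as the fraction of documents that belong to the majority class of their cluster, entropy as the size-weighted average of per-cluster class entropy in bits), the two scores could be computed as below. The function name and the sample labels are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict
from typing import Hashable, Sequence, Tuple


def cluster_purity_and_entropy(
    cluster_ids: Sequence[Hashable],
    true_labels: Sequence[Hashable],
) -> Tuple[float, float]:
    """Return (purity, entropy) for a flat clustering.

    Assumed definitions (standard ones, not confirmed by the abstract):
      purity  = sum over clusters of (majority class count) / N
      entropy = sum over clusters of (cluster size / N) * H(classes in cluster)
    """
    if not cluster_ids or len(cluster_ids) != len(true_labels):
        raise ValueError("inputs must be non-empty and of equal length")

    # Count how many documents of each class fall into each cluster.
    class_counts = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, true_labels):
        class_counts[cluster][label] += 1

    total = len(cluster_ids)
    purity = 0.0
    entropy = 0.0
    for counts in class_counts.values():
        size = sum(counts.values())
        purity += max(counts.values()) / total
        cluster_entropy = -sum(
            (n / size) * math.log2(n / size) for n in counts.values()
        )
        entropy += (size / total) * cluster_entropy
    return purity, entropy
```

Under these definitions a lower entropy and a higher purity both indicate clusters that align more closely with the known news classes, which is the sense in which the abstract reports GEDMM as performing best.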
