2018
DOI: 10.2478/amcs-2018-0060

A Case Study in Text Mining of Discussion Forum Posts: Classification with Bag of Words and Global Vectors

Abstract: Despite the rapid growth of other types of social media, Internet discussion forums remain a highly popular communication channel and a useful source of text data for analyzing user interests and sentiments. Being suited to richer, deeper, and longer discussions than microblogging services, they particularly well reflect topics of long-term, persisting involvement and areas of specialized knowledge or experience. Discovering and characterizing such topics and areas by text mining algorithms is therefore an int…

Cited by 18 publications (22 citation statements)
References 42 publications
“…RF is a popular ensemble modelling algorithm that achieves excellent predictive performance by combining multiple models from the same domain [26]. An RF is represented by a set of unpruned DTs that are grown on multiple bootstrap samples drawn (with replacement) from the training set, using randomised split selection. RF is a rapid and accurate technique employed for document categorisation and text classification.…”
Section: Random Forest (mentioning)
confidence: 99%
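
The construction described in this statement (bootstrap samples drawn with replacement, unpruned trees, randomised split selection, majority vote) can be illustrated with a minimal pure-Python sketch. It is not code from the cited paper or from any library: the function names, the toy documents, the Gini split criterion, and the square-root rule for the number of candidate words per split are all assumptions made for illustration.

```python
import random
from collections import Counter

def bootstrap(rows, rng):
    """Draw len(rows) training examples with replacement."""
    return [rng.choice(rows) for _ in rows]

def gini(rows):
    """Gini impurity of the labels in rows (assumed split criterion)."""
    counts = Counter(label for _, label in rows)
    return 1.0 - sum((c / len(rows)) ** 2 for c in counts.values())

def grow_tree(rows, vocab, rng, n_candidates):
    """Grow an unpruned tree; each split looks at a random subset of words."""
    labels = {label for _, label in rows}
    if len(labels) == 1:
        return labels.pop()  # pure leaf
    best = None
    for word in rng.sample(vocab, min(n_candidates, len(vocab))):
        has = [r for r in rows if word in r[0]]
        lacks = [r for r in rows if word not in r[0]]
        if not has or not lacks:
            continue  # this word does not separate the rows
        score = (len(has) * gini(has) + len(lacks) * gini(lacks)) / len(rows)
        if best is None or score < best[0]:
            best = (score, word, has, lacks)
    if best is None:  # no usable split among the candidates: majority leaf
        return Counter(label for _, label in rows).most_common(1)[0][0]
    _, word, has, lacks = best
    return (word,
            grow_tree(has, vocab, rng, n_candidates),
            grow_tree(lacks, vocab, rng, n_candidates))

def predict_tree(tree, words):
    """Follow word-present / word-absent branches down to a leaf label."""
    while isinstance(tree, tuple):
        word, present, absent = tree
        tree = present if word in words else absent
    return tree

def train_forest(rows, n_trees=25, seed=0):
    """Train n_trees trees, each on its own bootstrap sample."""
    rng = random.Random(seed)
    vocab = sorted({w for words, _ in rows for w in words})
    k = max(1, int(len(vocab) ** 0.5))  # assumed sqrt rule for candidates
    return [grow_tree(bootstrap(rows, rng), vocab, rng, k)
            for _ in range(n_trees)]

def predict_forest(forest, words):
    """Majority vote over the individual tree predictions."""
    votes = Counter(predict_tree(tree, words) for tree in forest)
    return votes.most_common(1)[0][0]

# Hypothetical toy corpus: each document is a set of words plus a label.
docs = [
    (frozenset({"engine", "oil", "brake"}), "cars"),
    (frozenset({"engine", "tyre"}), "cars"),
    (frozenset({"brake", "tyre", "oil"}), "cars"),
    (frozenset({"oven", "flour", "butter"}), "cooking"),
    (frozenset({"flour", "sugar"}), "cooking"),
    (frozenset({"butter", "oven", "sugar"}), "cooking"),
]
forest = train_forest(docs, n_trees=25, seed=0)
print(predict_forest(forest, {"engine", "brake"}))
```

Documents are reduced to word sets here, so each split only tests whether a word is present; a real text classifier would more likely use term counts or weights and a tested library implementation.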
“…Many machine learning algorithms, including logistic regression (LR), naïve Bayes (NB), support vector machine (SVM), K-nearest neighbor (KNN) and ensemble classifiers (such as bagging and random forest (RF)), have been widely used in text classification studies (Sebastiani, 2002; Liu et al., 2017; Sharmin and Zaman, 2017; Cichosz, 2018; Gravanis et al., 2019). For example, a total of 2000 teachers' posts were collected and coded for constructing six-class classification models based on NB and SVM to classify the teachers' reflective thinking in the online learning environment (Liu et al., 2017).…”
Section: Literature Review (mentioning)
confidence: 99%
“…The third type is known as the content analysis approach, where tweet text is used to detect spam content. The analysis of text starts with Bag-of-Words analysis, a popular approach to identifying the top-k words in user groups [8]. Alternatively, studies use character n-gram features, unsupervised learning such as LDA, and ensemble approaches [9].…”
Section: Related Work (mentioning)
confidence: 99%
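
The Bag-of-Words step mentioned in this statement, counting words per user group and keeping the k most frequent, might look like the sketch below. The tokenizer, the function names, and the example posts are hypothetical and are not taken from the cited works.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

def top_k_words(posts, k):
    """Return the k most frequent words across a list of posts."""
    counts = Counter()
    for post in posts:
        counts.update(tokenize(post))
    return counts.most_common(k)

def top_k_by_group(posts_by_group, k):
    """Apply top_k_words separately to every user group."""
    return {group: top_k_words(posts, k)
            for group, posts in posts_by_group.items()}

# Hypothetical example posts for two user groups.
posts_by_group = {
    "spam": ["Win free prize now", "Free prize, free"],
    "ham": ["Lunch at noon", "See you at lunch", "Lunch"],
}
print(top_k_by_group(posts_by_group, 2))
```

Word order is discarded by design, which is what makes this a bag-of-words representation; stop-word removal and stemming are omitted to keep the sketch short.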