2017
DOI: 10.1108/idd-04-2017-0043

Identifying domain relevant user generated content through noise reduction: a test in a Chinese stock discussion forum

Abstract: Purpose – Getting high-quality data by removing noisy data from user-generated content (UGC) is the first step toward data mining and effective decision-making based on ubiquitous and unstructured social media data. This paper aims to design a framework for removing noisy data from UGC. Design/methodology/approach – The authors propose a classification-based framework to remove noise from unstructured UGC in a social media community. They treat the noise as the concerned topic non-r…
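The abstract describes a classification-based filter that treats topic-irrelevant posts as noise. As a minimal sketch of that general idea (not the authors' actual pipeline; the labels, feature set and classifier choice here are assumptions), one could train a bag-of-words classifier to separate on-topic stock-forum posts from noise:

```python
# Minimal sketch of a classification-based noise filter for forum posts.
# Assumptions: a small hand-labeled sample (1 = topic-relevant, 0 = noise)
# and a TF-IDF + Naive Bayes pipeline; the paper's actual features and
# classifier may differ.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical labeled posts from a stock discussion forum.
posts = [
    "Earnings beat estimates, expecting the stock to rally",  # relevant
    "Click here for free movie downloads!!!",                 # noise
    "Volume is unusually high ahead of the dividend date",    # relevant
    "Happy birthday to my cousin, nothing to do with stocks", # noise
]
labels = [1, 0, 1, 0]

# Train the classifier, then keep only posts predicted to be relevant.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), MultinomialNB())
clf.fit(posts, labels)

new_posts = ["Analysts raised the price target after the report"]
relevant = [p for p in new_posts if clf.predict([p])[0] == 1]
print(relevant)
```

In practice such a filter would be trained on a much larger labeled sample and evaluated before being used to clean a corpus; the point here is only the shape of the approach: label, train, then discard posts classified as noise.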

Cited by 2 publications (3 citation statements)
References 44 publications
“…In particular, there has been evidence that the interaction between sentiment expressed by social and conventional media has a strong effect on the prediction of market variables (Yu et al., 2013; Li et al., 2016; Agarwal et al., 2019). Therefore, it is crucial to identify high-quality relevant content and conduct SA on the integrated data, extract collective market sentiment and understand their joint influences (Kearney and Liu, 2014; Yan et al., 2017; Li et al., 2018).…”
Section: Stock Market Lexicons (mentioning)
confidence: 99%
“…[1] were the first to propose that method, with a training dataset of 27,356 English SMS phrases. Their research was the basis of several similar works in Portuguese [8], Turkish [9] and Chinese [10], but never in Arabic or French. In addition, none of these works is open source, and they did not share the word embedding models, the lexicons or the dictionaries.…”
Section: Related Work (mentioning)
confidence: 99%
“…For the English language, the 3,429 English words from the Oxford dictionary (in which English stop-words are included), the 500 most frequently used words on Twitter and a list of 90 frequent sentiment words in tweets [13] were combined. A list of 3,501 English words is the result of combining the previous lists, after removing duplicates.…”
Section: Lists of Standard-form Seed-words (mentioning)
confidence: 99%
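The statement above describes a simple union-with-deduplication step over several seed-word lists. A minimal sketch of that step, assuming three hypothetical word-list files (the file names and one-word-per-line format are assumptions, not taken from the cited work):

```python
# Sketch: combine several seed-word lists and drop duplicates across them,
# preserving first-seen order. File names below are hypothetical.
def load_words(path: str) -> list[str]:
    """Read one lowercased word per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [line.strip().lower() for line in f if line.strip()]

# Hypothetical source lists: dictionary words, frequent Twitter words,
# and frequent sentiment words in tweets.
sources = ["oxford_words.txt", "twitter_top500.txt", "sentiment_words.txt"]

seen: set[str] = set()
combined: list[str] = []
for path in sources:
    for word in load_words(path):
        if word not in seen:  # skip duplicates appearing in earlier lists
            seen.add(word)
            combined.append(word)

print(f"{len(combined)} unique seed words")
```

The set-plus-list pattern keeps lookup constant-time while preserving the order in which words first appear, which matters if earlier lists are considered more authoritative.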