Twitter mining for ontology-based domain discovery incorporating machine learning

Abu-Salih, Bilal; Wongthongtham, Pornpit; Chan, Kit Yan

doi:10.1108/jkm-11-2016-0489

Cited by 72 publications

(52 citation statements)

References 60 publications

“…In this context, a thread of efforts has steered towards extracting knowledge from UGC to inform actionable intelligence (Abu-Salih, Wongthongtham, & Chan, 2018; Hultgren, Jennex, Persano, & Ornatowski, 2016). As noted, OSNs have already been extensively used as a powerful tool to promote knowledge extraction and management in several domains (Arularasan, Suresh, & Seerangan, 2018; Kasemsap, 2019; Nishikant, Prabin Kumar, & Shashi Kant, 2018). Given such an impact, understanding ways to broaden this scope and extra data from new sources such as UGC is of interest to many practitioners and researchers alike. In fact, even though identifying, reviewing, and interpreting social content consumes substantial time and effort (Chang, Diaz, & Hung, 2015), it still attracts wide interest due to the potential to apply KM to obtain high quality content, and actionable intelligence in many disciplines including politics (Cruz, 2019), e-commerce (Schaupp & Bélanger, 2019), e-learning (Hosseingholizadeh, Sharif, & Kouhsari, 2018), and healthcare (Surendro, Satya, & Yodihartomo, 2018). Other efforts have provided unconventional and advanced perceptions to frame this constant growth of UGC, alongside other Big Data islands. For example, Jennex (2017) presented a revised version of the traditional KM pyramid that incorporates Big Data, Internet of Things (IoT) and Business Intelligence (BI) to provide an overarching paradigm toward better decision support strategies.…”

Section: Incorporation Of Knowledge Management (mentioning)

confidence: 99%

Unlocking Social Media and User Generated Content as a Data Source for Knowledge Management

Meneghello, Thompson, Lee, et al. 2020

International Journal of Knowledge Management

Self Cite

The pervasiveness of social media and user-generated content has triggered an exponential increase in global data. However, due to collection and extraction challenges, data in embedded comments, reviews and testimonials are largely inaccessible to a knowledge management system. This article describes a KM framework for the end-to-end knowledge management and value extraction from such content. This framework embodies solutions to unlock the potential of UGC as a rich, real-time data source. Three contributions are described in this article. First, a method for automatically navigating webpages to expose UGC for collection is presented. This is evaluated using browser emulation integrated with automated collection. Second, a method for collecting data without any a priori knowledge of the sites is introduced. Finally, a new testbed is developed to reflect the current state of internet sites and shared publicly to encourage future research. The discussion benchmarks the new algorithm alongside existing techniques, providing evidence of the increased amount of UGC data extracted.

“…Random Forest Decision Tree classification as the major learning algorithm implemented in this undertaking is further utilized as a training data and test results to predict the MMORS river condition with its corresponding water pollution level classification indicated as "Excellent", "Good", "Poor", "Very Poor", and "Worst" This section describes the different metrics used by the researcher in evaluating the classifier model performance [8]; its effectiveness and the quality of its prediction. Several tests of data with known water quality parameter values are used to test the accuracy of the generated sample by distinguishing the reliability of the data and their validity in accordance to the comparison of an observed accuracy with an expected accuracy rate that is likely to meet based on the Confusion Matrix [9]. The classifier can also be evaluated in terms of Precision, Recall, and F-measure and the assessment of interrater-reliability [10]. Cohen's Kappa is used which is shown in Table IV.…”

Section: E Prediction and Validation (mentioning)

confidence: 99%

Predicting River Pollution Using Random Forest Decision Tree with GIS Model: A Case Study of MMORS, Philippines

Victoriano, Lacatan, Vinluan 2020

IJESD

This study aims to predict the pollution level that threatens the Marilao-Meycauayan-Obando River System (MMORS), located in the province of Bulacan, Philippines. The inhabitants of this area are now being exposed to pollution. Contamination of this waterway comes from both formal and informal industries, such as used lead-acid batteries, open dumpsites, metal refining, and other toxic metals. Water quality parameters such as Dissolved Oxygen (DO), Potential of Hydrogen (pH), Biochemical Oxygen Demand (BOD), Total Suspended Solids (TSS), Nitrate, Phosphate, and Coliform are the basis for predicting the pollution level, based on sample data collected from January 2013 to May 2018. These are used as training data and test results to predict the river condition with its corresponding pollution level classification. The Random Forest decision tree model achieved an accuracy of 99.38% with a Kappa value of 0.8303, interpreted as "Strong" in terms of the level of agreement, and the GIS model shows a heat map of the spatial distribution of the different water quality parameters and the Water Quality Index (WQI). The majority of the sampling stations are greatly polluted, rated "Poor" and "Very Poor." Index Terms: Machine learning, river pollution, random forest and GIS.
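The citation statement for this paper describes comparing an observed accuracy with an expected accuracy derived from the confusion matrix, which is how Cohen's Kappa is usually computed. The following is a minimal sketch of that calculation, not the study's own code: the function name `cohen_kappa` and the two-class counts are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def cohen_kappa(confusion: Sequence[Sequence[int]]) -> float:
    """Cohen's kappa for a square confusion matrix (rows: actual, columns: predicted)."""
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    k = len(confusion)

    # Observed accuracy: share of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / total

    # Expected accuracy: chance agreement implied by the row and column totals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)

    if expected == 1.0:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical counts for illustration only (not data from the study).
example = [[45, 5], [10, 40]]
print(round(cohen_kappa(example), 4))
```

A high accuracy can coexist with a lower Kappa when one class dominates, because the expected (chance) agreement is then already large.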

“…It is also directly linked to the capability to discard user's messages that are classified as spam i.e. unsolicited and repeated junk messages (Abu-Salih et al, 2020; Abu-Salih et al, 2018; Abu-Salih, et al, 2019b;. These tweets come usually from bots and have a malicious intention to create rumours and chaos (Shin et al, 2017).…”

Section: Related Work (mentioning)

confidence: 99%

“…A SA system is highly sensitive to the domain in which the data used to train are extracted. It can obtain poor results if the training dataset is not political (Abu-Salih et al, 2018;. Due to the lack of a sentiment lexicon for non-English languages, the creation of a new polarity lexicon is decided for the Spanish political event issuing from two different sources.…”

Section: New Polarity Political Lexicon (mentioning)

confidence: 99%

Inferring the votes in a new political landscape. The case of the 2019 Spanish Presidential elections.

Grimaldi, Díaz, Arboleda 2020

Preprint

The avalanche of personal and social data circulating in Online Social Networks over the past 10 years has attracted a great deal of interest from scholars and practitioners who seek to analyse not only their value, but also their limits. Predicting election results using Twitter data is an example of how data can directly influence the political domain, and it also serves as an appealing research topic. This article aims to predict the results of the 2019 Spanish Presidential election and the voting share of each candidate, using Twitter. The method combines sentiment analysis and volume information and compares the performance of five Machine Learning algorithms. Several data scrutiny uncertainties arose that hindered the prediction of the outcome. Consequently, the method develops a political lexicon-based framework to measure the sentiments of online users. Indeed, an accurate understanding of the contextual content of the tweets posted was vital in this work. Our results correctly ranked the candidates and determined the winner by means of a better prediction of votes than official research institutes.
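To make the idea of a lexicon-based sentiment measure concrete, here is a minimal sketch of one common approach: average the polarity weights of the words in a text that appear in a lexicon. It is not the authors' framework. The lexicon entries, their weights, and the function name `polarity` are all hypothetical, and the sample words are in English only for readability.

```python
import re
from typing import Dict

# Hypothetical polarity lexicon; the words and weights are illustrative only.
LEXICON: Dict[str, float] = {
    "great": 1.0,
    "honest": 0.5,
    "corrupt": -1.0,
    "liar": -1.0,
}


def polarity(text: str, lexicon: Dict[str, float] = LEXICON) -> float:
    """Mean polarity of the lexicon words found in text; 0.0 if none are found."""
    # Lowercase and keep alphabetic tokens, including common Spanish letters.
    tokens = re.findall(r"[a-záéíóúñü]+", text.lower())
    scores = [lexicon[token] for token in tokens if token in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


print(polarity("A great and honest candidate"))
print(polarity("no opinion words here"))
```

A domain-specific lexicon matters here because the same word can carry different polarity in political text than in general text, which is the sensitivity the citation statement above points to.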
