A survey on Short text analysis in Web

Rafeeque, P C; Sendhilkumar, S.

doi:10.1109/icoac.2011.6165203

Cited by 17 publications

(11 citation statements)

References 25 publications

“…As social media posts are often short and may include mis-spelled words or irrelevant characters (such as emojis), social media text documents share an extremely low number of overlapping terms within a collection of posts. To address the sparsity problem, scholars have suggested alternative methods, such as LDA extension to author-topic model, and the dual LDA approach that relies on external knowledge bases like Wikipedia (Atefeh, and Khreich 2015; Nugroho et al 2020; Rafeeque and Sendhilkumar 2011).…”

Section: Latent Dirichlet Allocation (LDA) (mentioning)

confidence: 99%

Cross-platform comparison of framed topics in Twitter and Weibo: machine learning approaches to social media text mining

Yang, Hsu, Löfgren, et al. 2021

Soc. Netw. Anal. Min.

While the salience of social media platforms on modern interactive communication between diverse social actors has been demonstrated, less academic attention has been paid to comparisons between framed topics and user interactions across social media platforms, such as Twitter and Weibo. This article suggests text mining and natural language processing tools for cross-platform comparative social media studies, based on Latent Dirichlet Allocation (LDA) and social network analysis. This study illustrates how the suggested topic models and data processing algorithms can be applied to a real-life example (U.S.-China trade war discourse on social media), and tests the methods on social media text mining data, revealing differences between user interactions on Twitter, predominantly "Western," and Weibo, largely representing Chinese-speaking users. We discuss the strengths and weaknesses of the suggested machine learning algorithms for comparative social media studies.

“…External knowledge can be taken from ontologies (WordNet, Wikipedia) [12,13] or from large scale datasets on which topic models techniques are applied (LDA, Latent Semantic Indexing [14]...). Song et al in [15] and Rafeeque et al in [16] summarize these different techniques and show some of their usages in short text classification. Several reduction methods exist: feature abstraction, feature selection and LDA.…”

Section: Related Work (mentioning)

confidence: 99%

Introducing Semantics in Short Text Classification

Bouaziz, Pereira, Dartigues-Pallez, et al. 2018

Computational Linguistics and Intelligent Text Processing

To overcome short text classification issues due to shortness and sparseness, the enrichment process is classically proposed: topics (word clusters) are extracted from external knowledge sources using Latent Dirichlet Allocation. All the words associated with topics that encompass short text words are added to the initial short text content. We propose (i) an explicit representation of a two-level enrichment method in which the enrichment is considered either with respect to each word in the text or to the global semantic meaning of the short text and (ii) a new kind of semantic Random Forest in which semantic relations between features are taken into account at node level rather than at tree level, as was recently proposed in the literature to avoid potential tree correlation. We demonstrate that our enrichment method is valid not only for Random Forest based methods but also for other methods like MaxEnt, SVM and Naive Bayes.
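The classical enrichment step described in this abstract can be illustrated with a minimal sketch. It assumes the topics (word clusters) are already available; the abstract obtains them by running LDA on an external knowledge source, whereas here they are hard-coded. The names `TOPICS` and `enrich` and the example words are hypothetical and are not taken from the paper, and the sketch covers only word-level enrichment, not the paper's two-level method or its semantic Random Forest.

```python
from typing import Dict, List, Set

# Hypothetical word clusters standing in for LDA topics learned
# from an external knowledge source (assumption for illustration).
TOPICS: Dict[str, Set[str]] = {
    "sports": {"match", "goal", "team", "league"},
    "finance": {"stock", "market", "shares", "trade"},
}


def enrich(short_text: str, topics: Dict[str, Set[str]]) -> List[str]:
    """Append the words of every topic that contains a word of the text."""
    tokens = short_text.lower().split()
    original = set(tokens)      # words of the initial short text
    enriched = list(tokens)     # keep the original words first
    added: Set[str] = set()
    for name in sorted(topics):             # fixed order for reproducibility
        words = topics[name]
        if original & words:                # topic encompasses a text word
            for word in sorted(words):
                if word not in original and word not in added:
                    enriched.append(word)
                    added.add(word)
    return enriched


if __name__ == "__main__":
    print(enrich("Late goal wins the match", TOPICS))
```

In this sketch a short text that shares no word with any topic is returned unchanged, so enrichment only helps when the external topics cover the vocabulary of the short texts.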

“…Our proposal is based on the existence of a mechanism that is able to determine the topics discussed by the pieces of information that are exchanged in OSNs (i.e., the messages). Topics here can be categories predefined by the underlying OSN infrastructure [54]; user-generated tags like Flickr categories [53]; or categories or tags extracted from images [66,80], videos [3], geolocation information [45], or text [57,74,12].…”

Section: Topic Analysis (mentioning)

confidence: 99%

“…With regard to topic extraction from text, current research in the field of NLP has made advancements on analysing short messages present in OSN, microblogs, etc. For a review of these works see [57]. Specifically, there are some NLP proposals endowed with new text-mining and analysis techniques that analyse short and informal messages with acceptable accuracy [74].…”

Section: Topic Analysis (mentioning)

confidence: 99%

Implicit Contextual Integrity in Online Social Networks

Criado, Such 2015

Information Sciences

Many real incidents demonstrate that users of Online Social Networks need mechanisms that help them manage their interactions by increasing the awareness of the different contexts that coexist in Online Social Networks and preventing them from exchanging inappropriate information in those contexts or disseminating sensitive information from some contexts to others. Contextual integrity is a privacy theory that conceptualises the appropriateness of information sharing based on the contexts in which this information is to be shared. Computational models of Contextual Integrity assume the existence of well-defined contexts, in which individuals enact pre-defined roles and information sharing is governed by an explicit set of norms. However, contexts in Online Social Networks are known to be implicit, unknown a priori and ever changing; users' relationships are constantly evolving; and the information sharing norms are implicit. This makes current Contextual Integrity models unsuitable for Online Social Networks. In this paper, we propose the first computational model of Implicit Contextual Integrity, presenting an information model and an Information Assistant Agent that uses the information model to learn implicit contexts, relationships and the information sharing norms to help users avoid inappropriate information exchanges and undesired information disseminations. Through an experimental evaluation, we validate the properties of Information Assistant Agents, which are shown to: infer the information sharing norms even if a small proportion of the users follow the norms and in the presence of malicious users; help reduce the exchange of inappropriate information and the dissemination of sensitive information with only a partial view of the system and the information received and sent by their users; and minimise the burden to the users in terms of raising unnecessary alerts.

Comment: Authors' version of the paper accepted for publication in the Information Sciences journal (http://www.journals.elsevier.com/information-sciences/
