Do We Really Need to Catch Them All? A New User-Guided Social Media Crawling Method

Erlandsson, Fredrik; Bródka, Piotr; Boldt, Martin; Johnson, Henric

doi:10.3390/e19120686

Cited by 6 publications

(2 citation statements)

References 27 publications

“…In this way, the crawling process could crawl most of the newly produced content with limited resources (and taking into account the access restrictions of the SM source). In [25], a user-guided Social Media crawling method was proposed. The goal was not to crawl the entire SM platform (or extract the full set of users) but instead to obtain a sample of posts or submissions that are statistically representative of the entire dataset.…”

Section: A Real-time Crawling Of Social Media (mentioning)

confidence: 99%

Real-Time Focused Extraction of Social Media Users

2022

In this paper, we explore a real-time automation challenge: the problem of focused extraction of Social Media users. This challenge can be seen as a special form of focused crawling where the main target is to detect users with certain patterns. Given a specific user profile, the task consists of rapidly ingesting Social Media data and detecting target users early. This is a real-time intelligent automation task that has numerous applications in domains such as safety, health or marketing. The volume and dynamics of Social Media content demand efficient real-time solutions able to predict which users are worth exploring. To meet this aim, we propose and evaluate several methods that effectively allow us to harvest relevant users. Even with little contextual information (e.g., a single user submission), our methods quickly focus on the most promising users. We also developed a distributed microservice architecture that supports real-time parallel extraction of Social Media users. This modular architecture scales up in clusters of computers and can be easily adapted for user extraction in multiple domains and Social Media sources. Our experiments suggest that some of the proposed prioritisation methods, which work with minimal user context, are effective at rapidly focusing on the most relevant users. These methods perform satisfactorily with huge volumes of users and interactions and lead to harvest ratios 2 to 9 times higher than those achieved by random prioritisation.

“…al. [6] and is publicly available at Harvard Dataverse [5]. The data from these pages were parsed and for each post the corresponding likes and comments were extracted.…”

Section: Dataset and Network Model (mentioning)

confidence: 99%

Seed Selection for Information Cascade in Multilayer Networks

Erlandsson; Bródka; Borg

2017

Complex Networks & Their Applications VI

Self Cite

Information spreading is an interesting field in the domain of online social media. In this work, we investigate how different seed selection strategies affect spreading processes simulated using the independent cascade model on eighteen multilayer social networks. Fifteen networks are built from user interaction data extracted from Facebook public pages, and three are multilayer networks downloaded from a public repository (two of them being Twitter networks). The results indicate that various state-of-the-art seed selection strategies for single-layer networks, such as K-Shell or VoteRank, do not perform as well on multilayer networks and are outperformed by Degree Centrality.
