2018
DOI: 10.1007/978-3-319-93372-6_40

Using Random String Classification to Filter and Annotate Automated Accounts

Cited by 16 publications (22 citation statements)
References 13 publications
“…Given these results, we used Logistic Regression for our production model, given that it is simpler and faster. Note that this result entails significantly more training data than we used in earlier research (see [6]), where SVM performed better. Before predicting whether or not a string was random, we first applied several heuristic filters.…”
Section: Feature Engineering
confidence: 75%
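The pre-filtering step described in the excerpt above can be sketched in Python. The specific rules here (minimum length, repeated characters, a simple word-plus-digits pattern) are illustrative assumptions, not the heuristic filters actually used in the paper:

```python
import re

def heuristic_prefilter(screen_name: str) -> bool:
    """Return True if the string should be passed to the classifier,
    False if a heuristic rule already decides it is not random.
    All three rules below are illustrative assumptions."""
    # Very short names carry too little signal to classify reliably.
    if len(screen_name) < 6:
        return False
    # Names made of a single repeated character are handled separately.
    if len(set(screen_name.lower())) == 1:
        return False
    # Short names matching a word/word_word(+digits) pattern look human-chosen.
    if re.fullmatch(r"[a-z]+[._]?[a-z]*\d{0,4}", screen_name, re.I) \
            and len(screen_name) <= 10:
        return False
    return True

print(heuristic_prefilter("jk7x9q2mzp"))  # mixed letters/digits -> classify
print(heuristic_prefilter("john_smith"))  # word pattern -> skip
```

Filtering cheap, obvious cases first keeps the classifier's workload (and its error surface) limited to genuinely ambiguous strings.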
“…When a higher value of n was used, the model was more accurate for detecting spamming IDs but it took a longer time. Beskow and Carley [22] proposed a randomly generated user ID detection method based on its randomness of strings in user IDs. Many fake user accounts are randomly generated and such randomly generated IDs are likely to have rare combinations of characters.…”
Section: B. Feature-Based Approaches
confidence: 99%
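The "rare combinations of characters" signal mentioned above can be illustrated with a minimal character-bigram model. The tiny corpus of human-chosen names is an assumption for demonstration; a name whose bigrams are rare under the model scores higher, i.e. looks more likely to be randomly generated:

```python
import math
from collections import Counter

def bigrams(s: str) -> list:
    s = s.lower()
    return [s[i:i + 2] for i in range(len(s) - 1)]

def train_bigram_model(human_names):
    """Count character bigrams over a corpus of human-chosen names."""
    counts = Counter()
    for name in human_names:
        counts.update(bigrams(name))
    return counts, sum(counts.values())

def randomness_score(name, counts, total, vocab=26 * 26):
    """Average negative log-probability of the name's bigrams with
    add-one smoothing; higher = rarer combinations = more random-looking."""
    grams = bigrams(name)
    if not grams:
        return 0.0
    return sum(-math.log((counts[g] + 1) / (total + vocab))
               for g in grams) / len(grams)

# Illustrative hand-made corpus (assumption, not real training data).
human = ["johnsmith", "maryjones", "davidlee", "sarahkhan"]
counts, total = train_bigram_model(human)
print(randomness_score("johnsmith", counts, total))   # familiar bigrams: lower
print(randomness_score("qzxkvjwq", counts, total))    # unseen bigrams: higher
```

A production system (as in the cited work) would learn such features at scale, e.g. via TF-IDF-weighted n-grams fed to a classifier, rather than this toy scoring rule.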
“…The other is feature-based methods which extract features to detect suspicious IDs. One of the latest methods is the n-gram-based randomly generated ID detection method using term frequency-inverse document frequency (TF-IDF), proposed by Beskow and Carley [22]. However, their method suffers from the curse of dimensionality because the feature dimension increases exponentially with increasing n in n-gram-based approaches.…”
Section: Introduction
confidence: 99%
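The dimensionality point can be made concrete. Assuming, for illustration, a 36-character alphabet (a-z plus digits), the number of possible n-grams, and hence the worst-case feature dimension of an n-gram TF-IDF representation, grows exponentially with n:

```python
# Worst-case n-gram feature dimension for a 36-character alphabet.
alphabet = 36
for n in range(1, 5):
    print(n, alphabet ** n)
# 36, 1296, 46656, 1679616 for n = 1..4
```

In practice only observed n-grams get a feature, but the combinatorial ceiling is why large n quickly becomes costly.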
“…Supervised models include traditional machine learning with SVM (Lee and Kim 2014), Naïve Bayes (Chen, Guan, and Su 2014), and Random Forest (Ferrara et al 2016) models trained on features extracted from Twitter's tweet and user objects. Other methods have attempted to classify accounts based only on their text (Kudugunta and Ferrara 2018) or their screen name (Beskow and Carley 2018c). Several of the available models like Botometer (Davis et al 2016) and Bot-Hunter (Beskow and Carley 2018b) are classic supervised machine learning models.…”
Section: Previous Work in Bot Detection
confidence: 99%