2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA)
DOI: 10.1109/icmla.2015.22
The Effect of Dataset Size on Training Tweet Sentiment Classifiers

Cited by 52 publications (30 citation statements)
References 6 publications
“…• The first dataset is a Twitter sentiment labeled dataset with emojis. It has 6,600 positive and negative Tweets each, a large enough dataset for accurate sentiment analysis (Prusa, Khoshgoftaar, and Seliya 2015). • The second dataset is a Twitter sentiment labeled dataset without emojis.…”
Section: Dataset (mentioning)
confidence: 99%
“…The dataset size is considered a critical property in determining the performance of a machine learning model. Typically, large datasets lead to better classification performance, while small datasets may trigger over-fitting [1][2][3]. In practice, however, collecting medical data faces many challenges due to patients' privacy, a lack of cases for rare conditions [4], as well as organizational and legal challenges [5,6].…”
Section: Introduction (mentioning)
confidence: 99%
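The relationship this excerpt describes, where larger training sets generally yield better classification accuracy while very small ones underperform, can be illustrated with a minimal learning-curve sketch. Everything here is illustrative and hypothetical: the synthetic "tweets", the cue-word vocabularies, and the hand-rolled naive Bayes classifier are stand-ins, not the paper's actual data or method.

```python
import math
import random
from collections import Counter

random.seed(0)

# Hypothetical sentiment cue words and neutral filler words.
POS_WORDS = ["love", "great", "happy", "awesome", "nice"]
NEG_WORDS = ["hate", "bad", "sad", "awful", "terrible"]
NOISE = ["the", "a", "today", "really", "just", "so"]

def make_tweet(label):
    """Build a synthetic 'tweet': one sentiment cue word plus filler."""
    cue = random.choice(POS_WORDS if label == 1 else NEG_WORDS)
    words = [cue] + random.choices(NOISE, k=4)
    random.shuffle(words)
    return " ".join(words), label

def train_nb(data):
    """Collect per-class word counts and class priors."""
    counts = {0: Counter(), 1: Counter()}
    priors = Counter()
    for text, label in data:
        priors[label] += 1
        counts[label].update(text.split())
    return counts, priors

def predict(model, text):
    """Naive Bayes with add-one smoothing over the training vocabulary."""
    counts, priors = model
    vocab = set(counts[0]) | set(counts[1])
    best, best_lp = None, float("-inf")
    for label in (0, 1):
        total = sum(counts[label].values())
        lp = math.log(priors[label] / sum(priors.values()))
        for w in text.split():
            lp += math.log((counts[label][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best

# Fixed test set; training sets of increasing size.
test = [make_tweet(i % 2) for i in range(200)]
accs = []
for n in (10, 100, 1000):
    train = [make_tweet(i % 2) for i in range(n)]
    model = train_nb(train)
    acc = sum(predict(model, t) == y for t, y in test) / len(test)
    accs.append(acc)
    print(f"train size {n:4d}: accuracy {acc:.2f}")
```

With only 10 training tweets, some cue words never appear in training, so those test tweets are classified essentially at random; with 1000, every cue word is well represented and accuracy approaches its ceiling. This is the overfitting-on-small-data effect the excerpt attributes to [1][2][3], reduced to a toy setting.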
“…Establishing a method to find the trend in small datasets is not only of scientific interest but also of practical importance, and it requires special care when developing machine learning models. Unfortunately, classification algorithms may perform worse when trained on limited-size datasets [2]. This is because small datasets typically contain fewer details, so the classification model cannot generalize patterns from the training data.…”
Section: Introduction (mentioning)
confidence: 99%
“…Moreover, the limited number of samples makes it affordable to test different IDS solutions on the complete set without the need to select a small random partition. In fact, even though the complexity of a dataset is important for faithfully emulating a real industrial plant, an overly large dataset is not handled well by machine learning algorithms, which reduces its usability [30]. Thus, the evaluation results of different papers can be compared effectively to identify the best algorithms, without any influence from randomly selected data partitions.…”
Section: Introduction (mentioning)
confidence: 99%