2020
DOI: 10.3390/app10186253

News Classification for Identifying Traffic Incident Points in a Spanish-Speaking Country: A Real-World Case Study of Class Imbalance Learning

Abstract: ‘El Diario de Juárez’ is a local newspaper in a city of 1.5 million Spanish-speaking inhabitants. It publishes texts that citizens read on both a website and an RSS (Really Simple Syndication) service. This research applies natural-language-processing and machine-learning algorithms to the news provided by the RSS service in order to classify each item according to whether it is about a traffic incident, with the final intention of notifying citizens where such accidents occur. The classification pr…

Cited by 16 publications (13 citation statements)
References 62 publications
“…), synthetic minority oversampling technique (SMOTE), random undersampling, random oversampling and borderline SMOTE. Consequently, the final classifier reached a sensitivity of 0.86 and an area under the precision-recall curve of 0.86, which is generally acceptable considering the complexity level of assessing unstructured texts in Spanish (Rivera et al, 2020).…”
Section: Data Balancing Techniques In Crash Severity Prediction Modeling | mentioning
confidence: 85%
“…However, another disadvantage of oversampling is that it increases the number of training observations, thus increasing the learning time (Cover & Hart, 1967). Gilberto Rivera et al (2020) studied the application of machine-learning algorithms and natural-language processing to the news provided by the RSS service. Their goal was to classify them based on whether they were about a traffic incident or otherwise in order to notify citizens where such accidents had specifically occurred.…”
Section: Data Balancing Techniques In Crash Severity Prediction Modeling | mentioning
confidence: 99%
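The statement above notes that oversampling increases the number of training observations. A minimal sketch of random oversampling can make that concrete. It is not the cited authors' implementation: the function name `random_oversample`, the toy Spanish headlines, and the labels are hypothetical, and only the Python standard library is used.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until every
    class has as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        # All original samples that belong to this class.
        pool = [s for s, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


# Hypothetical toy data: 1 = traffic incident, 0 = other news.
docs = [
    "choque en la avenida",
    "partido de futbol",
    "nueva plaza",
    "concierto hoy",
    "elecciones locales",
]
labels = [1, 0, 0, 0, 0]

X_bal, y_bal = random_oversample(docs, labels)
print(len(docs), "->", len(X_bal), dict(Counter(y_bal)))
```

In this toy case the training set grows from five to eight items, which illustrates why oversampling can lengthen learning time.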
“…In this regard, the literature is quite scarce. Rivera et al propose a classifier for RSS-like feed of news that is able to distinguish and locate traffic incidents, in order to timely alert citizens [53]. Similarly, Abid et al propose to train a classifier to recognize events where some form of life loss happened (e.g.…”
Section: Smart Cities: Overview and Framework | mentioning
confidence: 99%
“…The authors solve such problems by first converting the test cases to numeric vectors using NLP techniques and then applying supervised learning for imbalanced datasets on the resulting vectors. Rivera et al (2020) classify news articles in a local Spanish newspaper as traffic-related or not. The authors first convert the article into a vectorized data using bag-of-words and TF-IDF (term frequency-inverse document frequency) techniques.…”
Section: Related Work | mentioning
confidence: 99%
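The bag-of-words and TF-IDF vectorization mentioned in the statement above can be sketched as follows. This is an illustration under stated assumptions, not the cited paper's pipeline: it uses whitespace tokenization, the unsmoothed variant idf = log(N / df), and hypothetical toy headlines. The function name `tfidf` is likewise illustrative.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocabulary, vectors) using tf = count / doc length and
    idf = log(N / document frequency), one common unsmoothed variant."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vocab = sorted(df)
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        length = max(len(toks), 1)  # guard against empty documents
        vectors.append(
            [(tf[t] / length) * math.log(n_docs / df[t]) for t in vocab]
        )
    return vocab, vectors


# Hypothetical toy corpus standing in for RSS news items.
docs = [
    "choque en la avenida",
    "choque de trenes",
    "concierto en la plaza",
]
vocab, vectors = tfidf(docs)
for term, weight in zip(vocab, vectors[0]):
    print(f"{term}: {weight:.3f}")
```

Terms that occur in fewer documents (here "avenida") receive a higher weight than terms shared across documents (here "choque"), which is the property that makes the vectors useful as classifier input.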
“…The class imbalance is dealt with via different sampling methods. The final classifier achieves a sensitivity of 0.86 (Rivera et. al.…”
Section: Related Work | mentioning
confidence: 99%