2018
DOI: 10.1007/s10723-018-9465-z

A Dynamic Spark-based Classification Framework for Imbalanced Big Data

Cited by 25 publications (10 citation statements)
References: 22 publications
“…This consisted of 23.7% tweets that were labelled as relevant and 76.3% labelled as irrelevant. Imbalanced data causes well known problems to classification models [48]. We initially tried both oversampling and undersampling techniques to create a balanced training dataset as well as just using the unbalanced data.…”
Section: Methods
Citation type: mentioning
confidence: 99%
“…Regardless of the specific type of machine learning method being used, a significant challenge is the unequal distribution of classes within a data set, referred to as imbalanced learning [32], [50]- [53]. In our problem of interest with skewed class proportions, many observations belong to the safe category (related to as the majority class), and much fewer samples fit in the structural failure group (referred to as the minority class).…”
Section: Proposed Two-stage Framework For Predictive Modeling An…
Citation type: mentioning
confidence: 99%
“…CQNS is a proposed framework for improving and estimating complex queries for relational databases and other types of NoSQL data stores. For this purpose, a unified data model is proposed that uses a suitable environment such as Apache Spark with MongoDB [ 40 – 42 ] to optimize the qualification of the data ingestion process. The CQNS framework transforms each query process received from any dataset to the matched Engine after using Hadoop/HDFS and Hadoop/MapReduce with parallel k-means clustering for processing data without physical transformation data.…”
Section: Related Work
Citation type: mentioning
confidence: 99%