2018 3rd International Conference on Computer Science and Engineering (UBMK)
DOI: 10.1109/ubmk.2018.8566451

Data Feature Selection Methods on Distributed Big Data Processing Platforms

Cited by 12 publications (5 citation statements); references 0 publications.
“…This is done so that all features considered have a similar dynamic range, rather than one feature dominating due to its large dynamic range [17]. The second step is applying the Synthetic Minority Oversampling TEchnique (SMOTE) to tackle the class imbalance problem often encountered in such data. 2) Feature selection: The goal at this stage is to reduce the number of features input to the ML model, reducing its computational complexity while maintaining or even improving its detection performance [31]. To achieve this, the information gain method is used to select the relevant features by ranking them according to the amount of information (in bits) they provide about the class [32].…”
Section: A. Proposed Approach Description (mentioning; confidence: 99%)
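The pipeline this excerpt describes (min-max scaling, SMOTE, information-gain ranking) can be sketched as follows. This is a minimal illustration using scikit-learn and imbalanced-learn on a placeholder dataset; it is not the cited papers' code, and the feature counts and parameters are assumptions for demonstration only.

```python
# A minimal sketch (not the cited papers' code) of the preprocessing and
# feature-selection steps described above; dataset and k are placeholders.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import MinMaxScaler
from sklearn.feature_selection import mutual_info_classif
from imblearn.over_sampling import SMOTE

# Placeholder imbalanced dataset with 20 features.
X, y = make_classification(n_samples=1000, n_features=20,
                           weights=[0.9, 0.1], random_state=0)

# Step 1: scale every feature to [0, 1] so no single feature dominates
# simply because of its dynamic range.
X_scaled = MinMaxScaler().fit_transform(X)

# Step 2: SMOTE synthesizes minority-class samples to balance the classes.
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_scaled, y)

# Step 3: rank features by mutual information with the class (scikit-learn
# reports it in nats; divide by ln 2 for bits) and keep the top k.
scores = mutual_info_classif(X_bal, y_bal, random_state=0)
top_k = np.argsort(scores)[::-1][:10]
X_selected = X_bal[:, top_k]
print("Selected feature indices:", top_k)
```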
“…In the second phase, a subset of features is selected using different feature selection mechanisms and given to the classification model as input. This is done to reduce the complexity of the classification model and decrease its training time without sacrificing its performance [30]. This is particularly important when dealing with large-scale systems generating big data [30].…”
Section: B. Proposed Approach Application (mentioning; confidence: 99%)
“…This is done to reduce the complexity of the classification model and decrease its training time without sacrificing its performance [30]. This is particularly important when dealing with large-scale systems generating big data [30]. Three different feature selection mechanisms, representing three different categories of feature selection algorithms, are considered in this work.…”
Section: B. Proposed Approach Application (mentioning; confidence: 99%)
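The "three different categories of feature selection algorithms" the excerpt alludes to are conventionally filter, wrapper, and embedded methods. The sketch below shows one representative of each using scikit-learn; the specific choices (SelectKBest, RFE, random-forest importances) are my assumptions for illustration, not necessarily the mechanisms used in the citing paper.

```python
# One representative per common feature-selection category (my own sketch,
# not the citing paper's implementation): filter, wrapper, and embedded.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif, RFE
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=15, random_state=0)

# Filter: score each feature independently of any model (information gain).
filt = SelectKBest(mutual_info_classif, k=5).fit(X, y)

# Wrapper: recursively eliminate features based on a model's coefficients.
wrap = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5).fit(X, y)

# Embedded: selection falls out of model training (impurity importances).
emb = RandomForestClassifier(random_state=0).fit(X, y)
top_emb = np.argsort(emb.feature_importances_)[::-1][:5]

print("filter:  ", np.flatnonzero(filt.get_support()))
print("wrapper: ", np.flatnonzero(wrap.get_support()))
print("embedded:", np.sort(top_emb))
```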
“…This work compares two different feature selection techniques, namely information gain-based and correlation-based feature selection, and explores their effect on the models' detection performance and time complexity. This is particularly relevant when designing ML models for large-scale systems that generate high-dimensional data [38].…”
Section: B. Feature Selection (mentioning; confidence: 99%)
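A rough way to reproduce this kind of comparison is to rank features by mutual information (information gain) versus by absolute feature-class correlation, then time a model fit on each selected subset. Note that "correlation-based feature selection" in the literature often means Hall's CFS, which also penalizes feature-feature correlation; the simple feature-class correlation below is a stand-in assumption, as are the dataset and model.

```python
# A minimal sketch, under my own assumptions, comparing information
# gain-based and (simplified) correlation-based feature selection and
# their effect on training time; nothing here comes from the paper.
import time
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=50, random_state=0)
k = 10

# Information gain: mutual information between each feature and the class.
ig_top = np.argsort(mutual_info_classif(X, y, random_state=0))[::-1][:k]

# Simplified correlation-based ranking: absolute Pearson correlation of
# each feature with the class label (not full CFS).
corr = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
corr_top = np.argsort(corr)[::-1][:k]

for name, idx in [("info gain", ig_top), ("correlation", corr_top)]:
    start = time.perf_counter()
    DecisionTreeClassifier(random_state=0).fit(X[:, idx], y)
    print(f"{name}: features {np.sort(idx)}, "
          f"fit in {time.perf_counter() - start:.4f}s")
```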
“…The second stage of the proposed framework is a feature selection process that reduces the number of features needed by the ML classification model. This is done to reduce the time complexity of the classification model and consequently decrease its training time without sacrificing its performance [38]. With that in mind, two different methods are compared within this stage of the framework.…”
Section: ) Bayesian Optimization (mentioning; confidence: 99%)
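To make the motivation for this stage concrete, the sketch below trains the same classifier with and without information-gain feature selection and reports training time and accuracy. The dataset, model, and feature budget are placeholders I chose, not the framework from the excerpt.

```python
# A sketch (placeholder data and model, not the excerpt's framework)
# showing that fewer input features can cut training time while keeping
# detection performance comparable.
import time
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import mutual_info_classif
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=3000, n_features=60, n_informative=10,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Rank features on the training split only, then keep the top 10.
top = np.argsort(mutual_info_classif(X_tr, y_tr, random_state=0))[::-1][:10]

for label, tr, te in [("all 60 features", X_tr, X_te),
                      ("top 10 features", X_tr[:, top], X_te[:, top])]:
    start = time.perf_counter()
    clf = RandomForestClassifier(random_state=0).fit(tr, y_tr)
    elapsed = time.perf_counter() - start
    acc = accuracy_score(y_te, clf.predict(te))
    print(f"{label}: train {elapsed:.2f}s, accuracy {acc:.3f}")
```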