2018
DOI: 10.1108/idd-02-2018-0002

Big Data analytics for prediction: parallel processing of the big learning base with the possibility of improving the final result of the prediction

Abstract:
Purpose: This paper aims to solve the problems of big data analytics for prediction, including volume, veracity and velocity, by improving the prediction result to an acceptable level in the shortest possible time.
Design/methodology/approach: This paper is divided into two parts. The first aims to improve the prediction result. In this part, two ideas are proposed: the double pruning enhanced random forest algorithm, and extracting a shared learning base from the stratified random sampling method to …
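The abstract mentions extracting a shared learning base by stratified random sampling. As a minimal sketch of generic stratified random sampling (not the authors' procedure, whose details are cut off in the abstract), the Python below groups records by class label and draws the same fraction from each group, so the sample keeps the class proportions of the full dataset. The function name `stratified_sample` and its parameters are hypothetical.

```python
import random
from collections import defaultdict
from typing import Callable, DefaultDict, Hashable, List, Sequence, TypeVar

T = TypeVar("T")


def stratified_sample(records: Sequence[T],
                      label_of: Callable[[T], Hashable],
                      fraction: float,
                      seed: int = 0) -> List[T]:
    """Hypothetical helper: draw a sample that keeps each class's share."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    # Group records into strata by class label.
    strata: DefaultDict[Hashable, List[T]] = defaultdict(list)
    for record in records:
        strata[label_of(record)].append(record)
    sample: List[T] = []
    for members in strata.values():
        # Take the same fraction from every stratum, keeping at least one record.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    # Shuffle so classes are interleaved rather than grouped.
    rng.shuffle(sample)
    return sample


if __name__ == "__main__":
    # Illustrative data: 900 records of class 0 and 100 of class 1.
    data = [(i, 0) for i in range(900)] + [(i, 1) for i in range(900, 1000)]
    base = stratified_sample(data, label_of=lambda r: r[1], fraction=0.1, seed=42)
    counts = {c: sum(1 for r in base if r[1] == c) for c in (0, 1)}
    print(len(base), counts)  # 100 {0: 90, 1: 10}
```

Sampling each stratum separately avoids the chance that a plain random sample under-represents a rare class, which matters when the sample is meant to stand in for the whole learning base.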

Cited by 9 publications (7 citation statements)
References 37 publications
“…Test 5. Presents the work presented by Djafri et al (2018), where he classified big data (KDD Cup 2012) using representative learning base and the classical random forests (CRF), as well as using the random forest classifier which he improved (IRF), but in a completely different method from the method proposed in this work. The results obtained are shown in Table 6:…”
Section: Classification Results Evaluation (citation type: mentioning)
confidence: 99%
“…Table 8 shows the final classification result of the original dataset (KDD Cup 2012) and the representative learning base (RLB) using Classical Random Forests (CRF) and Improved Random Forests (IRF) (Djafri et al , 2018).…”
Section: Experiments Results and Discussion (citation type: mentioning)
confidence: 99%
“…For such an amount of data, the design of the system handling the communication between the data source and the different parts of the analytic pipeline can create a major bottleneck. In addition to needing the right choice of software/hardware architecture, routine optimisation and partial processing (only selecting the fraction of data representative of the whole) are common workarounds [43]. • When we cannot process any faster, we try to process many things at once: A single computational node, even when using multiple cores (parallel computation) can only process as fast as its hardware enables; even considering the right data subset to process and the optimal algorithmic implementation, this threshold would constitute a theoretical upper limit.…”
Section: A Data and Data Streams (citation type: mentioning)
confidence: 99%
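The passage above names two workarounds: processing only a representative fraction of the data, and processing many parts at once. As a hedged illustration of the second, the Python sketch below splits a dataset into chunks, applies a function to each chunk concurrently, and folds the partial results together (a simple map-reduce). The names `chunked` and `parallel_map_reduce` are hypothetical. The sketch defaults to `ThreadPoolExecutor`; because CPython's global interpreter lock limits thread speedups for CPU-bound pure-Python work, passing `ProcessPoolExecutor` (with top-level, picklable functions) is the usual choice in that case.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from functools import reduce
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def chunked(data: Sequence[T], n_chunks: int) -> List[Sequence[T]]:
    """Split data into at most n_chunks contiguous slices of near-equal size."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_map_reduce(
    data: Sequence[T],
    map_fn: Callable[[Sequence[T]], R],
    reduce_fn: Callable[[R, R], R],
    n_workers: int = 4,
    make_executor: Callable[[int], Executor] = ThreadPoolExecutor,
) -> R:
    """Apply map_fn to each chunk concurrently, then fold the partial results."""
    if not data:
        raise ValueError("data must not be empty")
    chunks = chunked(data, n_workers)
    with make_executor(n_workers) as pool:
        # pool.map returns results in chunk order, so reduce_fn sees them in sequence.
        partials = list(pool.map(map_fn, chunks))
    return reduce(reduce_fn, partials)


if __name__ == "__main__":
    # Hypothetical workload: sum 0..999 split across four workers.
    total = parallel_map_reduce(list(range(1000)), sum, lambda a, b: a + b)
    print(total)  # 499500
```

The split, concurrent map, then reduce structure only works when the per-chunk results can be combined afterwards, as with sums, counts or vote tallies.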
“…While some companies use data to create a competitive advantage, many businesses fall short of gaining real insights from their data. This can mainly be ascribed to big data requiring powerful technologies, computer processing power, skilled personnel and predictive models to crunch enormous amounts of data (Djafri et al, 2018; Gupta, 2018).…”
Section: Artificial Intelligence In the Retail Value Chain (citation type: mentioning)
confidence: 99%