Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Progra 2012
DOI: 10.1145/2351316.2351322
Incrementally optimized decision tree for noisy big data

Abstract: Extracting meaningful information from big data remains a popular open problem. Decision trees, which offer a high degree of knowledge interpretability, are favored in many real-world applications. However, noisy values are common in high-speed data streams, e.g. real-time online data feeds that are prone to interference. When processing big data, full-batch pre-processing and sampling are hard to implement. To resolve this tradeoff, this paper proposes a new incremental decision tree algorithm so…



Cited by 31 publications (15 citation statements). References 14 publications (22 reference statements).
“…Alternative approaches, such as NIP-H and NIP-N, use Gaussian approximations instead of Hoeffding bounds to compute confidence intervals. Several extensions of VFDT have been proposed, also taking into account non-stationary data sources; see, e.g., [10], [9], [2], [35], [27], [15], [19], [21], [11], [34], [20], [29], [8]. All these methods are based on the classical Hoeffding bound [14]: after m independent observations of a random variable taking values in a real interval of size R, with probability at least 1 − δ the true mean does not differ from the sample mean by more than…”
Section: Introduction
confidence: 99%
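The bound referenced in the truncated quotation above is the standard Hoeffding inequality: after m independent observations of a variable with range R, with probability at least 1 − δ the sample mean lies within ε = √(R² ln(1/δ) / (2m)) of the true mean. A minimal sketch (function and parameter names are illustrative, not from the cited papers):

```python
import math

def hoeffding_bound(value_range: float, delta: float, n_obs: int) -> float:
    """Hoeffding bound epsilon: with probability >= 1 - delta, the true mean
    of a variable with range `value_range` lies within epsilon of the sample
    mean after `n_obs` independent observations."""
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n_obs))

# Example: information gain lies in [0, 1] (R = 1); with delta = 1e-7,
# the bound tightens as more stream examples are observed.
eps_1k = hoeffding_bound(1.0, 1e-7, 1_000)      # roughly 0.09
eps_100k = hoeffding_bound(1.0, 1e-7, 100_000)  # roughly 0.009
```

This is why Hoeffding-tree methods can bound the error of decisions taken on a finite prefix of an unbounded stream: the bound depends only on R, δ, and the number of observations, not on the data distribution.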
“…To update the machine-learning parameters, incremental learning can be applied to the newly captured data rather than retraining on both the new and old data. Incremental learning provides an effective way to adapt algorithms to noisy (Yang and Fong 2012) and spatially big data (Wang et al 2014).…”
Section: Algorithmic Development
confidence: 99%
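The contrast between incremental updates and full retraining can be illustrated with a minimal, hypothetical leaf-statistics sketch (not the paper's exact algorithm): each arriving example updates running class counts once, so previously seen data never needs to be stored or revisited.

```python
from collections import Counter

class IncrementalLeafStats:
    """Sufficient statistics updated one example at a time, as in
    incremental (stream) learning; no pass over old data is needed."""
    def __init__(self):
        self.n = 0
        self.class_counts = Counter()

    def learn_one(self, label) -> None:
        # O(1) update per example; the stream itself is never retained.
        self.n += 1
        self.class_counts[label] += 1

    def predict(self):
        # Majority-class prediction from the accumulated counts.
        return self.class_counts.most_common(1)[0][0] if self.n else None

stats = IncrementalLeafStats()
for label in ["spam", "ham", "spam", "spam"]:
    stats.learn_one(label)
```

Because each update is constant-time and the raw examples are discarded, memory is bounded by the number of classes, which is what makes this style of learning viable on unbounded or spatially big data.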
“…Experimental results showed that this method is faster than existing decision tree algorithms on large-scale problems. Yang et al [111] proposed a fast, incrementally optimized decision tree algorithm for processing large-scale noisy data. Compared with earlier decision-tree data mining algorithms, this method has a major advantage in real-time mining speed, which makes it well suited to continuous data from mobile devices.…”
Section: Big Data Classification
confidence: 99%
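The real-time speed of such trees comes from deciding splits using accumulated statistics rather than stored examples. A common VFDT-style rule (a sketch under standard assumptions, not necessarily the exact criterion used by Yang et al.) splits a leaf only when the observed gain gap between the two best attributes exceeds the Hoeffding bound, so an apparent winner cannot plausibly be an artifact of sampling noise:

```python
import math

def should_split(best_gain: float, second_best_gain: float, n_obs: int,
                 delta: float = 1e-7, value_range: float = 1.0) -> bool:
    """VFDT-style split test: commit to the best attribute only when its
    observed gain advantage exceeds the Hoeffding bound for n_obs examples."""
    eps = math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n_obs))
    return (best_gain - second_best_gain) > eps

# With few examples the bound is loose, so the tree waits;
# the same 0.05 gain gap justifies a split once enough data has arrived.
early = should_split(0.30, 0.25, 200)       # False: eps ~ 0.20 > 0.05
later = should_split(0.30, 0.25, 100_000)   # True:  eps ~ 0.009 < 0.05
```

Deferring splits until the statistical evidence is conclusive is also what gives these trees a degree of noise tolerance: a noisy fluctuation in gain estimates rarely survives the bound at large n.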