Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Progra 2012
DOI: 10.1145/2351316.2351322
Incrementally optimized decision tree for noisy big data

Abstract: Extracting meaningful information from big data remains a popular open problem. Decision trees, which offer a high degree of knowledge interpretability, are favored in many real-world applications. However, noisy values are common in high-speed data streams, e.g. real-time online data feeds that are prone to interference. When processing big data, full-batch pre-processing and sampling are hard to implement. To resolve this tradeoff, this paper proposes a new incremental decision tree algorithm so…



Cited by 31 publications (15 citation statements). References 14 publications (22 reference statements).
“…Alternative approaches, such as NIP-H and NIP-N, use Gaussian approximations instead of Hoeffding bounds to compute confidence intervals. Several extensions of VFDT have been proposed, also taking into account non-stationary data sources; see, e.g., [10], [9], [2], [35], [27], [15], [19], [21], [11], [34], [20], [29], [8]. All these methods are based on the classical Hoeffding bound [14]: after m independent observations of a random variable taking values in a real interval of size R, with probability at least 1 − δ the true mean does not differ from the sample mean by more than…”
Section: Introduction
confidence: 99%
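The bound referenced in the truncated quotation above is the standard Hoeffding inequality: after m independent observations of a variable with range R, with probability at least 1 − δ the sample mean lies within ε = √(R² ln(1/δ) / (2m)) of the true mean. A minimal sketch (function and parameter names are illustrative, not from the cited papers):

```python
import math

def hoeffding_bound(value_range: float, delta: float, n_obs: int) -> float:
    """Hoeffding bound epsilon: with probability >= 1 - delta, the true mean
    of a variable with range `value_range` lies within epsilon of the sample
    mean after `n_obs` independent observations."""
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n_obs))

# Example: information gain lies in [0, 1] (R = 1); with delta = 1e-7,
# the bound tightens as more stream examples are observed.
eps_1k = hoeffding_bound(1.0, 1e-7, 1_000)      # roughly 0.09
eps_100k = hoeffding_bound(1.0, 1e-7, 100_000)  # roughly 0.009
```

This is why Hoeffding-tree methods can bound the error of decisions taken on a finite prefix of an unbounded stream: the bound depends only on R, δ, and the number of observations, not on the data distribution.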
“…To update the machine-learning parameters, incremental learning can be applied to the newly captured data rather than retraining on both the new and old data. Incremental learning provides an effective way to adapt algorithms to noisy (Yang and Fong 2012) and spatially big data (Wang et al 2014).…”
Section: Algorithmic Development
confidence: 99%
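The contrast between incremental updates and full retraining can be illustrated with a minimal, hypothetical leaf-statistics sketch (not the paper's exact algorithm): each arriving example updates running class counts once, so previously seen data never needs to be stored or revisited.

```python
from collections import Counter

class IncrementalLeafStats:
    """Sufficient statistics updated one example at a time, as in
    incremental (stream) learning; no pass over old data is needed."""
    def __init__(self):
        self.n = 0
        self.class_counts = Counter()

    def learn_one(self, label) -> None:
        # O(1) update per example; the stream itself is never retained.
        self.n += 1
        self.class_counts[label] += 1

    def predict(self):
        # Majority-class prediction from the accumulated counts.
        return self.class_counts.most_common(1)[0][0] if self.n else None

stats = IncrementalLeafStats()
for label in ["spam", "ham", "spam", "spam"]:
    stats.learn_one(label)
```

Because each update is constant-time and the raw examples are discarded, memory is bounded by the number of classes, which is what makes this style of learning viable on unbounded or spatially big data.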
“…Experimental results showed that this method is faster than existing decision tree algorithms on large-scale problems. Yang et al [111] proposed a fast, incrementally optimized decision tree algorithm for processing large-scale noisy data. Compared with earlier decision-tree data mining algorithms, this method has a major advantage in real-time mining speed, which makes it well suited to continuous data from mobile devices.…”
Section: Big Data Classification
confidence: 99%
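The real-time speed of such trees comes from deciding splits using accumulated statistics rather than stored examples. A common VFDT-style rule (a sketch under standard assumptions, not necessarily the exact criterion used by Yang et al.) splits a leaf only when the observed gain gap between the two best attributes exceeds the Hoeffding bound, so an apparent winner cannot plausibly be an artifact of sampling noise:

```python
import math

def should_split(best_gain: float, second_best_gain: float, n_obs: int,
                 delta: float = 1e-7, value_range: float = 1.0) -> bool:
    """VFDT-style split test: commit to the best attribute only when its
    observed gain advantage exceeds the Hoeffding bound for n_obs examples."""
    eps = math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n_obs))
    return (best_gain - second_best_gain) > eps

# With few examples the bound is loose, so the tree waits;
# the same 0.05 gain gap justifies a split once enough data has arrived.
early = should_split(0.30, 0.25, 200)       # False: eps ~ 0.20 > 0.05
later = should_split(0.30, 0.25, 100_000)   # True:  eps ~ 0.009 < 0.05
```

Deferring splits until the statistical evidence is conclusive is also what gives these trees a degree of noise tolerance: a noisy fluctuation in gain estimates rarely survives the bound at large n.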