Proceedings of the 18th ACM Conference on Information and Knowledge Management 2009
DOI: 10.1145/1645953.1646301

Stochastic gradient boosted distributed decision trees

Abstract: Stochastic Gradient Boosted Decision Trees (GBDT) is one of the most widely used learning algorithms in machine learning today. It is adaptable, easy to interpret, and produces highly accurate models. However, most implementations today are computationally expensive and require all training data to be in main memory. As training data becomes ever larger, there is motivation for us to parallelize the GBDT algorithm. Parallelizing decision tree training is intuitive and various approaches have been explored in e…
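For orientation, here is a minimal single-machine sketch of the stochastic gradient boosting loop that the paper sets out to distribute: squared-error boosting where each round fits a small regression tree to the current residuals on a random row subsample (the "stochastic" part). This is an illustrative reconstruction using scikit-learn's DecisionTreeRegressor as the base learner, not the paper's distributed implementation.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_sgbdt(X, y, n_trees=100, learning_rate=0.1, subsample=0.5,
              max_depth=3, seed=0):
    """Stochastic gradient boosting for squared-error regression (sketch)."""
    rng = np.random.default_rng(seed)
    f0 = y.mean()                          # initial constant model
    pred = np.full(len(y), f0)
    trees = []
    for _ in range(n_trees):
        residual = y - pred                # negative gradient of squared error
        idx = rng.choice(len(y), size=int(subsample * len(y)), replace=False)
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X[idx], residual[idx])    # fit tree on a random row subsample
        pred += learning_rate * tree.predict(X)  # shrunken update on all rows
        trees.append(tree)
    return f0, trees

def predict_sgbdt(model, X, learning_rate=0.1):
    f0, trees = model
    return f0 + learning_rate * sum(t.predict(X) for t in trees)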

Cited by 265 publications (131 citation statements), published 2012–2023.
References 10 publications.

“…We test the performance of the model over a dataset containing eight million question instances, obtained after filtering out questions whose askers' demographic information is missing. We train our machine learning model using gradient boosted decision trees [10,27] and test on our data via 10-fold cross-validation. The 10 most important features, as provided by the learning tool, are shown in Table 9.…”
Section: Attitude Prediction
confidence: 99%
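The excerpt above describes a standard protocol: train gradient boosted trees, evaluate with 10-fold cross-validation, and rank features by importance. A minimal sketch of that protocol with scikit-learn follows, on synthetic stand-in data; the authors' eight-million-question dataset and feature set are not reproduced here.

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X = np.random.rand(1000, 20)               # stand-in feature matrix
y = np.random.randint(0, 2, 1000)          # stand-in binary labels

clf = GradientBoostingClassifier(n_estimators=200, max_depth=3)
scores = cross_val_score(clf, X, y, cv=10)  # 10-fold cross-validation
print(f"mean CV accuracy: {scores.mean():.3f}")

clf.fit(X, y)
top10 = np.argsort(clf.feature_importances_)[::-1][:10]  # 10 most important features
print("top feature indices:", top10)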
“…Overfitting normally does not happen; to explain this, one can examine the notion of margins in the case of boosting. The margin $\rho(x, y)$ is based on the votes $h_t(x)$ along with $\alpha_t$ denoting the weights of all hypotheses [29].…”
Section: Combining Hypothesis With Boosting
confidence: 99%
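The margin notation in that excerpt was damaged in extraction. For reference, the standard voting margin of a weighted boosted ensemble (following Schapire et al.) can be written as below; whether [29] uses exactly this normalized form is not recoverable from the excerpt.

% Voting margin of a weighted ensemble on example (x, y), with labels
% y \in \{-1,+1\}, base hypotheses h_t(x) \in \{-1,+1\}, weights \alpha_t \ge 0:
\rho(x, y) = \frac{y \sum_t \alpha_t h_t(x)}{\sum_t \alpha_t}

A positive margin means the weighted vote classifies (x, y) correctly, and larger margins indicate a more confident ensemble.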
“…The effectiveness of tree-based ensembles for learning to rank has been widely demonstrated: an example is the family of gradient-boosted regression trees (GBRTs) [9,3,10,2]. In this context, our work uses LambdaMART [3], which is the combination of LambdaRank [11] and MART [12], a class of boosting algorithms that performs gradient descent using regression trees.…”
Section: LambdaMART
confidence: 99%
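The LambdaMART combination described above (LambdaRank gradients optimized with MART-style boosted regression trees) is available off the shelf. A hedged sketch using LightGBM's lambdarank objective, with synthetic queries and graded relevance labels as placeholders:

import numpy as np
import lightgbm as lgb

X = np.random.rand(500, 10)                # document feature vectors
y = np.random.randint(0, 5, 500)           # graded relevance labels (0-4)
groups = [50] * 10                         # 10 queries, 50 candidate docs each

ranker = lgb.LGBMRanker(objective="lambdarank", n_estimators=100)
ranker.fit(X, y, group=groups)             # group sizes delimit the queries
scores = ranker.predict(X[:50])            # ranking scores for query 1's docs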