2020
DOI: 10.21203/rs.3.rs-54646/v2
Preprint

CatBoost for Big Data: an Interdisciplinary Review

Abstract: Gradient Boosted Decision Trees (GBDTs) are a powerful tool for classification and regression tasks in Big Data. Researchers should be familiar with the strengths and weaknesses of current implementations of GBDTs in order to use them effectively and make successful contributions. CatBoost is a member of the family of GBDT machine learning ensemble techniques. Since its debut in late 2018, researchers have successfully used CatBoost for machine learning studies involving Big Data. We take this opportunity to r…


Cited by 25 publications (27 citation statements: 0 supporting, 27 mentioning, 0 contrasting)
References 54 publications (131 reference statements)
“…Trees (GBDT's) machine learning ensemble techniques [56]. All analysis was performed using R statistical language with Caret, XGBoost, SHAPforxgboost and CatBoost libraries.…”
Section: Discussion (citation type: mentioning)
confidence: 99%
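The study quoted above names its R toolchain (caret, xgboost, SHAPforxgboost, catboost) but not its data or settings. As a rough illustration only, the following Python sketch mirrors the two steps of that pipeline, fitting a gradient boosted tree model and computing SHAP feature attributions, using the xgboost and shap packages; the dataset and hyperparameters here are placeholders, not those of the cited work.

```python
import shap
import xgboost
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Stand-in dataset; the cited study's data is not specified here.
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit a gradient boosted tree classifier (the role xgboost plays in the R pipeline).
model = xgboost.XGBClassifier(n_estimators=200, max_depth=4)
model.fit(X_train, y_train)

# Per-prediction feature attributions (the role SHAPforxgboost plays in R).
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
print(shap_values.shape)  # (n_samples, n_features)
```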
“…LightGBM is developed from the Gradient boosted decision trees (GBDT) model to create a better-performance model. The CatBoost model has emerged with the development of the Gradient Boosting model for high cardinality categorical variables [19].…”
Section: Methods (citation type: mentioning)
confidence: 99%
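The point about high-cardinality categorical variables is central to CatBoost's design: categorical columns are encoded with ordered target statistics rather than one-hot vectors, so they can be passed to the model directly. A minimal sketch of that interface using CatBoost's Python API; the toy rows and settings below are illustrative, not from the cited paper.

```python
from catboost import CatBoostClassifier, Pool

# Hypothetical toy rows: column 0 (a user id) is a high-cardinality categorical.
train_data = [["u1042", 3.2], ["u7781", 1.5], ["u0093", 2.7],
              ["u1042", 0.9], ["u5530", 4.1], ["u7781", 0.3]]
labels = [1, 0, 1, 0, 1, 0]

# cat_features marks categorical columns so CatBoost applies its ordered
# target-statistics encoding instead of requiring one-hot preprocessing.
train_pool = Pool(train_data, label=labels, cat_features=[0])

model = CatBoostClassifier(iterations=50, depth=3, learning_rate=0.1,
                           verbose=False)
model.fit(train_pool)
print(model.predict([["u1042", 2.0]]))
```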
“…These rankings are followed by a report of which features are in each of the 4 Agree, 5 Agree, 6 Agree, and 7 Agree datasets (Tables 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58). …for CatBoost default hyperparameter values we refer the reader to the CatBoost documentation, and for Light GBM default hyperparameter values, please consult their documentation.…”
Section: Appendix B (citation type: mentioning)
confidence: 99%
“…In all experiments, we employ the following eight learners: CatBoost [7], Light GBM [8], XGBoost [9], RF [10], DT [11], LR [12], NB [13], and a MLP [14]. To gauge the performance of these classifiers, the AUC and AUPRC metrics are used.…”
Citation type: mentioning
confidence: 99%
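The experiment quoted above lists its eight learners and two metrics but not its datasets or settings. The sketch below simply shows how the two reported metrics, AUC and AUPRC, are computed with scikit-learn for one of the listed learners (a random forest) on synthetic, imbalanced data; all names and parameters here are placeholders, not the cited study's configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, class-imbalanced stand-in data (not the cited study's datasets).
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

# RF is one of the eight listed learners; default hyperparameters as placeholders.
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
scores = clf.predict_proba(X_test)[:, 1]  # probability of the positive class

print("AUC:  ", roc_auc_score(y_test, scores))
# average_precision_score is the usual estimator of area under the
# precision-recall curve (AUPRC).
print("AUPRC:", average_precision_score(y_test, scores))
```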