2020
DOI: 10.1021/acs.jcim.9b01184
Boosting Tree-Assisted Multitask Deep Learning for Small Scientific Datasets

Abstract: Machine learning approaches have had tremendous success in various disciplines. However, such success highly depends on the size and quality of datasets. Scientific datasets are often small and difficult to collect. Currently, improving machine learning performance for small scientific […]

Cited by 79 publications (67 citation statements)
References 73 publications
“…The overfitting issue poses a challenge to traditional machine learning methods if a large number of descriptors is used. MT-DNN is a method to extract information from data sets that share certain statistical distributions, which can effectively improve the predictive ability of models on small data sets [2,13]. Based on the AGBT framework, we fuse AG-FPs and BTs-FPs, i.e., BT-FPs with a supervised fine-tuning procedure for task-specific data.…”
Section: Results (mentioning, confidence: 99%)
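The multitask idea in the quote above can be illustrated with a toy model: a network whose hidden layer is shared across tasks, so gradient signal from every task shapes one common representation. The sketch below is a minimal numpy implementation on synthetic data; the architecture, sizes, and hyperparameters are illustrative assumptions, not details taken from the cited paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two small synthetic regression tasks that share structure:
# both targets depend on the same latent direction of x.
X = rng.normal(size=(64, 5))
w_shared = rng.normal(size=5)
y1 = X @ w_shared + 0.1 * rng.normal(size=64)          # task 1
y2 = 0.5 * (X @ w_shared) + 0.1 * rng.normal(size=64)  # task 2

# Shared hidden layer plus one linear head per task.
H = 8
W1 = rng.normal(size=(5, H)) * 0.1   # shared weights
v1 = np.zeros(H)                     # head for task 1
v2 = np.zeros(H)                     # head for task 2

lr = 0.05
losses = []
for step in range(200):
    Z = np.tanh(X @ W1)              # shared representation
    p1, p2 = Z @ v1, Z @ v2
    e1, e2 = p1 - y1, p2 - y2
    losses.append(np.mean(e1**2) + np.mean(e2**2))
    # Heads receive only their own task's error; the shared
    # layer accumulates gradient signal from BOTH tasks.
    g_v1 = Z.T @ e1 * (2 / len(X))
    g_v2 = Z.T @ e2 * (2 / len(X))
    dZ = (np.outer(e1, v1) + np.outer(e2, v2)) * (2 / len(X))
    g_W1 = X.T @ (dZ * (1 - Z**2))   # tanh backprop
    v1 -= lr * g_v1
    v2 -= lr * g_v2
    W1 -= lr * g_W1

print(round(losses[0], 3), round(losses[-1], 3))
```

Because the two tasks are statistically related, each effectively enlarges the other's training signal for the shared layer, which is the mechanism the quote credits for improved performance on small data sets.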
“…This gets to a deeper challenge in machine learning, going beyond the scope of this paper: statistical power analysis (see discussion in Slater & Baker, 2018). The trend in machine learning over the last few decades has largely been to consider ever-larger data sets rather than the minimum data set sizes needed (Jiang et al., 2020). While not discounting the "unreasonable effectiveness of big data" (Halevy et al., 2009), we note that it is still necessary to determine how many learners of a specific group need to be in a training set (or a separate model's training set) before the model can generally be expected to be reliable for that group.…”
Section: Summary and Discussion (mentioning, confidence: 99%)
“…On the other hand, well-established neural network techniques have emerged in several fields, including cardiovascular outcome prediction, often providing promising results relative to more classical machine learning techniques when large datasets are involved [8,9,10]. Generally, decision trees are less data demanding, and GBDT techniques are typically optimal for small datasets, whereas neural networks usually perform better on large datasets [11]. In other words, decision trees can allow the model to reach optimal convergence without requiring the large datasets that are necessary for neural networks.…”
Section: Methods (mentioning, confidence: 99%)
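The claim that gradient-boosted trees are well suited to small datasets can be made concrete with a minimal sketch: gradient boosting for squared loss, where each round fits a depth-1 regression stump to the current residuals. This is a generic illustration of the GBDT idea on a hypothetical 40-point problem, not the implementation used in any of the cited works.

```python
import numpy as np

rng = np.random.default_rng(1)

# A small dataset (n = 40), the regime where the quote above
# suggests tree ensembles are a reasonable default.
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=40)

def fit_stump(X, residual):
    """Fit the best single-split regression stump to the residuals."""
    x = X[:, 0]
    best = None
    for t in np.unique(x):
        left, right = residual[x <= t], residual[x > t]
        if len(right) == 0:
            continue  # degenerate split
        sse = ((left - left.mean())**2).sum() + ((right - right.mean())**2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, lv, rv = best
    return lambda X: np.where(X[:, 0] <= t, lv, rv)

# Boosting: start from the mean, then repeatedly fit a stump to
# the residuals and add a damped (learning-rate-scaled) copy.
lr = 0.3
pred = np.full(len(y), y.mean())
for _ in range(50):
    stump = fit_stump(X, y - pred)
    pred = pred + lr * stump(X)

mse = np.mean((y - pred)**2)
print(round(mse, 4))
```

Each stump only has to model the residual left by its predecessors, so the ensemble fits a flexible function from very few points, which is why GBDT often converges well where a neural network would overfit or fail to train.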