Transfer learning, or domain adaptation, is concerned with machine learning problems in which training and testing data come from possibly different distributions (denoted as µ and µ′, respectively). In this work, we give an information-theoretic analysis of the generalization error and the excess risk of transfer learning algorithms, following a line of work initiated by Russo and Zou. Our results suggest, perhaps as expected, that the Kullback-Leibler (KL) divergence D(µ||µ′) plays an important role in characterizing the generalization error in the setting of domain adaptation. Specifically, we provide generalization error upper bounds for general transfer learning algorithms, and extend the results to a specific empirical risk minimization (ERM) algorithm where data from both distributions are available in the training phase. We further apply the method to iterative, noisy gradient descent algorithms, and obtain upper bounds that can be computed easily, using only parameters of the learning algorithms. A few illustrative examples are provided to demonstrate the usefulness of the results. In particular, for certain classification problems our bound is tighter than the corresponding bound derived using Rademacher complexity.
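To make the role of the KL divergence concrete, the display below sketches the typical shape of such a bound. This is an illustrative form only, assuming the loss ℓ(w, Z) is σ-sub-Gaussian; the exact constants, the direction of the divergence, and the form of the mutual-information term depend on the particular theorem.

```latex
% Illustrative shape of an information-theoretic transfer bound (a sketch,
% not the paper's exact theorem). Assumptions: \ell(w, Z) is \sigma-sub-Gaussian,
% S = (Z_1, \dots, Z_n) \sim \mu^n is the training sample, W is the algorithm
% output, and \mu' is the test distribution.
\left| \,\mathbb{E}\!\left[ L_{\mu'}(W) - \widehat{L}_S(W) \right] \right|
  \;\le\; \sqrt{\, 2\sigma^2 \left( D(\mu' \,\|\, \mu) + \frac{I(W; S)}{n} \right) }
% When \mu' = \mu the distribution-shift term vanishes and this reduces to the
% standard mutual-information bound \sqrt{2\sigma^2 \, I(W; S) / n} of Xu and Raginsky.
```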
Over the last few years, clinical decision support based on data mining techniques has offered increasingly intelligent ways to reduce clinical decision errors. However, clinical datasets often suffer from high missingness, which adversely impacts the quality of modelling if handled improperly. Imputing the missing values offers a way to resolve this issue. Conventional approaches rely on simple statistical treatments, such as mean imputation, or simply discard incomplete cases; both have serious limitations and thus degrade learning performance. This study examines a series of machine-learning-based imputation methods and suggests an efficient approach for preparing a good-quality breast cancer dataset, with the aim of uncovering the relationship between breast cancer treatment and chemotherapy-related amenorrhoea; performance is evaluated by prediction accuracy. To this end, the reliability and robustness of six well-known imputation methods are evaluated. Our results show that imputation leads to a significant boost in classification performance compared with model prediction based on list-wise deletion. Furthermore, most of the methods retain strong robustness and discriminant power even when the dataset experiences high missing rates (> 50%).
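The evaluation protocol described above — inject missingness, impute, train a classifier, and compare accuracy against list-wise deletion — is straightforward to reproduce. Below is a minimal sketch using scikit-learn; the dataset (sklearn's Wisconsin breast cancer data), the MCAR missingness injector, and the choice of three imputers are stand-ins for the paper's clinical dataset and its six methods.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import SimpleImputer, KNNImputer, IterativeImputer

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)  # stand-in for the clinical dataset

def inject_mcar(X, rate):
    """Set each entry to NaN independently with probability `rate` (MCAR)."""
    X = X.copy()
    X[rng.random(X.shape) < rate] = np.nan
    return X

# A modest per-entry rate; at the paper's >50% rates virtually no complete
# rows survive, which is precisely why list-wise deletion breaks down.
X_miss = inject_mcar(X, rate=0.05)
X_tr, X_te, y_tr, y_te = train_test_split(X_miss, y, random_state=0)

# Baseline: list-wise deletion (train only on fully observed rows).
keep = ~np.isnan(X_tr).any(axis=1)
baseline = make_pipeline(SimpleImputer(strategy="mean"), StandardScaler(),
                         LogisticRegression(max_iter=1000))
baseline.fit(X_tr[keep], y_tr[keep])  # imputer here only fills test-set NaNs
print(f"list-wise deletion ({keep.sum()} rows kept): "
      f"{baseline.score(X_te, y_te):.3f}")

# Imputation-based alternatives, compared on downstream accuracy.
imputers = {
    "mean": SimpleImputer(strategy="mean"),
    "kNN": KNNImputer(n_neighbors=5),
    "iterative (MICE-style)": IterativeImputer(random_state=0),
}
for name, imputer in imputers.items():
    clf = make_pipeline(imputer, StandardScaler(),
                        LogisticRegression(max_iter=1000))
    clf.fit(X_tr, y_tr)
    print(f"{name}: {clf.score(X_te, y_te):.3f}")
```

Fitting each imputer inside the pipeline ensures its statistics are learned from the training split only, avoiding leakage into the test set.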
The establishment of the link between causality and unsupervised domain adaptation (UDA)/semi-supervised learning (SSL) has led to methodological advances in these learning problems in recent years. However, a formal theory that explains the role of causality in the generalization performance of UDA/SSL is still lacking. In this paper, we consider the UDA/SSL setting in which we have access to m labeled source instances and n unlabeled target instances as training data, under a parametric probabilistic model. We study the learning performance (e.g., the excess risk) of prediction in the target domain. Specifically, we distinguish two scenarios: the learning problem is called causal learning if the feature is the cause and the label is the effect, and anti-causal learning otherwise. We show that in causal learning, the excess risk depends on the size of the source sample at a rate of O(1/m) only if the labelling distribution remains unchanged between the source and target domains. In anti-causal learning, we show that the unlabeled data dominate the performance, typically at a rate of O(1/n). Our analysis is based on the notion of potential outcome random variables and information theory. These results bring out the relationship between the data sample sizes and the hardness of the learning problem under different causal mechanisms.
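In symbols (our paraphrase, writing p(y | x) for the labelling mechanism, m for the labeled source sample size, and n for the unlabeled target sample size), the two regimes read:

```latex
% Causal learning (X -> Y): labeled source data drive the rate, provided the
% labelling distribution is invariant across the source and target domains.
\text{causal:} \quad \text{excess risk} = O\!\left(\frac{1}{m}\right)
  \quad \text{if } p_{\mathrm{src}}(y \mid x) = p_{\mathrm{tgt}}(y \mid x),
% Anti-causal learning (Y -> X): unlabeled target data dominate.
\text{anti-causal:} \quad \text{excess risk} = O\!\left(\frac{1}{n}\right).
```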