The outbreak of an acute respiratory syndrome (called novel coronavirus pneumonia, NCP) caused by the SARS-CoV-2 virus has progressed to a pandemic and has become a major threat to public health worldwide [i], [ii]. COVID-19 screening using computed tomography (CT) enables quick diagnosis and identification of high-risk NCP patients [iii]. Automated screening using CT volumes is a challenging task owing to inter-grader variability and high false-positive and false-negative rates. We propose a three-dimensional (3D) deep convolutional neural network (CNN) that predicts a patient's COVID-19 risk from a CT volume, trained end-to-end using only images and disease labels as inputs. Our model achieves state-of-the-art performance (95.78% overall accuracy, 99.4% area under the curve) on a dataset of 1,684 COVID-19 patients, nearly twice the size of previous datasets [3], and performs similarly on an independent clinical validation set of 121 cases. We tested its performance against six radiologists on CT volumes of clinically confirmed patients; our model outperformed all six radiologists, with absolute reductions of 7% in false positives and 35.9% in false negatives, demonstrating that artificial intelligence (AI) can optimize the COVID-19 screening process via computer assistance and automation with a level of competence comparable to radiologists. While the vast majority of patients remain unscreened, we show the potential of AI to increase the accuracy and consistency of COVID-19 screening with CT.
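The pipeline the abstract describes, a CT volume in and a single risk score out, can be sketched in miniature. This is a minimal illustrative sketch, not the authors' architecture: one hand-rolled 3D convolution, a ReLU, global average pooling, and a sigmoid, with placeholder weights.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv3d(vol, kernel):
    # Valid 3D cross-correlation of a volume with a single kernel.
    windows = sliding_window_view(vol, kernel.shape)
    return np.einsum('xyzijk,ijk->xyz', windows, kernel)

def predict_risk(volume, kernel, w, b):
    # One conv layer -> ReLU -> global average pool -> sigmoid risk score in (0, 1).
    feat = np.maximum(conv3d(volume, kernel), 0.0)
    pooled = feat.mean()
    return 1.0 / (1.0 + np.exp(-(w * pooled + b)))

rng = np.random.default_rng(0)
vol = rng.normal(size=(16, 16, 16))     # stand-in for a CT volume
kernel = rng.normal(size=(3, 3, 3))     # placeholder learned filter
risk = predict_risk(vol, kernel, w=1.0, b=0.0)
```

A real end-to-end model stacks many such conv layers and learns `kernel`, `w`, and `b` from image/label pairs; the sketch only fixes the input/output contract.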
Domain adaptation solves a learning problem in a target domain by utilizing training data from a different but related source domain. Intuitively, discovering a good feature representation across domains is crucial. In this paper, we propose to find such a representation through a new learning method, transfer component analysis (TCA), for domain adaptation. TCA learns transfer components across domains in a Reproducing Kernel Hilbert Space (RKHS) using the Maximum Mean Discrepancy (MMD). In the subspace spanned by these transfer components, the data distributions of different domains are close to each other. As a result, with the new representations in this subspace, we can apply standard machine learning methods to train classifiers or regression models in the source domain for use in the target domain. The main contribution of our work is a novel feature representation in which to perform domain adaptation via a new parametric kernel using feature extraction methods, which can dramatically minimize the distance between domain distributions by projecting data onto the learned transfer components. Furthermore, our approach can handle large datasets and naturally leads to out-of-sample generalization. The effectiveness and efficiency of our approach are verified by experiments on two real-world applications: cross-domain indoor WiFi localization and cross-domain text classification.
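The quantity TCA minimizes can be sketched directly. Below is a generic empirical estimate of the squared MMD between two samples with an RBF kernel; it is a hedged illustration of the discrepancy measure, not the paper's parametric-kernel formulation, and `gamma` is an assumed bandwidth parameter.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Pairwise RBF kernel matrix between rows of X and rows of Y.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mmd2(X, Y, gamma=1.0):
    # Biased empirical estimate of squared Maximum Mean Discrepancy.
    Kxx = rbf_kernel(X, X, gamma)
    Kyy = rbf_kernel(Y, Y, gamma)
    Kxy = rbf_kernel(X, Y, gamma)
    return Kxx.mean() + Kyy.mean() - 2.0 * Kxy.mean()

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, (100, 2))  # "source" sample
Y = rng.normal(0.0, 1.0, (100, 2))  # same distribution as X
Z = rng.normal(3.0, 1.0, (100, 2))  # shifted distribution
```

Samples from the same distribution yield an MMD near zero, while a distribution shift inflates it; TCA seeks a projection under which this value is small across domains.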
Many software defect prediction approaches have been proposed, and most are effective in within-project prediction settings. However, for new projects or projects with limited training data, it is desirable to learn a prediction model using sufficient training data from existing source projects and then apply the model to target projects (cross-project defect prediction). Unfortunately, the performance of cross-project defect prediction is generally poor, largely because of feature distribution differences between the source and target projects. In this paper, we apply a state-of-the-art transfer learning approach, TCA, to make feature distributions in source and target projects similar. In addition, we propose a novel transfer defect learning approach, TCA+, by extending TCA. Our experimental results for eight open-source projects show that TCA+ significantly improves cross-project prediction performance.

Index Terms—cross-project defect prediction, transfer learning, empirical software engineering

I. INTRODUCTION

Recently, numerous effective software defect prediction approaches have been proposed and have received a tremendous amount of attention [1], [2], [3], [4], [5]. Most approaches employ machine learning classifiers to build a prediction model from data sets mined from software repositories, and the model is used to identify software defects. However, most approaches are evaluated in within-project settings, i.e., a prediction model is built from one part of a project and evaluated on the remainder of the project by 10-fold cross-validation [2], [6], [7], [8] and/or random instance splits [5], [9], [10]. In practice, cross-project defect prediction is necessary. New projects often do not have enough defect data to build a prediction model. This cold-start problem is well known for recommender systems [11] and can be addressed by using cross-project defect prediction to build a prediction model from the data of other projects.
The model is then applied to new projects. However, cross-project defect prediction often yields poor performance. Zimmermann et al. [12] evaluated cross-project defect prediction performance on data from 12 projects (622 combinations) and found that only 21 pairs yielded reasonable prediction performance. One of the main reasons for the poor cross-project performance is the difference between the data distributions of the source and target projects. Most machine learning classifiers are designed under the assumption that training and test data follow the same distribution.
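The cross-project setting described above, training on a source project and predicting on a differently distributed target project, can be sketched with synthetic data. This is a toy illustration, not TCA+: the per-project z-score normalization stands in for distribution alignment (in the spirit of the normalization TCA+ selects between, not an implementation of it), and the nearest-mean classifier, data, and labels are all hypothetical.

```python
import numpy as np

def zscore(X):
    # Normalize each feature within its own project (zero mean, unit variance).
    return (X - X.mean(0)) / (X.std(0) + 1e-12)

def nearest_mean_fit(X, y):
    # Class means for a two-class training set (labels 0 = clean, 1 = defective).
    return X[y == 0].mean(0), X[y == 1].mean(0)

def nearest_mean_predict(X, means):
    # Assign each instance to the nearer class mean.
    m0, m1 = means
    d0 = ((X - m0) ** 2).sum(1)
    d1 = ((X - m1) ** 2).sum(1)
    return (d1 < d0).astype(int)

rng = np.random.default_rng(1)
# Source project: clean vs. defective modules, 3 synthetic metrics.
src = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(2, 1, (50, 3))])
y_src = np.array([0] * 50 + [1] * 50)
# Target project: same class structure, but metrics scaled and shifted
# (a simple surrogate for the distribution differences discussed above).
tgt = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(2, 1, (50, 3))]) * 10 + 5
y_tgt = y_src.copy()

means = nearest_mean_fit(zscore(src), y_src)
pred = nearest_mean_predict(zscore(tgt), means)
acc = (pred == y_tgt).mean()
```

Because each project is normalized within itself, the affine shift between projects cancels and the source-trained model transfers; without such alignment the raw target metrics sit far from both source class means.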