This work is motivated by the needs of predictive analytics on healthcare data as represented by Electronic Medical Records. Such data is invariably problematic: noisy, with missing entries, with imbalance in classes of interests, leading to serious bias in predictive modeling. Since standard data mining methods often produce poor performance measures, we argue for development of specialized techniques of data-preprocessing and classification. In this paper, we propose a new method to simultaneously classify large datasets and reduce the effects of missing values. It is based on a multilevel framework of the cost-sensitive SVM and the expected maximization imputation method for missing values, which relies on iterated regression analyses. We compare classification results of multilevel SVM-based algorithms on public benchmark datasets with imbalanced classes and missing values as well as real data in health applications, and show that our multilevel SVM-based method produces fast, and more accurate and robust classification results.
We discuss the choice of polynomial basis for approximation of uncertainty propagation through complex simulation models with capability to output derivative information. Our work is part of a larger research effort in uncertainty quantification using sampling methods augmented with derivative information. The approach has new challenges compared with standard polynomial regression. In particular, we show that a tensor product multivariate orthogonal polynomial basis of an arbitrary degree may no longer be constructed. We provide sufficient conditions for an orthonormal set of this type to exist, a basis for the space it spans. We demonstrate the benefits of the basis in the propagation of material uncertainties through a simplified model of heat transport in a nuclear reactor core. Compared with the tensor product Hermite polynomial basis, the orthogonal basis results in a better numerical conditioning of the regression procedure, a modest improvement in approximation error when basis polynomials are chosen a priori, and a significant improvement when basis polynomials are chosen adaptively, using a stepwise fitting procedure.
The Laboratory's main facility is outside Chicago, at 9700 South Cass Avenue, Argonne, Illinois 60439. For information about Argonne and its pioneering science and technology programs, see www.anl.gov.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.