Unsupervised stratification of cross-validation for accuracy estimation

Diamantidis, N. A.; Karlis, Dimitris; Giakoumakis, E. A.

doi:10.1016/s0004-3702(99)00094-6

Cited by 149 publications

(78 citation statements)

References 16 publications


“…For most of the listed techniques, this procedure occurs several times on each of these training-validation set pairs. A common approach when partitioning the data into training and validation set is the use of stratification [246]. In stratified validation, the sets have the same fraction of labels as the data of origin.…”

Section: Machine Learning Performance Evaluation (mentioning)

confidence: 99%

Personalizing Medicine Through Hybrid Imaging and Medical Big Data Analysis

et al. 2018


Medical imaging has evolved from a pure visualization tool to representing a primary source of analytic approaches toward in vivo disease characterization. Hybrid imaging is an integral part of this approach, as it provides complementary visual and quantitative information in the form of morphological and functional insights into the living body. As such, non-invasive imaging modalities no longer provide images only, but data, as stated recently by pioneers in the field. Today, such information, together with other, non-imaging medical data creates highly heterogeneous data sets that underpin the concept of medical big data. While the exponential growth of medical big data challenges their processing, they inherently contain information that benefits a patient-centric personalized healthcare. Novel machine learning approaches combined with high-performance distributed cloud computing technologies help explore medical big data. Such exploration and subsequent generation of knowledge require a profound understanding of the technical challenges. These challenges increase in complexity when employing hybrid, aka dual-or even multi-modality image data as input to big data repositories. This paper provides a general insight into medical big data analysis in light of the use of hybrid imaging information. First, hybrid imaging is introduced (see further contributions to this special Research Topic), also in the context of medical big data, then the technological background of machine learning as well as state-of-the-art distributed cloud computing technologies are presented, followed by the discussion of data preservation and data sharing trends. Joint data exploration endeavors in the context of in vivo radiomics and hybrid imaging will be presented. Standardization challenges of imaging protocol, delineation, feature engineering, and machine learning evaluation will be detailed. 
Last, the paper will provide an outlook into the future role of hybrid imaging in view of personalized medicine, whereby a focus will be given to the derivation of prediction models as part of clinical decision support systems, to which machine learning approaches and hybrid imaging can be anchored.
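The citation statement for this entry describes stratified validation, in which each set keeps the same fraction of labels as the original data. A minimal sketch of that idea follows, assuming class labels are available and using a hypothetical helper named `stratified_folds` (not taken from any cited paper). It illustrates ordinary label-based stratification only, not the unsupervised stratification method of Diamantidis et al.

```python
from collections import Counter, defaultdict
import random


def stratified_folds(labels, k, seed=0):
    """Assign sample indices to k folds so label fractions stay roughly equal.

    Hypothetical illustration: indices of each class are shuffled and then
    dealt round-robin across the folds.
    """
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    folds = [[] for _ in range(k)]
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        # Deal this class's indices one at a time to successive folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


# Toy data: 80% of samples have label "a", 20% have label "b".
labels = ["a"] * 80 + ["b"] * 20
folds = stratified_folds(labels, k=5)
fold_counts = [Counter(labels[i] for i in fold) for fold in folds]
```

With this toy data every fold holds 20 samples in the same 80/20 label ratio as the full set. Because every class starts dealing at the first fold, fold sizes can drift apart when there are many small classes; a production split would usually rely on an established library routine instead.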


“…However, results of this procedure can be conceived as indicators of a relative performance or otherwise as an optimistic estimate of the hydrological members' selection process (Diamantidis et al, 2000). Figure 2 shows the generalization or test methodology of the hydrological members' selection at two levels: the local focuses on the extrapolation of results to different FTH within the same catchment and another named regional, while the regional level tests the temporal and spatial performance in nearby catchments, or under a broader perspective on the integration of regional results.…”

Section: Generalization Test Methodology (mentioning)

confidence: 99%

Simplifying a hydrological ensemble prediction system with a backward greedy selection of members – Part 2: Generalization in time and space

Brochero, Anctil, Gagné 2011

Hydrol. Earth Syst. Sci.


Abstract. An uncertainty cascade model applied to stream flow forecasting seeks to evaluate the different sources of uncertainty of the complex rainfall-runoff process. The current trend focuses on the combination of Meteorological Ensemble Prediction Systems (MEPS) and hydrological model(s). However, the number of members of such a HEPS may rapidly increase to a level that may not be operationally sustainable. This paper evaluates the generalization ability of a simplification scheme of a 800-member HEPS formed by the combination of 16 lumped rainfall-runoff models with the 50 perturbed members from the European Centre for Medium-Range Weather Forecasts (ECMWF) EPS. Tests are made at two levels. At the local level, the transferability of the 9th day hydrological member selection for the other 8 forecast horizons exhibits an 82 % success rate. The other evaluation is made at the regional or cluster level, the transferability from one catchment to another from within a cluster of watersheds also leads to a good performance (85 % success rate), especially for forecast time horizons above 3 days and when the basins that formed the cluster presented themselves a good performance on an individual basis. Diversity, defined as hydrological model complementarity addressing different aspects of a forecast, was identified as the critical factor for proper selection applications.


“…The cross-validation process is then repeated k times; each one of the k sub-samples is used exactly once as the validation data. The average of the k results from the k-folds gives the KCV test accuracy of the algorithm [30]-[31]. Our k-fold cross-validation is a 10-fold cross-validation.…”

Section: Performance Metrics (mentioning)

confidence: 99%
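
The k-fold procedure quoted in this citation statement (each of the k sub-samples used exactly once for validation, with the k results averaged) can be sketched as follows. This is a minimal illustration under stated assumptions: `k_fold_accuracy`, `train_majority`, and `predict_majority` are hypothetical names, the classifier is a trivial majority-class predictor, and folds are assigned by index modulo k rather than by any scheme from the citing paper.

```python
from collections import Counter


def k_fold_accuracy(samples, labels, k, train_fn, predict_fn):
    """Return the mean validation accuracy over k folds.

    Hypothetical sketch: sample i is placed in fold i % k, and each fold
    serves as the validation set exactly once.
    """
    n = len(samples)
    fold_of = [i % k for i in range(n)]
    scores = []
    for fold in range(k):
        train_idx = [i for i in range(n) if fold_of[i] != fold]
        test_idx = [i for i in range(n) if fold_of[i] == fold]
        model = train_fn([samples[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct = sum(predict_fn(model, samples[i]) == labels[i]
                      for i in test_idx)
        scores.append(correct / len(test_idx))
    # The cross-validated accuracy is the average of the k fold accuracies.
    return sum(scores) / k


def train_majority(train_samples, train_labels):
    """Toy 'model': remember the most frequent training label."""
    return Counter(train_labels).most_common(1)[0][0]


def predict_majority(model, sample):
    """Always predict the remembered majority label."""
    return model


# Toy data: 15 samples of class 0 followed by 5 samples of class 1.
samples = list(range(20))
labels = [0] * 15 + [1] * 5
accuracy = k_fold_accuracy(samples, labels, 10, train_majority, predict_majority)
```

Setting k to 10 matches the 10-fold setting named in the statement; any real learner can be substituted through `train_fn` and `predict_fn`.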