Gene expression data from microarrays are being applied to predict preclinical and clinical endpoints, but the reliability of these predictions has not been established. In the MAQC-II project, 36 independent teams analyzed six microarray data sets to generate predictive models for classifying a sample with respect to one of 13 endpoints indicative of lung or liver toxicity in rodents, or of breast cancer, multiple myeloma or neuroblastoma in humans. In total, >30,000 models were built using many combinations of analytical methods. The teams generated predictive models without knowing the biological meaning of some of the endpoints and, to mimic clinical reality, tested the models on data that had not been used for training. We found that model performance depended largely on the endpoint and team proficiency, and that different approaches generated models of similar performance. The conclusions and recommendations from MAQC-II should be useful for regulatory agencies, study committees and independent investigators who evaluate methods for global gene expression analysis.
Introduction: As part of the MicroArray Quality Control (MAQC)-II project, this analysis examines how the choice of univariate feature-selection methods and classification algorithms may influence the performance of genomic predictors under varying degrees of prediction difficulty, represented by three clinically relevant endpoints.
Methods: We used gene-expression data from 230 breast cancers (grouped into training and independent validation sets) and examined 40 predictors (five univariate feature-selection methods combined with eight different classifiers) for each of the three endpoints. Classification performance was estimated on the training set using two different resampling methods and compared with the accuracy observed in the independent validation set.
Results: A ranking of the three classification problems was obtained, and the performance of 120 models was estimated and assessed on an independent validation set. The bootstrapping estimates were closer to the validation performance than were the cross-validation estimates. The required sample size for each endpoint was estimated, and both gene-level and pathway-level analyses were performed on the resulting models.
Conclusions: We showed that genomic predictor accuracy is determined largely by an interplay between sample size and classification difficulty. Variations on univariate feature-selection methods and choice of classification algorithm have only a modest impact on predictor performance, and several statistically equally good predictors can be developed for any given classification problem.
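The resampling comparison described above can be sketched in a few lines. This is an illustrative toy example on synthetic data: the data set, the top-k F-test feature selection and the logistic-regression classifier are assumptions for illustration, not the study's exact pipelines.

```python
# Illustrative sketch (not the study's data): compare a cross-validation
# accuracy estimate with a bootstrap (out-of-bag) estimate for a single
# feature-selection + classifier pipeline.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.utils import resample

X, y = make_classification(n_samples=200, n_features=500, n_informative=20,
                           random_state=0)

# Univariate feature selection (top-50 F-test features) followed by a classifier.
model = make_pipeline(SelectKBest(f_classif, k=50),
                      LogisticRegression(max_iter=1000))

# Estimate 1: 5-fold cross-validation.
cv_acc = cross_val_score(model, X, y, cv=5).mean()

# Estimate 2: bootstrap -- train on a resample, test on the out-of-bag samples.
boot_accs = []
for b in range(20):
    idx = resample(np.arange(len(y)), random_state=b)
    oob = np.setdiff1d(np.arange(len(y)), idx)
    model.fit(X[idx], y[idx])
    boot_accs.append(model.score(X[oob], y[oob]))
boot_acc = float(np.mean(boot_accs))

print(f"cross-validation accuracy:     {cv_acc:.3f}")
print(f"bootstrap (out-of-bag) accuracy: {boot_acc:.3f}")
```

On real data, each estimate would then be compared against accuracy on a truly independent validation set, as the abstract describes.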
Image quality can be objectively defined according to how well an observer can perform a task of practical interest given the image. We review a practical model observer for the signal-detection task. The ideal observer for this task is a function of the image probability distributions, which are multidimensional and complicated, so this observer is often too difficult to derive or estimate. An alternative to the ideal observer is the ideal linear observer, which can still be unmanageable. Our alternative is the ideal linear observer constrained to a small set of channels: the channelized Hotelling observer.
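A minimal sketch of a channelized Hotelling observer follows, assuming a known Gaussian signal in white Gaussian noise and a small bank of radially symmetric Gaussian channels. These are all illustrative assumptions; in practice Gabor or Laguerre-Gauss channel sets are common choices.

```python
# Minimal channelized Hotelling observer (CHO) sketch on synthetic 2D images.
import numpy as np

rng = np.random.default_rng(0)
n = 32  # image side length
yy, xx = np.mgrid[:n, :n] - n // 2
r2 = xx**2 + yy**2
signal = np.exp(-r2 / (2 * 3.0**2))  # known Gaussian signal

# A small set of radial channels reduces each image to 4 numbers.
U = np.stack([np.exp(-r2 / (2 * s**2)).ravel()
              for s in (2.0, 4.0, 8.0, 16.0)], axis=1)

def make_images(n_img, with_signal):
    g = rng.normal(size=(n_img, n * n))  # white-noise backgrounds
    if with_signal:
        g += signal.ravel()
    return g

train_sig = make_images(200, True) @ U   # channel outputs, shape (200, 4)
train_bg  = make_images(200, False) @ U

# Hotelling template in channel space: w = S^{-1} (mean_sig - mean_bg)
S = 0.5 * (np.cov(train_sig.T) + np.cov(train_bg.T))
w = np.linalg.solve(S, train_sig.mean(0) - train_bg.mean(0))

# Apply the template to fresh test images; the test statistic t = w^T v
# should separate signal-present from signal-absent images.
t_sig = make_images(200, True) @ U @ w
t_bg  = make_images(200, False) @ U @ w
pc = np.mean(t_sig[:, None] > t_bg[None, :])  # two-alternative percent correct
print(f"percent correct: {pc:.2f}")
```

The key practical point is dimensionality: the Hotelling template requires inverting a covariance matrix, which is tractable in a 4-dimensional channel space but not in the full 1,024-dimensional pixel space without far more training images.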
Current clinical practice is rapidly moving in the direction of volumetric imaging. For two-dimensional (2D) images, task-based medical image quality is often assessed using numerical model observers; for three-dimensional (3D) images, however, these models have so far been little explored. In this work, two novel designs of a multi-slice channelized Hotelling observer (CHO) are first proposed for the task of detecting 3D signals in 3D images. The novel designs are then compared and evaluated in a simulation study with five different CHO designs: a single-slice model, three multi-slice models and a volumetric model. Four different random background statistics are considered, both Gaussian (non-correlated and correlated Gaussian noise) and non-Gaussian (lumpy and clustered lumpy backgrounds). Overall, the results show that the volumetric model outperforms the others, while the disparity between the models decreases as the complexity of the detection task increases. Among the multi-slice models, the second proposed CHO most closely approaches the volumetric model, whereas the first appears least affected by the number of training samples.
Stromal tumor-infiltrating lymphocytes (sTILs) are important prognostic and predictive biomarkers in triple-negative breast cancer (TNBC) and HER2-positive breast cancer. Incorporating sTILs into clinical practice necessitates reproducible assessment. Previously developed standardized scoring guidelines have been widely embraced by the clinical and research communities. We evaluated sources of variability in sTIL assessment by pathologists in three previous sTIL ring studies. We identify common challenges and evaluate the impact of discrepancies on outcome estimates in early TNBC using a newly developed prognostic tool. Discordant sTIL assessment is driven by heterogeneity in lymphocyte distribution. Additional contributing factors include technical slide-related issues; scoring outside the tumor boundary; tumors with minimal assessable stroma; inclusion of lymphocytes associated with other structures; and inclusion of other inflammatory cells. Small variations in sTIL assessment modestly alter risk estimation in early TNBC but have the potential to affect treatment selection if cutpoints are employed. Scoring and averaging multiple areas, as well as use of reference images, improve the consistency of sTIL evaluation. To help avoid the pitfalls identified in this analysis, we developed an educational resource available at www.tilsinbreastcancer.org/pitfalls.
The extent of tumor-infiltrating lymphocytes (TILs), along with immunomodulatory ligands, tumor mutational burden and other biomarkers, has been demonstrated to be a marker of response to immune-checkpoint therapy in several cancers. Pathologists have therefore started to devise standardized visual approaches to quantify TILs for therapy prediction. However, despite successful standardization efforts, visual TIL estimation is slow, has limited precision and cannot evaluate more complex properties such as TIL distribution patterns. Computational image-analysis approaches are therefore needed to provide standardized and efficient TIL quantification. Here, we discuss automated TIL-scoring approaches ranging from classical image segmentation, in which cell boundaries are identified and the resulting objects are classified according to shape properties, to machine learning-based approaches that classify cells directly, without segmentation, but rely on large amounts of training data. In contrast to conventional machine learning (ML) approaches, which are often criticized for their "black-box" character, we also discuss explainable machine learning. Such approaches render ML results interpretable and explain the computational decision-making process through high-resolution heatmaps that highlight TILs and cancer cells, thereby allowing quantification and plausibility checks in biomedical research and diagnostics.
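The classical segmentation route described above (identify cell boundaries, then classify objects by shape) can be sketched on a toy image. The intensity threshold and size rules here are arbitrary assumptions for illustration, not validated diagnostic criteria.

```python
# Toy sketch of classical segmentation-based cell counting:
# threshold a synthetic image, label connected components, and keep
# small round objects as "lymphocyte-like" by a simple size rule.
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(1)
img = np.zeros((100, 100))

# Paint a few round "cells" of different radii onto the background;
# the large one stands in for a non-lymphocyte cell type.
yy, xx = np.ogrid[:100, :100]
for cy, cx, r in [(20, 20, 3), (50, 60, 3), (80, 30, 8), (70, 80, 4)]:
    img[(yy - cy)**2 + (xx - cx)**2 <= r**2] = 1.0
img += rng.normal(scale=0.05, size=img.shape)  # imaging noise

mask = img > 0.5                      # global intensity threshold
labels, n_obj = ndimage.label(mask)   # connected-component labeling
sizes = ndimage.sum(mask, labels, range(1, n_obj + 1))

# Shape rule (illustrative): objects between 10 and 80 pixels are
# counted as lymphocyte-like; larger objects are other cell types.
til_like = int(np.sum((sizes >= 10) & (sizes <= 80)))
print(f"objects found: {n_obj}, lymphocyte-like: {til_like}")
```

Real pipelines replace the global threshold with stain deconvolution and watershed splitting of touching nuclei, and replace the size rule with richer shape features; the ML-based approaches mentioned above skip the explicit segmentation step entirely.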