Content-based histopathology image retrieval using CometCloud

Qi, Xin; Wang, Daihou; Rodero, Iván; Diaz-Montes, Javier; Gensure, Rebekah H.; Xing, Fuyong; Zhong, Hua; Goodell, Lauri; Parashar, Manish; Foran, David J.; Yang, Lin

doi:10.1186/1471-2105-15-287

Cited by 40 publications

(24 citation statements)

References 53 publications

“…Further in conjunction with smart analytics like content based image retrieval algorithms (Qi et al, 2014), students could be trained to identify and recognize pathology slides in a dynamic fashion.…”

Section: Introduction (mentioning)

confidence: 99%

Image analysis and machine learning in digital pathology: Challenges and opportunities

Madabhushi

Lee

2016

Medical Image Analysis

With the rise in whole slide scanner technology, large numbers of tissue slides are being scanned, represented, and archived digitally. While digital pathology has substantial implications for telepathology, second opinions, and education, there are also huge research opportunities in image computing with this new source of “big data”. It is well known that there is fundamental prognostic data embedded in pathology images. The ability to mine “sub-visual” image features from digital pathology slide images, features that may not be visually discernible by a pathologist, offers the opportunity for better quantitative modeling of disease appearance and hence possibly improved prediction of disease aggressiveness and patient outcome. However, the compelling opportunities in precision medicine offered by big digital pathology data come with their own set of computational challenges. Image analysis and computer assisted detection and diagnosis tools previously developed in the context of radiographic images are woefully inadequate to deal with the data density in high resolution digitized whole slide images. Additionally, there has been recent substantial interest in combining and fusing radiologic imaging and proteomics and genomics based measurements with features extracted from digital pathology images for better prognostic prediction of disease aggressiveness and patient outcome. Again, there is a paucity of powerful tools for combining disease specific features that manifest across multiple different length scales. The purpose of this review is to discuss developments in computational image analysis tools for predictive modeling of digital pathology images from a detection, segmentation, feature extraction, and tissue classification perspective. We discuss the emergence of new handcrafted feature approaches for improved predictive modeling of tissue appearance and also review the emergence of deep learning schemes for both object detection and tissue classification. We also briefly review some of the state of the art in fusion of radiology and pathology images and also combining digital pathology derived image measurements with molecular “omics” features for better predictive modeling. The review ends with a brief discussion of some of the technical and computational challenges to be overcome and reflects on future opportunities for the quantitation of histopathology.

“…Clustering patients according to hand-engineered features has been prior practice in histopathology CBIR, with multiple pathologists providing search relevancy annotations to tune the search algorithm [14]. Our approach relies on neither pathologists nor feature engineers, and instead learns discriminative genetic-histologic relationships in the dominant tumor to find similar patients.…”

Section: Spop Refseq Genes (mentioning)

confidence: 99%

H&E-stained Whole Slide Image Deep Learning Predicts SPOP Mutation State in Prostate Cancer

Schaumberg

Rubin

Fuchs

2016

Preprint

A quantitative model to genetically interpret the histology in whole microscopy slide images is desirable to guide downstream immunohistochemistry, genomics, and precision medicine. We constructed a statistical model that predicts whether or not SPOP is mutated in prostate cancer, given only the digital whole slide after standard hematoxylin and eosin [H&E] staining. Using a TCGA cohort of 177 prostate cancer patients where 20 had mutant SPOP, we trained multiple ensembles of residual networks, accurately distinguishing SPOP mutant from SPOP non-mutant patients (test AUROC=0.74, p=0.0007 Fisher's Exact Test). We further validated our full metaensemble classifier on an independent test cohort from MSK-IMPACT of 152 patients where 19 had mutant SPOP. Mutants and non-mutants were accurately distinguished despite TCGA slides being frozen sections and MSK-IMPACT slides being formalin-fixed paraffin-embedded sections (AUROC=0.86, p=0.0038). Moreover, we scanned an additional 36 MSK-IMPACT patients having mutant SPOP, trained on this expanded MSK-IMPACT cohort (test AUROC=0.75, p=0.0002), tested on the TCGA cohort (AUROC=0.64, p=0.0306), and again accurately distinguished mutants from non-mutants using the same pipeline. Importantly, our method demonstrates tractable deep learning in this "small data" setting of 20-55 positive examples and quantifies each prediction's uncertainty with confidence intervals. To our knowledge, this is the first statistical model to predict a genetic mutation in cancer directly from the patient's digitized H&E-stained whole microscopy slide. Moreover, this is the first time quantitative features learned from patient genetics and histology have been used for content-based image retrieval, finding similar patients for a given patient where the histology appears to share the same genetic driver of disease, i.e. SPOP mutation (p=0.0241 Kost's Method), and finding similar patients for a given patient that does not have that driver mutation (p=0.0170 Kost's Method).

Keywords: cancer | molecular pathology | deep learning | whole slide image

Genetic drivers of cancer morphology, such as E-Cadherin [CDH1] loss promoting lobular rather than ductal phenotypes in breast, are well known. TMPRSS2-ERG fusion in prostate cancer has a number of known morphological traits, including blue-tinged mucin, cribriform pattern, and macronuclei [5]. Computational pathology methods [6] typically predict clinical or genetic features as a function of histological imagery, e.g. whole slide images. Our central hypothesis is that the morphology shown in these whole slide images, having nothing more than standard hematoxylin and eosin [H&E] staining, is a function of the underlying genetic drivers. To test this hypothesis, we gathered a cohort of 499 prostate adenocarcinoma patients from The Cancer Genome Atlas [TCGA] (TCGA data courtesy the TCGA Research Network, http://cancergenome.nih.gov/), 177 of which were suitable for analysis, with 20 of those having mutant SPOP (Figs 1, 2, and S1). We then used ensembles of deep ....

“…Qi et al [9] presents a cloud computing based parallel processing approach for content-based image retrieval in prostate cancer images. The WSIs are sub-divided into smaller size images during the course of a pre-processing step and transferred to a storage system of the agent node within the worker site.…”

Section: Introduction (mentioning)

confidence: 99%

Parallel Versus Distributed Data Access for Gigapixel-Resolution Histology Images: Challenges and Opportunities

Yildirim

Foran

2017

IEEE J. Biomed. Health Inform.

Recent advances in digital pathology technology have led to significant improvements in both the quality and resolution of the resulting images, which now often exceed several gigabytes each. Today, several leading institutions across the country utilize whole-slide imaging (WSI) as part of their routine workflow. WSIs have utility in a wide range of diagnostic and investigative pathology applications. The fact that these images are both large (about 30 GB when uncompressed) and generated in non-standard proprietary formats has limited wider adoption of these technologies and makes the task of accessing, processing, and analyzing them in high-throughput fashion extremely challenging. The common approach for such data analytics applications is to pre-process the large whole-slide images into smaller files and store them in a generic format. However, this approach limits the advantages that might be realized if different scalability levels and data unit sizes could be dynamically changed based on the specifications of the task at hand and the architectural limits of the infrastructure (e.g. node memory size). Such strategies also introduce extra processing time to the workflow. To address these challenges we present, in this paper, novel scalable access methods for parallel file systems and distributed file/object storage systems. Experimental results gathered during the course of our studies show that these methods provide opportunities not realizable using traditional approaches. We demonstrate tangible scalability and high-throughput advantages using a Lustre parallel file system and AWS S3 distributed storage system.
