Tianyuan Yao scite author profile

With the rapid development of self-supervised learning (e.g., contrastive learning), the importance of large-scale images (even without annotations) for training more generalizable AI models has been widely recognized in medical image analysis. However, collecting large-scale, task-specific unannotated data can be challenging for individual labs. Existing online resources, such as digital books, publications, and search engines, offer a new source of large-scale images. However, published images in healthcare (e.g., radiology and pathology) include a considerable number of compound figures with subplots. To extract and separate compound figures into usable individual images for downstream learning, we propose a simple compound figure separation (SimCFS) framework that does not require the traditional detection bounding box annotations, using a new loss function and hard case simulation. Our technical contribution is four-fold: (1) we introduce a simulation-based training framework that minimizes the need for resource-intensive bounding box annotations; (2) we propose a new side loss optimized for compound figure separation; (3) we propose an intra-class image augmentation method to simulate hard cases; and (4) to the best of our knowledge, this is the first study to evaluate the efficacy of leveraging self-supervised learning with compound image separation. The proposed SimCFS achieved state-of-the-art performance on the ImageCLEF 2016 Compound Figure Separation Database. Pretraining a self-supervised model with a contrastive learning algorithm on the large-scale mined figures improved the accuracy of downstream image classification tasks. The source code of SimCFS is publicly available at https://github.com/hrlblab/ImageSeperation .
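The abstract does not spell out how training figures are simulated. A minimal sketch of the general idea, assuming single-panel images are pasted into a regular grid on a white background so that each panel's bounding box comes for free, is shown here. The function `simulate_compound_figure` and its parameters are hypothetical and are not taken from the SimCFS repository; the side loss and the intra-class hard-case augmentation are not modeled.

```python
import numpy as np


def simulate_compound_figure(images, rows, cols, gap=10, bg=255):
    """Tile single-panel RGB images (H x W x 3, uint8) into a rows x cols compound figure.

    Returns the canvas and one (x0, y0, x1, y1) box per panel, which can serve as
    free detection labels for the simulated figure. Hypothetical helper for illustration.
    """
    if len(images) != rows * cols:
        raise ValueError("need exactly rows * cols images")
    # Every grid cell is sized to the largest panel; smaller panels sit in the top-left of their cell.
    cell_h = max(img.shape[0] for img in images)
    cell_w = max(img.shape[1] for img in images)
    height = rows * cell_h + (rows + 1) * gap
    width = cols * cell_w + (cols + 1) * gap
    # White background; the gaps between cells act as the separators a model must learn to find.
    canvas = np.full((height, width, 3), bg, dtype=np.uint8)
    boxes = []
    for idx, img in enumerate(images):
        r, c = divmod(idx, cols)  # row-major placement
        y0 = gap + r * (cell_h + gap)
        x0 = gap + c * (cell_w + gap)
        h, w = img.shape[:2]
        canvas[y0:y0 + h, x0:x0 + w] = img
        boxes.append((x0, y0, x0 + w, y0 + h))
    return canvas, boxes
```

For example, four 64 x 48 panels in a 2 x 2 grid with `gap=10` produce a 158 x 126 canvas and four boxes, one per panel. A grid layout is only the simplest choice; real compound figures also mix panel sizes and irregular layouts, which is what hard-case simulation is meant to cover.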

Glo-In-One: holistic glomerular detection, segmentation, and lesion characterization with large-scale web image mining

Yao, Long, et al. 2022, J. Med. Imag.

Purpose: The quantitative detection, segmentation, and characterization of glomeruli from high-resolution whole slide imaging (WSI) play essential roles in computer-assisted diagnosis and scientific research in digital renal pathology. Historically, such comprehensive quantification has required extensive programming skills to handle heterogeneous and customized computational tools. To make glomerular quantification accessible to non-technical users, we develop the Glo-In-One toolkit, which achieves holistic glomerular detection, segmentation, and characterization with a single command. Additionally, we release a large-scale collection of 30,000 unlabeled glomerular images to further facilitate the algorithmic development of self-supervised deep learning.

Approach: The inputs of the Glo-In-One toolkit are WSIs, and the outputs are (1) WSI-level multi-class circle glomerular detection results (which can be directly manipulated with ImageScope), (2) glomerular image patches with segmentation masks, and (3) different lesion types. The current version provides fine-grained global glomerulosclerosis (GGS) characterization, distinguishing assessed-solidified-GGS (associated with hypertension-related injury), disappearing-GGS (a further end result of the SGGS becoming contiguous with fibrotic interstitium), and obsolescent-GGS (nonspecific GGS increasing with aging) glomeruli. To improve the performance of the Glo-In-One toolkit, we introduce self-supervised deep learning to glomerular quantification via large-scale web image mining.

Results: The fine-grained GGS classification model achieved reasonable performance relative to baseline supervised methods while using only 10% of the annotated data. Glomerular detection achieved an average precision of 0.627 with circle representations, and glomerular segmentation achieved a patch-wise Dice similarity coefficient of 0.955.

Conclusion: We develop and release Glo-In-One, an open-source toolkit for holistic glomerular detection, segmentation, and lesion characterization that non-technical users can run with a single command. The toolkit and the 30,000 web-mined glomerular images are publicly available at https://github.com/hrlblab/Glo-In-One.
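The segmentation result is reported as a patch-wise Dice similarity coefficient. A minimal sketch of that metric, assuming binary masks of equal shape and assuming "patch-wise" means the per-patch Dice averaged over all patches, follows. The function names are hypothetical and are not part of the Glo-In-One toolkit.

```python
import numpy as np


def dice_coefficient(pred, target, eps=1e-7):
    """Dice similarity coefficient 2|A ∩ B| / (|A| + |B|) for two binary masks.

    eps keeps the ratio defined when both masks are empty (it then returns 1.0).
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    return float((2.0 * intersection + eps) / (total + eps))


def mean_patchwise_dice(preds, targets):
    """Average of the per-patch Dice scores over paired prediction/target patches."""
    return float(np.mean([dice_coefficient(p, t) for p, t in zip(preds, targets)]))
```

Averaging per patch weights every glomerulus equally regardless of its pixel area; pooling all pixels into one global Dice would instead favor large glomeruli, so the two definitions can give different numbers on the same data.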

An efficient EM algorithm for the mixture of negative binomial models

Huang, Liu, Yao, et al. 2019, J. Phys.: Conf. Ser.

Overdispersion is common in count data, and the negative binomial distribution is widely adopted to fit overdispersed counts. Mixture models, in turn, play an important role in unsupervised classification. However, when estimating the parameters of a mixture of negative binomial models, the typical generalized Expectation Maximization (EM) algorithm requires an additional iterative procedure in the M-step, which increases computational time. An efficient algorithm that speeds up parameter estimation is therefore still needed. To this end, we develop a novel EM algorithm that avoids the typical numerical solution in the M-step for the mixture of negative binomial models, and we further extend it to the zero-inflated negative binomial model. In simulation studies, we focus on the runtime and classification performance of the proposed algorithm applied to the mixture of negative binomial models. The proposed EM algorithm effectively reduces the runtime of maximum likelihood estimation while achieving classification performance similar to that of the typical EM algorithm. Finally, the mixture of negative binomial models fitted with the proposed EM algorithm performs well on real earthquake count data.
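The abstract does not give the new M-step, so the following is only a point of reference: a minimal sketch of plain EM for a K-component negative binomial mixture with the size parameter r assumed fixed and known. In that simplified setting the M-step is closed form, with p_k = r N_k / (r N_k + S_k), where N_k is the summed responsibility and S_k the responsibility-weighted count total of component k. This is not the authors' algorithm, which also estimates the dispersion; the function names are hypothetical.

```python
import math

import numpy as np


def nb_logpmf(x, r, p):
    """Log pmf of NB(r, p): P(X = x) = C(x + r - 1, x) * p**r * (1 - p)**x.

    Here x counts failures before the r-th success, so the mean is r * (1 - p) / p.
    """
    log_coef = np.array([math.lgamma(xi + r) - math.lgamma(r) - math.lgamma(xi + 1) for xi in x])
    return log_coef + r * np.log(p) + x * np.log1p(-p)


def em_nb_mixture_fixed_r(x, r, p_init, weights_init, n_iter=200, tol=1e-8):
    """EM for a K-component negative binomial mixture with a shared, known size r.

    Only the mixing weights and the success probabilities p_k are estimated.
    Returns (weights, p, responsibilities, log-likelihood at the last E-step).
    """
    x = np.asarray(x, dtype=float)
    p = np.array(p_init, dtype=float)
    w = np.array(weights_init, dtype=float)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: log(w_k * f(x_i | r, p_k)) for every point i and component k, shape (n, K).
        log_joint = np.stack([np.log(w[k]) + nb_logpmf(x, r, p[k]) for k in range(len(p))], axis=1)
        log_norm = np.logaddexp.reduce(log_joint, axis=1)  # log f(x_i) under current params
        resp = np.exp(log_joint - log_norm[:, None])        # posterior membership probabilities
        ll = log_norm.sum()
        # M-step: with r fixed, both updates are closed form.
        n_k = resp.sum(axis=0)  # effective number of points per component
        s_k = resp.T @ x        # responsibility-weighted sum of counts per component
        w = n_k / len(x)
        p = r * n_k / (r * n_k + s_k)  # solves sum_i resp_ik * (r / p - x_i / (1 - p)) = 0
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return w, p, resp, ll
```

Once r is also unknown, its M-step update has no closed form, and the generalized EM described in the abstract falls back on an inner numerical solve. That inner loop is the cost the proposed algorithm is designed to avoid.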

Tianyuan Yao

VoxelEmbed: 3D Instance Segmentation and Tracking with Voxel Embedding based Deep Learning

SimTriplet: Simple Triplet Representation Learning with a Single GPU

Compound Figure Separation of Biomedical Images with Side Loss

Glo-In-One: holistic glomerular detection, segmentation, and lesion characterization with large-scale web image mining

An efficient EM algorithm for the mixture of negative binomial models