Weakly Supervised Object Localization (WSOL) techniques learn the object location using only image-level labels, without location annotations. A common limitation of these techniques is that they cover only the most discriminative part of the object, not the entire object. To address this problem, we propose an Attention-based Dropout Layer (ADL), which utilizes the self-attention mechanism to process the feature maps of the model. The proposed method is composed of two key components: 1) hiding the most discriminative part from the model to capture the integral extent of the object, and 2) highlighting the informative region to improve the recognition power of the model. Based on extensive experiments, we demonstrate that the proposed method is effective in improving WSOL accuracy, achieving new state-of-the-art localization accuracy on the CUB-200-2011 dataset. We also show that the proposed method incurs much lower parameter and computation overheads than existing techniques.
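As a rough illustration of the two components above, here is a minimal NumPy sketch of an ADL-style layer. The hyperparameter names `drop_rate` (probability of applying the drop mask) and `gamma` (drop-mask threshold ratio) are illustrative assumptions, not the paper's exact values.

```python
import numpy as np

def adl(feature_map, drop_rate=0.75, gamma=0.9, rng=None):
    """Sketch of an Attention-based Dropout Layer on a (C, H, W) feature map.

    At each step, either a drop mask (hides the most discriminative part)
    or an importance map (highlights informative regions) is applied.
    """
    if rng is None:
        rng = np.random.default_rng()
    # Self-attention map: channel-wise average pooling -> (H, W)
    attention = feature_map.mean(axis=0)
    # Importance map: sigmoid of the attention map
    importance = 1.0 / (1.0 + np.exp(-attention))
    # Drop mask: zero out locations whose attention exceeds gamma * max
    drop_mask = (attention < gamma * attention.max()).astype(feature_map.dtype)
    # Randomly select one of the two maps at each training step
    mask = drop_mask if rng.random() < drop_rate else importance
    return feature_map * mask  # broadcast over channels
```

During inference the layer would simply be skipped, so it adds no test-time cost, which matches the efficiency claim in the abstract.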
Purpose: Convolutional neural network (CNN)-based image denoising techniques have shown promising results in low-dose CT denoising. However, CNNs often introduce blurring in denoised images when trained with a widely used pixel-level loss function. Perceptual loss and adversarial loss have recently been proposed to further improve image denoising performance. In this paper, we investigate the effect of different loss functions on image denoising performance using task-based image quality assessment methods for various signals and dose levels. Methods: We used a modified version of U-net that was effective at reducing the correlated noise in CT images. The loss functions used for comparison were two pixel-level losses (i.e., the mean-squared error and the mean absolute error), Visual Geometry Group network-based perceptual loss (VGG loss), adversarial loss used to train the Wasserstein generative adversarial network with gradient penalty (WGAN-GP), and their weighted summation. Each image denoising method was applied to reconstructed images and sinogram images independently and validated using the extended cardiac-torso (XCAT) simulation and Mayo Clinic datasets. In the XCAT simulation, we generated fan-beam CT datasets with four different dose levels (25%, 50%, 75%, and 100% of a normal-dose level) using 10 XCAT phantoms and inserted signals in a test set. The signals had two different shapes (spherical and spiculated), sizes (4 and 12 mm), and contrast levels (60 and 160 HU). To evaluate signal detectability, we used a detection task SNR (tSNR) calculated from a non-prewhitening model observer with an eye filter. We also measured the noise power spectrum (NPS) and modulation transfer function (MTF) to compare the noise and signal transfer properties. Results: Compared to CNNs without VGG loss, VGG-loss-based CNNs achieved tSNR values closer to those of normal-dose CT for all signals at different dose levels, except for a small signal at the 25% dose level.
For a low-contrast signal at the 25% or 50% dose level, combining other losses with the VGG loss yielded better performance than using the VGG loss alone. The NPS shapes from VGG-loss-based CNNs closely matched those of normal-dose CT images, whereas CNNs without VGG loss overly reduced the mid-to-high-frequency noise power at all dose levels. The MTF also showed that VGG-loss-based CNNs better preserved high resolution for all dose and contrast levels. We also observed that an additional WGAN-GP loss helps improve the noise and signal transfer properties of VGG-loss-based CNNs. Conclusions: The evaluation results using tSNR, NPS, and MTF indicate that VGG-loss-based CNNs are more effective than those without VGG loss for natural denoising of low-dose images, and that the WGAN-GP loss further improves the denoising performance of VGG-loss-based CNNs, consistent with the qualitative evaluation.
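The detection task SNR used above can be illustrated with the standard detectability formula for a non-prewhitening model observer with an eye filter (NPWE). The sketch below assumes the expected signal, the NPS, and the eye filter are all sampled on the same 2-D frequency grid, and omits normalization constants for simplicity.

```python
import numpy as np

def npwe_tsnr(signal, nps, eye_filter):
    """Detectability (tSNR) of an NPWE observer.

    signal:     expected signal image (signal-present minus signal-absent mean)
    nps:        noise power spectrum on the 2-D frequency grid
    eye_filter: eye filter E(f) on the same frequency grid
    """
    S = np.fft.fft2(signal)                        # signal spectrum
    E2 = eye_filter ** 2
    num = np.sum(np.abs(S) ** 2 * E2) ** 2         # template-signal correlation
    den = np.sum(np.abs(S) ** 2 * E2 ** 2 * nps)   # noise passed through template
    return num / np.sqrt(den)
```

Note that scaling the NPS by a factor k scales the tSNR by 1/sqrt(k), which is the expected behavior for a dose change under a quantum-noise-dominated model.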
Knowledge distillation (KD) is a well-known method for reducing inference latency by compressing a cumbersome teacher model into a small student model. Despite the success of KD in the classification task, applying KD to recommender models is challenging due to the sparsity of positive feedback, the ambiguity of missing feedback, and the ranking problem associated with the top-N recommendation. To address these issues, we propose a new KD model for the collaborative filtering approach, namely collaborative distillation (CD). Specifically, (1) we reformulate a loss function to deal with the ambiguity of missing feedback. (2) We exploit probabilistic rank-aware sampling for the top-N recommendation. (3) To train the proposed model effectively, we develop two training strategies for the student model, called the teacher-guided and student-guided training methods, which select the most useful feedback from the teacher model. Via experimental results, we demonstrate that the proposed model outperforms the state-of-the-art method by 2.7-33.2% and 2.7-29.1% in hit rate (HR) and normalized discounted cumulative gain (NDCG), respectively. Moreover, the proposed model achieves performance comparable to the teacher model.
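Probabilistic rank-aware sampling, mentioned in point (2), can be sketched as drawing items with probability that decays with the teacher's rank, so that the student mostly learns from items the teacher ranks highly. The exponential decay form and the `temperature` parameter below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def rank_aware_sample(teacher_scores, n_samples, temperature=10.0, rng=None):
    """Sample item indices with probability decaying in the teacher's rank.

    teacher_scores: 1-D array of teacher prediction scores per item
    n_samples:      number of items to sample without replacement
    """
    if rng is None:
        rng = np.random.default_rng()
    # Rank 0 = item the teacher scores highest
    ranks = np.argsort(np.argsort(-teacher_scores))
    probs = np.exp(-ranks / temperature)           # exponential rank decay
    probs /= probs.sum()
    return rng.choice(len(teacher_scores), size=n_samples,
                      replace=False, p=probs)
```

With a small temperature the sampling concentrates on the teacher's top-ranked items; with a large temperature it approaches uniform sampling.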
Existing co-localization techniques significantly underperform weakly or fully supervised methods in both accuracy and inference time. In this paper, we overcome common drawbacks of co-localization techniques by utilizing a self-supervised learning approach. The major technical contributions of the proposed method are two-fold. 1) We devise a new geometric transformation, namely the point symmetric transformation, and utilize its parameters as an artificial label for self-supervised learning. This new transformation can also play the role of region-drop-based regularization. 2) We suggest a heat map extraction method, namely class-agnostic activation mapping, for computing the heat map from the network trained by self-supervision. It is done by computing the spatial attention map. Based on extensive evaluations, we observe that the proposed method records new state-of-the-art performance on three fine-grained datasets for unsupervised object localization. Moreover, we show that the idea of the proposed method can be adopted in a modified manner to solve the weakly supervised object localization task. As a result, we outperform the current state-of-the-art technique in weakly supervised object localization by a significant gap.
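One way to picture the point symmetric transformation is as replacing an image patch with its 180-degree rotation about the patch center, with the patch parameters serving as the artificial label for self-supervision. The patch parametrization below (center and size) is an assumption for illustration, not the paper's exact definition.

```python
import numpy as np

def point_symmetric_transform(image, cy, cx, h, w):
    """Replace the (h, w) patch centered at (cy, cx) with its point-symmetric
    (180-degree rotated) version.

    The parameters (cy, cx, h, w) can be used as an artificial label for
    self-supervised training; overwriting the patch also acts like
    region-drop-style regularization of the original content.
    """
    out = image.copy()
    y0, x0 = cy - h // 2, cx - w // 2
    patch = out[y0:y0 + h, x0:x0 + w]
    out[y0:y0 + h, x0:x0 + w] = patch[::-1, ::-1]  # 180-degree rotation
    return out
```

Applying the same transformation twice recovers the original image, since a 180-degree rotation is its own inverse.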
Purpose: In this paper, we propose a convolutional neural network (CNN)-based efficient model observer for breast computed tomography (CT) images. Methods: We first showed that the CNN-based model observer provided detection performance similar to that of the ideal observer (IO) for signal-known-exactly and background-known-exactly detection tasks with an uncorrelated Gaussian background noise image. We then demonstrated that a single-layer CNN without a nonlinear activation function provided detection performance in breast CT images similar to that of the Hotelling observer (HO). To train the CNN-based model observer, we generated simulated breast CT images to produce a training dataset in which different background noise structures were generated using filtered back projection with a ramp or a Hanning-weighted ramp filter. Circular, elliptical, and spiculated signals were used for the detection tasks. The optimal depth and number of channels for the CNN-based model observer were determined for each task. The detection performances of the HO and a channelized Hotelling observer (CHO) with Laguerre-Gauss (LG) and partial least squares (PLS) channels were also estimated for comparison. Results: The results showed that the CNN-based model observer provided higher detection performance than the HO, LG-CHO, and PLS-CHO for all tasks. In addition, the proposed CNN-based model observer provided higher detection performance than the HO using a smaller training dataset. Conclusions: In the presence of nonlinearity in the CNN, the proposed CNN-based model observer showed better performance than other linear observers.
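The Hotelling observer used as the linear baseline above applies the template w = K⁻¹Δs, where K is the average covariance of the image data and Δs the mean signal difference; a single-layer CNN without a nonlinear activation can approximate this linear template. Below is a minimal NumPy sketch; the small ridge term added before inverting the covariance is a numerical-stability assumption, not part of the formal definition.

```python
import numpy as np

def hotelling_dprime(signal_imgs, background_imgs):
    """Hotelling observer detectability index d' from sample images.

    signal_imgs:     array (n, H, W) of signal-present images
    background_imgs: array (n, H, W) of signal-absent images
    """
    s = signal_imgs.reshape(len(signal_imgs), -1)
    b = background_imgs.reshape(len(background_imgs), -1)
    delta = s.mean(axis=0) - b.mean(axis=0)          # mean signal difference
    K = 0.5 * (np.cov(s, rowvar=False) + np.cov(b, rowvar=False))
    K += 1e-6 * np.eye(K.shape[0])                   # ridge for invertibility
    w = np.linalg.solve(K, delta)                    # Hotelling template
    return np.sqrt(delta @ w)                        # detectability index
```

For white Gaussian noise with unit variance and a one-pixel mean shift of amplitude a, the expected d' is simply a, which provides a quick sanity check of the estimator.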