On learning to localize objects with minimal supervision

Song, Hyun Oh; Girshick, Ross; Jegelka, Stefanie; Mairal, Julien; Harchaoui, Zaïd; Darrell, Trevor

doi:10.48550/arxiv.1403.1024

Cited by 18 publications

(26 citation statements)

References 0 publications


“…However, the loss function of MIL is non-convex, and the optimization of MIL is sensitive to initialization [7,5,1,24]. In order to solve this issue, some works introduce better initialization methods.…”

Section: Related Work (mentioning)

confidence: 99%

“…Bilen et al [1] introduce a smoothed version of MIL that softly labels object instances. Song et al [24] propose to use Nesterov's smoothing technique in latent SVM model. The proposed method is also related to the non-convexity of MIL, but we propose to utilize the instability, which is partly caused by the non-convexity.…”

Section: Related Work (mentioning)

confidence: 99%

“…Weakly supervised object detection (WSOD) has attracted intensive attention recently [24,1,5,2,26,33,31,29,12]. Unlike fully supervised object detection, WSOD aims at training detectors with only image-level annotations, which cost much less human labor than bounding boxes annotations.…”

Section: Introduction (mentioning)

confidence: 99%

“…Training images are treated as labeled bags, which consist of multiple candidate bounding boxes. The learning procedure alternates between selecting the most confident proposals and using them to train a detector [5,24,16]. Recently, many works combine convolutional neural networks (CNN) with MIL and get promising results [2,26,25,27,33,30,31,9,29,32,14,12].…”

Section: Introduction (mentioning)

confidence: 99%
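The alternation described in the statement above (select the most confident proposal in each positive bag, then retrain the detector on the selections) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names (`score`, `select_top`, `train_detector`, `mil_alternate`) are hypothetical, the feature vectors are toy data, and the "detector" is a plain linear perceptron standing in for whatever model a real system would train. It is not the method of any paper cited on this page.

```python
# Toy sketch of alternating multiple instance learning (MIL).
# Assumptions: linear scorer, perceptron updates, hand-made 2-D features.

def score(w, x):
    """Linear detector score for one candidate box feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))

def select_top(w, bag):
    """Index of the most confident candidate in a bag."""
    return max(range(len(bag)), key=lambda i: score(w, bag[i]))

def train_detector(positives, negatives, w, epochs=20, lr=0.1):
    """Perceptron updates: push positives above 0 and negatives below 0."""
    w = list(w)
    for _ in range(epochs):
        for x in positives:
            if score(w, x) <= 0:
                w = [wi + lr * xi for wi, xi in zip(w, x)]
        for x in negatives:
            if score(w, x) >= 0:
                w = [wi - lr * xi for wi, xi in zip(w, x)]
    return w

def mil_alternate(pos_bags, neg_bags, w0, rounds=5):
    """Alternate between proposal selection and detector training."""
    w = list(w0)
    negatives = [x for bag in neg_bags for x in bag]
    for _ in range(rounds):
        # Step 1: pick the top-scoring candidate in every positive bag.
        selected = [bag[select_top(w, bag)] for bag in pos_bags]
        # Step 2: retrain the detector on the current selections.
        w = train_detector(selected, negatives, w)
    picks = [select_top(w, bag) for bag in pos_bags]
    return w, picks

# Toy data: the first feature is "object-like", the second "background-like".
pos_bags = [
    [[0.0, 1.0], [1.0, 0.0], [0.0, 0.9]],  # object candidate at index 1
    [[0.9, 0.1], [0.1, 1.0]],              # object candidate at index 0
]
neg_bags = [[[0.0, 1.0], [0.1, 0.9]]]      # background only

w, picks = mil_alternate(pos_bags, neg_bags, w0=[0.1, 0.0])
print(picks)
```

Because the first selection step depends entirely on `w0`, a different starting vector can lock the loop onto different candidates, which is the sensitivity to initialization that the statements above attribute to the non-convex MIL objective.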


Utilizing the Instability in Weakly Supervised Object Detection

Gao, Liu, Guo, et al. 2019

Preprint


Weakly supervised object detection (WSOD) focuses on training object detectors with only image-level annotations, and is challenging due to the gap between the supervision and the objective. Most existing approaches model WSOD as a multiple instance learning (MIL) problem. However, we observe that the results of MIL-based detectors are unstable, i.e., the most confident bounding boxes change significantly when using different initializations. We quantitatively demonstrate the instability by introducing a metric to measure it, and empirically analyze the reasons for the instability. Although the instability seems harmful for the detection task, we argue that it can be utilized to improve performance by fusing the results of differently initialized detectors. To implement this idea, we propose an end-to-end framework with multiple detection branches, and introduce a simple fusion strategy. We further propose an orthogonal initialization method to increase the difference between detection branches. By utilizing the instability, we achieve 52.6% and 48.0% mAP on the challenging PASCAL VOC 2007 and 2012 datasets, both of which are new state-of-the-art results.


“…However, these methods require bounding-box labels. In contrast, several methods exist that use weaker supervision to identify object locations [49,50,26,27]. Close to our work is LCFCN [25] which uses point-level annotations in order to obtain the locations and counts of the objects of interest.…”

Section: Related Work (mentioning)

confidence: 99%

Instance Segmentation with Point Supervision

Laradji, Rostamzadeh, Pinheiro, et al. 2019

Preprint


Instance segmentation methods often require costly per-pixel labels. We propose a method that only requires point-level annotations. During training, the model only has access to a single pixel label per object, yet the task is to output full segmentation masks. To address this challenge, we construct a network with two branches: (1) a localization network (L-Net) that predicts the location of each object; and (2) an embedding network (E-Net) that learns an embedding space where pixels of the same object are close. The segmentation masks for the located objects are obtained by grouping pixels with similar embeddings. At training time, while L-Net only requires point-level annotations, E-Net uses pseudo-labels generated by a class-agnostic object proposal method. We evaluate our approach on the PASCAL VOC, COCO, KITTI and CityScapes datasets. The experiments show that our method (1) obtains competitive results compared to fully supervised methods in certain scenarios; (2) outperforms fully and weakly supervised methods with a fixed annotation budget; and (3) is a first strong baseline for instance segmentation with point-level supervision.


InfoMask: Masked Variational Latent Representation to Localize Chest Disease

Taghanaki, Havaei, Berthier, et al. 2019

Lecture Notes in Computer Science


The scarcity of richly annotated medical images is limiting supervised deep-learning-based solutions to medical image analysis tasks, such as localizing discriminatory radiomic disease signatures. Therefore, it is desirable to leverage unsupervised and weakly supervised models. Most recent weakly supervised localization methods apply attention maps or region proposals in a multiple instance learning formulation. Attention maps can be noisy, leading to erroneously highlighted regions, and it is not simple to decide on an optimal window/bag size for multiple instance learning approaches. In this paper, we propose a learned spatial masking mechanism to filter out irrelevant background signals from attention maps. The proposed method minimizes mutual information between a masked variational representation and the input while maximizing the information between the masked representation and class labels. This results in more accurate localization of discriminatory regions. We tested the proposed model on the ChestX-ray8 dataset to localize pneumonia from chest X-ray images without using any pixel-level or bounding-box annotations.
