Marco Pedersoli scite author profile

We propose "Areas of Attention", a novel attention-based model for automatic image captioning. Our approach models the dependencies between image regions, caption words, and the state of an RNN language model, using three pairwise interactions. In contrast to previous attention-based approaches that associate image regions only with the RNN state, our method allows a direct association between caption words and image regions. During training, these associations are inferred from image-level captions, akin to weakly supervised object detector training. These associations help to improve captioning by localizing the corresponding regions during testing. We also propose and compare different ways of generating attention areas: CNN activation grids, object proposals, and spatial transformer networks applied in a convolutional fashion. Spatial transformers give the best results. They allow for image-specific attention areas and can be trained jointly with the rest of the network. Our attention mechanism and spatial transformer attention areas together yield state-of-the-art results on the MSCOCO dataset.
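The three pairwise interactions described above can be illustrated with a small sketch. It assumes bilinear scoring terms and a single softmax taken jointly over all word-region pairs; the function names, the matrices `W_wh`, `W_wr`, `W_rh`, and the toy embeddings are hypothetical and are not taken from the paper's implementation.

```python
import math


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def bilinear(a, M, b):
    """Compute a^T M b for list vectors a, b and a list-of-rows matrix M."""
    return sum(a[i] * dot(M[i], b) for i in range(len(a)))


def joint_word_region_distribution(words, regions, state, W_wh, W_wr, W_rh):
    """Return p[w][r], a joint distribution over (word, region) pairs.

    The score of a pair sums three pairwise terms (an assumed form):
    word-state, word-region, and region-state.
    """
    scores = [
        [
            bilinear(w, W_wh, state) + bilinear(w, W_wr, r) + bilinear(r, W_rh, state)
            for r in regions
        ]
        for w in words
    ]
    # Subtract the maximum score for numerical stability before exponentiating.
    max_score = max(max(row) for row in scores)
    exp_scores = [[math.exp(s - max_score) for s in row] for row in scores]
    total = sum(sum(row) for row in exp_scores)
    return [[e / total for e in row] for row in exp_scores]


# Toy inputs (hypothetical): 2 word embeddings, 3 region features, 1 RNN state.
identity = [[1.0, 0.0], [0.0, 1.0]]
words = [[1.0, 0.0], [0.0, 1.0]]
regions = [[0.5, 0.1], [0.2, 0.9], [0.3, 0.3]]
state = [0.4, 0.6]

joint = joint_word_region_distribution(words, regions, state, identity, identity, identity)
word_probs = [sum(row) for row in joint]  # marginal over regions: next-word distribution
region_attention = [sum(joint[w][r] for w in range(len(words))) for r in range(len(regions))]
```

Marginalizing the joint table over regions gives a next-word distribution, and marginalizing over words gives an attention map over regions, which is one way to read the direct word-region association the abstract mentions.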

A Unifying Mutual Information View of Metric Learning: Cross-Entropy vs. Pairwise Losses

Boudiaf

Rony

Ziko

et al. 2020

A coarse-to-fine approach for fast deformable object detection

Pedersoli

Vedaldi

Gonzàlez

et al. 2015

Pattern Recognition

DeepProposal: Hunting Objects by Cascading Deep Convolutional Layers

et al. 2015

In this paper we evaluate the quality of the activation layers of a convolutional neural network (CNN) for the generation of object proposals. We generate hypotheses in a sliding-window fashion over different activation layers and show that the final convolutional layers can find the object of interest with high recall but poor localization due to the coarseness of the feature maps. In contrast, the first layers of the network localize the object of interest better but with reduced recall. Based on this observation we design a method for proposing object locations that is based on CNN features and combines the best of both worlds. We build an inverse cascade that, going from the final to the initial convolutional layers of the CNN, selects the most promising object locations and refines their boxes in a coarse-to-fine manner. The method is efficient because i) it uses the same features extracted for detection, ii) it aggregates features using integral images, and iii) it avoids a dense evaluation of the proposals thanks to the inverse coarse-to-fine cascade. The method is also accurate: it outperforms most previously proposed object proposal approaches and, when plugged into a CNN-based detector, produces state-of-the-art detection performance.
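The integral-image aggregation mentioned in point ii) can be sketched as follows. This is a generic summed-area table over a single 2D activation map, not the authors' code; the function names and the toy `activations` grid are illustrative assumptions.

```python
def integral_image(grid):
    """Return an (H+1) x (W+1) summed-area table for a 2D list of numbers.

    table[y][x] holds the sum of all grid values above and to the left of
    (y, x), so the first row and first column are zero padding.
    """
    height, width = len(grid), len(grid[0])
    table = [[0.0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0.0
        for x in range(width):
            row_sum += grid[y][x]
            table[y + 1][x + 1] = table[y][x + 1] + row_sum
    return table


def box_sum(table, top, left, bottom, right):
    """Sum of grid rows top..bottom-1 and columns left..right-1 in four lookups."""
    return (
        table[bottom][right]
        - table[top][right]
        - table[bottom][left]
        + table[top][left]
    )


# Toy activation map (hypothetical values).
activations = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
table = integral_image(activations)
window_total = box_sum(table, 1, 1, 3, 3)  # covers the values 5, 6, 8, 9
```

Once the table is built, the sum inside any candidate window costs four lookups regardless of window size, which is why this kind of aggregation suits sliding-window scoring over many boxes.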

Deep co-training for semi-supervised image segmentation

Peng

Estrada

Pedersoli

et al. 2020

Pattern Recognition

148

In this paper, we aim to improve the performance of semantic image segmentation in a semi-supervised setting where training is performed with a reduced set of annotated images and additional non-annotated images. We present a method based on an ensemble of deep segmentation models. Models are trained on subsets of the annotated data and use non-annotated images to exchange information with each other, similar to co-training. Diversity across models is enforced with the use of adversarial samples. We demonstrate the potential of our method on two challenging image segmentation problems, and illustrate its ability to share information between simultaneously trained models, while preserving their diversity. Results indicate clear advantages in terms of performance compared to recently proposed semi-supervised methods for segmentation.
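One way models in an ensemble could exchange information on non-annotated images is by penalizing disagreement between their per-pixel class predictions. The sketch below uses the Jensen-Shannon divergence as that disagreement measure. This choice is an assumption made for illustration, since the abstract does not state the exact loss, and all names and values here are hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as lists; 0 * log 0 is treated as 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon(distributions):
    """Mean KL divergence of each distribution from the ensemble average.

    The result is 0 when all models agree and grows with disagreement.
    """
    count = len(distributions)
    num_classes = len(distributions[0])
    mean = [sum(d[k] for d in distributions) / count for k in range(num_classes)]
    return sum(kl_divergence(d, mean) for d in distributions) / count


# Class probabilities predicted by two models for one unlabeled pixel (toy values).
agreeing = jensen_shannon([[0.9, 0.1], [0.9, 0.1]])
disagreeing = jensen_shannon([[1.0, 0.0], [0.0, 1.0]])
```

Averaged over the pixels of unlabeled images, a quantity like this could be added to the supervised loss so that each model is pulled toward the ensemble consensus.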


Marco Pedersoli

Face Detection without Bells and Whistles

Weakly supervised object detection with convex clustering

Weakly Supervised Detection with Posterior Regularization

Areas of Attention for Image Captioning

A Unifying Mutual Information View of Metric Learning: Cross-Entropy vs. Pairwise Losses

A coarse-to-fine approach for fast deformable object detection

DeepProposal: Hunting Objects by Cascading Deep Convolutional Layers

Deep co-training for semi-supervised image segmentation
