Recently, several approaches have explored the detection and classification of objects in videos to perform Zero-Shot Action Recognition (ZSAR) with remarkable results. In these methods, class-object relationships are used to associate visual patterns with the semantic side information because these relationships also tend to appear in texts; word-vector methods therefore reflect them in their latent representations. Inspired by these methods and by video captioning's ability to describe events not only with a set of objects but also with contextual information, we propose a method in which video captioning models, called observers, provide different and complementary descriptive sentences. We demonstrate that representing videos with descriptive sentences instead of deep features is viable for ZSAR and naturally alleviates the domain adaptation problem, as we reached state-of-the-art (SOTA) performance on the UCF101 dataset and competitive performance on HMDB51 without using their training sets. We also demonstrate that word vectors are unsuitable for building the semantic embedding space of our descriptions. Thus, we propose to represent the classes with sentences extracted from documents acquired with Internet search engines, without any human evaluation of description quality. Lastly, we build a shared semantic space employing BERT-based embedders pre-trained on the paraphrasing task over multiple text datasets. We show that this pre-training is essential for bridging the semantic gap. Projection onto this space is straightforward for both types of information, visual and semantic, because both are sentences, enabling classification with the nearest-neighbour rule in this shared space. Our code is available at https://github.com/valterlej/zsarcap.
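The classification step described above can be sketched as follows. This is a minimal illustration, assuming the video's descriptive sentences and the class sentences have already been embedded by a paraphrase-trained BERT embedder; the class names and vectors here are toy stand-ins, not values from the paper.

```python
import numpy as np

# Toy stand-ins for sentence embeddings in the shared semantic space.
class_names = ["archery", "basketball"]
class_embeddings = np.array([
    [0.9, 0.1, 0.0],   # embedding of sentences describing "archery"
    [0.1, 0.9, 0.2],   # embedding of sentences describing "basketball"
])

def classify(video_embedding, class_embeddings, class_names):
    """Nearest-neighbour rule with cosine similarity in the shared space."""
    v = video_embedding / np.linalg.norm(video_embedding)
    c = class_embeddings / np.linalg.norm(class_embeddings, axis=1, keepdims=True)
    sims = c @ v  # cosine similarity of the video to each class
    return class_names[int(np.argmax(sims))]

# Embedding of the observers' descriptive sentences for one test video.
video = np.array([0.85, 0.15, 0.05])
print(classify(video, class_embeddings, class_names))  # archery
```

Because both sides of the comparison are sentences projected by the same embedder, no cross-domain alignment step is needed before the nearest-neighbour search.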
We introduce a method to learn unsupervised semantic visual information based on the premise that complex events (e.g., minutes long) can be decomposed into simpler events (e.g., a few seconds long), and that these simple events are shared across several complex events. We split a long video into short frame sequences and extract their latent representations with three-dimensional convolutional neural networks. A clustering method groups these representations into a visual codebook, so that a long video is represented by a sequence of integers given by the cluster labels. A dense representation is then learned by encoding the co-occurrence probability matrix of the codebook entries. We demonstrate how this representation improves the dense video captioning task in a scenario with only visual features. As a result, we are able to replace the audio signal in the Bi-Modal Transformer (BMT) method and produce temporal proposals with comparable performance. Furthermore, concatenating the visual signal with our descriptor in a vanilla transformer achieves state-of-the-art captioning performance among methods that use only visual features, and competitive performance against multi-modal methods. Our code is available at https://github.com/valterlej/dvcusi.
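The codebook pipeline above can be sketched end to end. This is an illustrative toy version: the clip features are random vectors standing in for 3D-CNN outputs, the codebook size is arbitrary, and the co-occurrence statistics are computed over adjacent labels only.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Stand-in for 3D-CNN features: each short frame sequence yields one
# feature vector; here, toy 64-d features for 500 clips.
clip_features = rng.normal(size=(500, 64))

# 1) Build the visual codebook by clustering clip features.
codebook_size = 8
kmeans = KMeans(n_clusters=codebook_size, n_init=10, random_state=0).fit(clip_features)

# 2) A long video becomes a sequence of codebook labels (integers).
video = rng.normal(size=(30, 64))   # 30 consecutive short clips
labels = kmeans.predict(video)

# 3) Dense representation: co-occurrence counts of adjacent codebook
#    entries, normalised row-wise into a probability matrix.
cooc = np.zeros((codebook_size, codebook_size))
for a, b in zip(labels[:-1], labels[1:]):
    cooc[a, b] += 1
row_sums = cooc.sum(axis=1, keepdims=True)
cooc_prob = np.divide(cooc, row_sums, out=np.zeros_like(cooc), where=row_sums > 0)
```

In the actual method the dense descriptor is learned from such co-occurrence statistics rather than used raw; the sketch only shows how the integer codebook sequence and its co-occurrence matrix arise.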
Adolescence is a phase marked by major transformations involving biological, psychological, social, and cultural aspects. Accordingly, adolescents are vulnerable to sexually transmitted diseases, among them HPV. An increase in the prevalence of HPV-associated head and neck cancer has been observed in young adults. The goal of this research was therefore to develop a smartphone application aimed at adolescents as a strategy for preventing head and neck cancer caused by HPV. To this end, participatory workshops on the relationship between head and neck cancer and HPV were held with 90 high-school adolescents aged 16 to 18. The application's content was designed around topics raised by the students and was validated by experts, with agreement above 85%. Appearance validation was performed by the adolescents themselves, with the same level of agreement.
This paper presents an approach that uses probabilistic logic reasoning to compute subjective interestingness scores for classification rules. In the proposed approach, domain knowledge is represented as a probabilistic logic program that encodes information from experts and statistical reports. Interestingness scores are computed by a procedure that uses linear programming to reason about the probabilities of interest, providing a mechanism to calculate probability-based subjective interestingness scores. A sample application illustrates the use of the described approach.
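The LP-based probabilistic reasoning can be sketched on a toy fragment. The propositions and probability values below are illustrative assumptions, not taken from the paper: given marginal probabilities for two propositions A and B (as might be encoded from expert or statistical knowledge), linear programming over the possible-world probabilities yields tight bounds on a query probability such as P(A and B).

```python
import numpy as np
from scipy.optimize import linprog

# Variables: probabilities of the four possible worlds over (A, B),
# in the order (A,B), (A,~B), (~A,B), (~A,~B).
A_eq = np.array([
    [1, 1, 1, 1],   # world probabilities sum to 1
    [1, 1, 0, 0],   # P(A) = 0.7  (assumed domain knowledge)
    [1, 0, 1, 0],   # P(B) = 0.5  (assumed domain knowledge)
])
b_eq = np.array([1.0, 0.7, 0.5])

# Query: tight bounds on P(A and B), i.e. the probability of world (A,B).
c = np.array([1.0, 0.0, 0.0, 0.0])
bounds = [(0, 1)] * 4
lo = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds).fun        # minimise
hi = -linprog(-c, A_eq=A_eq, b_eq=b_eq, bounds=bounds).fun      # maximise
print(round(lo, 3), round(hi, 3))  # 0.2 0.5
```

Interval bounds like these, rather than point probabilities, are what make a probability-based interestingness score computable when the knowledge base only partially constrains the distribution.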