Wenjie Pei scite author profile

Typical techniques for video captioning follow the encoder-decoder framework, which can only focus on one source video being processed. A potential disadvantage of such design is that it cannot capture the multiple visual context information of a word appearing in more than one relevant videos in training data. To tackle this limitation, we propose the Memory-Attended Recurrent Network (MARN) for video captioning, in which a memory structure is designed to explore the full-spectrum correspondence between a word and its various similar visual contexts across videos in training data. Thus, our model is able to achieve a more comprehensive understanding for each word and yield higher captioning quality. Furthermore, the built memory structure enables our method to model the compatibility between adjacent words explicitly instead of asking the model to learn implicitly, as most existing models do. Extensive validation on two real-word datasets demonstrates that our MARN consistently outperforms state-of-the-art methods.

show abstract

Temporal Attention-Gated Model for Robust Sequence Classification

Pei

Baltrušaitis

Tax

et al. 2017

View full text Add to dashboard Cite

Typical techniques for sequence classification are designed for well-segmented sequences which have been edited to remove noisy or irrelevant parts. Therefore, such methods cannot be easily applied on noisy sequences expected in real-world applications. In this paper, we present the Temporal Attention-Gated Model (TAGM) which integrates ideas from attention models and gated recurrent networks to better deal with noisy or unsegmented sequences. Specifically, we extend the concept of attention model to measure the relevance of each observation (time step) of a sequence. We then use a novel gated recurrent network to learn the hidden representation for the final prediction. An important advantage of our approach is interpretability since the temporal attention weights provide a meaningful value for the salience of each time step in the sequence. We demonstrate the merits of our TAGM approach, both for prediction accuracy and interpretability, on three different tasks: spoken digit recognition, text-based sentiment analysis and visual event recognition.

show abstract

Multivariate Time-Series Classification Using the Hidden-Unit Logistic Model

Pei

Dibeklioğlu

Tax

et al. 2018

IEEE Trans. Neural Netw. Learning Syst.

View full text Add to dashboard Cite

Abstract-We present a new model for multivariate time series classification, called the hidden-unit logistic model, that uses binary stochastic hidden units to model latent structure in the data. The hidden units are connected in a chain structure that models temporal dependencies in the data. Compared to the prior models for time series classification such as the hidden conditional random field, our model can model very complex decision boundaries because the number of latent states grows exponentially with the number of hidden units. We demonstrate the strong performance of our model in experiments on a variety of (computer vision) tasks, including handwritten character recognition, speech recognition, facial expression, and action recognition. We also present a state-of-the-art system for facial action unit detection based on the hidden-unit logistic model.

show abstract

CPGAN: Content-Parsing Generative Adversarial Networks for Text-to-Image Synthesis

Liang

Pei

2020

View full text Add to dashboard Cite

Commonality-Parsing Network Across Shape and Appearance for Partially Supervised Instance Segmentation

Fan

Pei

et al. 2020

View full text Add to dashboard Cite

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Wenjie Pei

Memory-Attended Recurrent Network for Video Captioning

Temporal Attention-Gated Model for Robust Sequence Classification

Multivariate Time-Series Classification Using the Hidden-Unit Logistic Model

CPGAN: Content-Parsing Generative Adversarial Networks for Text-to-Image Synthesis

Commonality-Parsing Network Across Shape and Appearance for Partially Supervised Instance Segmentation

Contact Info

Product

Resources

About