Lei Ke scite author profile

Typical techniques for video captioning follow the encoder-decoder framework, which can only focus on the one source video being processed. A potential disadvantage of this design is that it cannot capture the multiple visual contexts of a word that appears in more than one relevant video in the training data. To tackle this limitation, we propose the Memory-Attended Recurrent Network (MARN) for video captioning, in which a memory structure is designed to explore the full-spectrum correspondence between a word and its various similar visual contexts across videos in the training data. Thus, our model is able to achieve a more comprehensive understanding of each word and yield higher captioning quality. Furthermore, the built memory structure enables our method to model the compatibility between adjacent words explicitly, instead of asking the model to learn it implicitly as most existing models do. Extensive validation on two real-world datasets demonstrates that our MARN consistently outperforms state-of-the-art methods.
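The word-centric memory described above can be illustrated with a toy sketch. It is not MARN's actual formulation: it assumes each caption token comes paired with the visual context vector the decoder attended to when emitting it, stores a plain average of those contexts per word across all training videos, and blends a cosine match against the current context into the decoder's own score using a hypothetical `weight`. The names `build_word_memory`, `cosine`, and `pick_next_word` are illustrative only.

```python
from collections import defaultdict
from math import sqrt
from typing import Dict, Iterable, List, Sequence, Tuple

Vector = List[float]


def build_word_memory(
    samples: Iterable[Tuple[Sequence[str], Sequence[Vector]]]
) -> Dict[str, Vector]:
    """Map each word to the mean of the visual contexts it was paired with,
    pooled across every training video whose caption contains it.
    (Assumption: one context vector per caption token; plain averaging.)"""
    sums: Dict[str, Vector] = {}
    counts: Dict[str, int] = defaultdict(int)
    for tokens, contexts in samples:
        if len(tokens) != len(contexts):
            raise ValueError("need one visual context per caption token")
        for word, ctx in zip(tokens, contexts):
            acc = sums.setdefault(word, [0.0] * len(ctx))
            for i, value in enumerate(ctx):
                acc[i] += value
            counts[word] += 1
    return {w: [v / counts[w] for v in acc] for w, acc in sums.items()}


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pick_next_word(
    decoder_scores: Dict[str, float],
    memory: Dict[str, Vector],
    context: Vector,
    weight: float = 0.5,  # hypothetical blending weight, not from the paper
) -> str:
    """Add a memory term (how well each word's memorized contexts match the
    current visual context) to the decoder score and return the best word."""
    def blended(word: str) -> float:
        mem = memory.get(word)
        return decoder_scores[word] + (weight * cosine(mem, context) if mem else 0.0)

    return max(decoder_scores, key=blended)
```

With `weight=0.0` the memory is ignored and the choice falls back to the decoder's scores alone, which makes the effect of the memory term easy to inspect on small examples.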


Multilayered silicon embedded porous carbon/graphene hybrid film as a high performance anode

Qin

Zhang

et al. 2015

Carbon

146


How a very trace amount of graphene additive works for constructing an efficient conductive network in LiCoO2-based lithium-ion batteries

Tang

Yun

et al. 2016

Carbon


GSNet: Joint Vehicle Pose and Shape Reconstruction with Geometrical and Scene-Aware Supervision

Sun

et al. 2020


Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers

Tai

Tang

2021

115


Segmenting highly-overlapping image objects is challenging, because there is typically no distinction between real object contours and occlusion boundaries on images. Unlike previous instance segmentation methods, we model image formation as a composition of two overlapping layers, and propose Bilayer Convolutional Network (BCNet), where the top layer detects occluding objects (occluders) and the bottom layer infers partially occluded instances (occludees). The explicit modeling of occlusion relationship with bilayer structure naturally decouples the boundaries of both the occluding and occluded instances, and considers the interaction between them during mask regression. We investigate the efficacy of bilayer structure using two popular convolutional network designs, namely, Fully Convolutional Network (FCN) and Graph Convolutional Network (GCN). Further, we formulate bilayer decoupling using the vision transformer (ViT), by representing instances in the image as separate learnable occluder and occludee queries. Large and consistent improvements using one/two-stage and query-based object detectors with various backbones and network layer choices validate the generalization ability of bilayer decoupling, as shown by extensive experiments on image instance segmentation benchmarks (COCO, KINS, COCOA) and video instance segmentation benchmarks (YTVIS, OVIS, BDD100K MOTS), especially for heavy occlusion cases. Code and data are available at https://github.com/lkeab/BCNet.
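The bilayer decomposition can be illustrated on tiny binary masks: compositing a top (occluder) layer over a bottom (occludee) layer yields the occludee's visible region, and that region's contour then splits into occlusion-boundary pixels (those touching the occluder) and the remaining contour pixels. This is only a toy sketch of the decoupling idea; BCNet itself predicts the two layers with FCN, GCN, or transformer heads, and the function names and the 4-neighbour adjacency rule below are assumptions made for illustration.

```python
from typing import List, Set, Tuple

Mask = List[List[int]]   # binary mask, 1 = object pixel
Pixel = Tuple[int, int]  # (row, col)
NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # assumed 4-connectivity


def visible_occludee(occluder: Mask, occludee_full: Mask) -> Mask:
    """Composite the two layers: the top (occluder) layer hides the bottom one."""
    return [
        [int(bool(b) and not t) for t, b in zip(top_row, bottom_row)]
        for top_row, bottom_row in zip(occluder, occludee_full)
    ]


def _neighbour_values(mask: Mask, y: int, x: int) -> List[int]:
    """Values of the 4-neighbours of (y, x); pixels outside the image count as 0."""
    h, w = len(mask), len(mask[0])
    return [
        mask[y + dy][x + dx] if 0 <= y + dy < h and 0 <= x + dx < w else 0
        for dy, dx in NEIGHBOURS
    ]


def contour(mask: Mask) -> Set[Pixel]:
    """Object pixels with at least one background (or out-of-image) 4-neighbour."""
    return {
        (y, x)
        for y, row in enumerate(mask)
        for x, v in enumerate(row)
        if v and 0 in _neighbour_values(mask, y, x)
    }


def decouple_boundaries(
    occluder: Mask, occludee_full: Mask
) -> Tuple[Mask, Set[Pixel], Set[Pixel]]:
    """Return (visible occludee mask, remaining contour, occlusion boundary).
    Contour pixels touching an occluder pixel are attributed to the occlusion
    boundary; all other visible contour pixels are kept as the object contour."""
    visible = visible_occludee(occluder, occludee_full)
    edge = contour(visible)
    occlusion = {(y, x) for (y, x) in edge if 1 in _neighbour_values(occluder, y, x)}
    return visible, edge - occlusion, occlusion
```

For example, an occludee filling a 3x4 grid with an occluder covering its rightmost column leaves a 3x3 visible block whose right edge is classified as occlusion boundary, while the other contour pixels remain the object's own contour.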


Vision-Based Framework for Automatic Progress Monitoring of Precast Walls by Using Surveillance Videos during the Construction Phase

Wang

Zhang

Yang

et al. 2021

J. Comput. Civ. Eng.


Commonality-Parsing Network Across Shape and Appearance for Partially Supervised Instance Segmentation

Fan

Pei

et al. 2020



scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA



Part of the Research Solutions Family.

Lei Ke

Cascaded Deep Monocular 3D Human Pose Estimation With Evolutionary Training Data

Memory-Attended Recurrent Network for Video Captioning

Multilayered silicon embedded porous carbon/graphene hybrid film as a high performance anode

How a very trace amount of graphene additive works for constructing an efficient conductive network in LiCoO2-based lithium-ion batteries

GSNet: Joint Vehicle Pose and Shape Reconstruction with Geometrical and Scene-Aware Supervision

Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers

Vision-Based Framework for Automatic Progress Monitoring of Precast Walls by Using Surveillance Videos during the Construction Phase

Commonality-Parsing Network Across Shape and Appearance for Partially Supervised Instance Segmentation
