Zhifan Feng scite author profile

Zhifan Feng

4 Publications

13 Citation Statements Received

144 Citation Statements Given


Affiliations

Baidu (China), Ningbo Institute of Industrial Technology, Chinese Academy of Sciences

Publications


Improving Video Retrieval by Adaptive Margin

Wang, Feng, et al. 2021


Video retrieval is becoming increasingly important owing to the rapid growth of video content on the Internet. The dominant paradigm for video retrieval learns video-text representations by requiring the similarity of positive pairs to exceed that of negative pairs by a fixed margin. However, the negative pairs used for training are sampled randomly, so the semantics within a negative pair may be related or even equivalent, yet most methods still force their representations apart to decrease their similarity. This leads to inaccurate supervision and poor performance in learning video-text representations. While most video retrieval methods overlook this phenomenon, we propose an adaptive margin that changes with the distance between positive and negative pairs to address the issue. First, we design the calculation framework of the adaptive margin, including the distance measurement method and the function that maps the distance to the margin. Then, we explore a novel implementation called "Cross-Modal Generalized Self-Distillation" (CMGSD), which can be built on top of most video retrieval models with few modifications. Notably, CMGSD adds little computational overhead at training time and none at test time. Experimental results on three widely used datasets demonstrate that the proposed method yields significantly better performance than the corresponding backbone model and outperforms state-of-the-art methods by a large margin.

CCS CONCEPTS: Information systems → Video search.
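The difference between a fixed and an adaptive margin can be illustrated with a hinge ranking loss. The sketch below is a minimal illustration under stated assumptions, not CMGSD itself: embeddings are plain Python lists, similarity is cosine similarity, and the distance-to-margin function (the hypothetical `adaptive_margin`, linear in the cosine distance between the positive and negative texts and capped at `max_margin`) is an illustrative choice. The abstract does not specify which distance measure or function the paper actually uses.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def fixed_margin_loss(video: Vector, pos_text: Vector, neg_text: Vector,
                      margin: float = 0.2) -> float:
    """Standard hinge ranking loss: s(v, t-) must trail s(v, t+) by a fixed margin."""
    return max(0.0, margin - cosine(video, pos_text) + cosine(video, neg_text))


def adaptive_margin(pos_text: Vector, neg_text: Vector,
                    max_margin: float = 0.2) -> float:
    """Hypothetical distance-to-margin function (an assumption, not the paper's).

    The margin grows linearly with the cosine distance between the positive
    and negative texts, so a near-duplicate negative gets a margin near zero.
    """
    distance = 1.0 - cosine(pos_text, neg_text)  # lies in [0, 2]
    return max_margin * min(distance, 1.0)


def adaptive_margin_loss(video: Vector, pos_text: Vector, neg_text: Vector,
                         max_margin: float = 0.2) -> float:
    """Same hinge loss, but with the margin computed per negative pair."""
    m = adaptive_margin(pos_text, neg_text, max_margin)
    return max(0.0, m - cosine(video, pos_text) + cosine(video, neg_text))
```

With a negative text identical to the positive one, `fixed_margin_loss([1.0, 0.0], [1.0, 0.0], [1.0, 0.0])` still returns about 0.2 and keeps pushing the pair apart, whereas `adaptive_margin_loss` returns 0.0. An unrelated (orthogonal) negative text receives the full 0.2 margin.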


A CLIP-Enhanced Method for Video-Language Understanding

Li, He, Feng 2021

Preprint


CLOP: Video-and-Language Pre-Training with Knowledge Regularizations

Yang et al. 2022


Video-and-language pre-training has shown promising results for learning generalizable representations. Most existing approaches model video and text implicitly, without considering explicit structural representations of the multimodal content. We refer to such representations as "structural knowledge", which expresses rich semantics at multiple granularities. Related works propose object-aware approaches that inject similar knowledge as inputs. However, existing methods usually fail to effectively use such knowledge as "regularizations" to shape a superior cross-modal representation space. To this end, we propose a Cross-modaL knOwledge-enhanced Pre-training (CLOP) method with Knowledge Regularizations. It has two key designs: 1) a simple yet effective Structural Knowledge Prediction (SKP) task to pull together the latent representations of similar videos; and 2) a novel Knowledge-guided sampling approach for Contrastive Learning (KCL) to push apart cross-modal hard negative samples. We evaluate our method on four text-video retrieval tasks and one multiple-choice QA task. The experiments show clear improvements, outperforming prior works by a substantial margin. In addition, we provide ablations and insights into how our methods affect the latent representation space, demonstrating the value of incorporating knowledge regularizations into video-and-language pre-training.
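A knowledge-guided hard-negative step for contrastive learning can be sketched as follows. This is an illustration under loudly stated assumptions, not the CLOP implementation: each caption is represented by a hypothetical set of structural-knowledge labels (for example, objects and actions), the candidates whose label sets overlap most with the anchor's (by Jaccard similarity) are taken as hard negatives, and their similarities feed a standard InfoNCE contrastive loss. The label sets, the overlap criterion, the helper names, and the 0.07 default temperature are all illustrative choices; the abstract does not describe KCL's actual sampling rule.

```python
import math
from typing import Dict, List, Sequence, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two knowledge-label sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def select_hard_negatives(anchor_id: str, knowledge: Dict[str, Set[str]],
                          k: int) -> List[str]:
    """Hypothetical criterion: the k other items sharing the most labels with the anchor."""
    anchor = knowledge[anchor_id]
    others = [i for i in knowledge if i != anchor_id]
    # Stable sort, so ties keep their original dictionary order.
    others.sort(key=lambda i: jaccard(anchor, knowledge[i]), reverse=True)
    return others[:k]


def info_nce(pos_sim: float, neg_sims: Sequence[float],
             temperature: float = 0.07) -> float:
    """InfoNCE loss: -log(exp(s+/t) / (exp(s+/t) + sum(exp(s-/t))))."""
    logits = [pos_sim / temperature] + [s / temperature for s in neg_sims]
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
    return log_sum - logits[0]
```

For example, with `knowledge = {"t0": {"dog", "run", "park"}, "t1": {"dog", "park"}, "t2": {"car", "road"}, "t3": {"dog", "swim"}}`, `select_hard_negatives("t0", knowledge, 2)` returns `["t1", "t3"]`. In a real model the similarities passed to `info_nce` would come from the video and text encoders, and a negative that scores closer to the anchor raises the loss more.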


Extrinsic Calibration between Camera and LiDAR Sensors by Virtual Planar Junctions Matching

Liu, Xiao, et al. 2020

