Wanqing Cui scite author profile

Wanqing Cui

4 Publications

59 Citation Statements Received

83 Citation Statements Given


Publications


WenLan: Bridging Vision and Language by Large-Scale Multi-Modal Pre-Training

Huo, Zhang, Liu et al. 2021 (Preprint)


Multi-modal pre-training models have been intensively explored to bridge vision and language in recent years. However, most of them explicitly model the cross-modal interaction between image-text pairs by assuming that there is a strong semantic correlation between the text and image modalities. Since this strong assumption is often invalid in real-world scenarios, we choose to implicitly model the cross-modal correlation for large-scale multi-modal pre-training, which is the focus of the Chinese project 'Wen-Lan' led by our team. Specifically, with the weak correlation assumption over image-text pairs, we propose a two-tower pre-training model called BriVL within the cross-modal contrastive learning framework. Unlike OpenAI CLIP, which adopts a simple contrastive learning method, we devise a more advanced algorithm by adapting the latest method MoCo to the cross-modal scenario. By building a large queue-based dictionary, our BriVL can incorporate more negative samples with limited GPU resources. We further construct a large Chinese multi-source image-text dataset called RUC-CAS-WenLan for pre-training our BriVL model. Extensive experiments demonstrate that the pre-trained BriVL model outperforms both UNITER and OpenAI CLIP on various downstream tasks.
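The queue-based dictionary mentioned in the abstract can be illustrated with a small NumPy sketch. This is not the BriVL implementation: the function names, the temperature value, and the image-to-text direction are assumptions made for illustration, and the momentum encoder that MoCo normally pairs with the queue is omitted. It only shows how a fixed-size queue of older text embeddings can supply extra negatives to a contrastive (InfoNCE-style) loss without enlarging the batch.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale each row to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def info_nce_with_queue(img_q, txt_k, queue, temperature=0.07):
    """Contrastive loss for image queries against text keys.

    img_q: (B, D) image embeddings; row i is paired with txt_k[i].
    txt_k: (B, D) text embeddings (the positives).
    queue: (K, D) text embeddings from earlier batches (the negatives).
    All names and the temperature are hypothetical choices for this sketch.
    """
    img_q = l2_normalize(img_q)
    txt_k = l2_normalize(txt_k)
    queue = l2_normalize(queue)

    pos = np.sum(img_q * txt_k, axis=1, keepdims=True)  # (B, 1)
    neg = img_q @ queue.T                               # (B, K)
    logits = np.concatenate([pos, neg], axis=1) / temperature

    # Numerically stable log-softmax; the positive sits in column 0.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits[:, 0] - np.log(np.exp(logits).sum(axis=1))
    return -log_prob.mean()


def update_queue(queue, new_keys):
    """FIFO update: append the newest keys, drop the oldest, keep size K."""
    k = queue.shape[0]
    return np.concatenate([queue, new_keys], axis=0)[-k:]
```

In this sketch the number of negatives is set by the queue length K rather than by the batch size, which is the property the abstract attributes to the queue-based dictionary.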


Design and Implementation of Consecutive Interpreting System Based on Transformer NMT Model

Li, Li, Cui et al. 2018

dtcse


The traditional machine translation system with a push-to-talk mode is not suitable for long stretches of oral translation. This paper proposes a consecutive interpreting system that solves the problem of long, continuous listening by using a pipeline working mode. In this mode, audio sampling stays on during the whole speech. In typical usage scenarios, audiences of speeches or lectures can see bilingual subtitles on the projection or on their own devices, and the system keeps translating while it listens to the speaker. The speech-to-text module is based on the speech recognition model of the Baidu open platform, and the translation is based on the Transformer NMT model proposed by Google. The average translation delay of our system is only about 0.8s in our delay test. This system can play the role of the interpreter in conferences or lectures where the translation precision requirement is not high.
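The pipeline working mode described in the abstract can be sketched as a chain of worker threads connected by queues, so that recognition and translation of earlier audio proceed while later audio is still arriving. This is a minimal illustration under stated assumptions, not the authors' system: `fake_recognize` and `fake_translate` are hypothetical stand-ins for the Baidu speech recognition service and the Transformer NMT model, and audio chunks are represented as plain strings.

```python
import queue
import threading

_STOP = object()  # sentinel that tells each stage to shut down


def stage(fn, inbox, outbox):
    """Start a worker thread that applies fn to every item from inbox."""
    def run():
        while True:
            item = inbox.get()
            if item is _STOP:
                outbox.put(_STOP)  # pass the shutdown signal downstream
                return
            outbox.put(fn(item))

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker


def fake_recognize(chunk):
    """Hypothetical stand-in for a speech-to-text call."""
    return f"text({chunk})"


def fake_translate(text):
    """Hypothetical stand-in for an NMT call; returns a bilingual pair."""
    return (text, f"translated({text})")


def run_pipeline(audio_chunks):
    """Feed chunks through recognition and translation, in order."""
    audio_q, text_q, subtitle_q = queue.Queue(), queue.Queue(), queue.Queue()
    workers = [
        stage(fake_recognize, audio_q, text_q),
        stage(fake_translate, text_q, subtitle_q),
    ]

    # Sampling never blocks on translation: chunks are simply enqueued.
    for chunk in audio_chunks:
        audio_q.put(chunk)
    audio_q.put(_STOP)

    subtitles = []
    while True:
        item = subtitle_q.get()
        if item is _STOP:
            break
        subtitles.append(item)

    for worker in workers:
        worker.join()
    return subtitles
```

Because each stage has a single worker and the queues are first-in, first-out, the subtitles come out in the same order as the audio chunks went in.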


Beyond Language: Learning Commonsense from Images for Reasoning

Cui, Lan, Pang et al. 2020 (Preprint)


This paper proposes a novel approach to learn commonsense from images, instead of limited raw texts or costly constructed knowledge bases, for the commonsense reasoning problem in NLP. Our motivation comes from the fact that an image is worth a thousand words, where richer scene information could be leveraged to help distill the commonsense knowledge, which is often hidden in language. Our approach, namely Loire, consists of two stages. In the first stage, a bi-modal sequence-to-sequence approach is utilized to conduct the scene layout generation task, based on a text representation model ViBERT. In this way, the required visual scene knowledge, such as spatial relations, will be encoded in ViBERT by the supervised learning process with some bi-modal data like COCO. Then ViBERT is concatenated with a pre-trained language model to perform the downstream commonsense reasoning tasks. Experimental results on two commonsense reasoning problems, i.e., commonsense question answering and pronoun resolution, demonstrate that Loire outperforms traditional language-based methods. We also give some case studies to show what knowledge is learned from images and explain how the generated scene layout helps the commonsense reasoning process.



