Haoyang Huang scite author profile

We present Unicoder, a universal language encoder that is insensitive to different languages. Given an arbitrary NLP task, a model can be trained with Unicoder using training data in one language and directly applied to inputs of the same task in other languages. Compared to similar efforts such as Multilingual BERT (Devlin et al., 2018) and XLM (Lample and Conneau, 2019), three new cross-lingual pre-training tasks are proposed: cross-lingual word recovery, cross-lingual paraphrase classification, and cross-lingual masked language model. These tasks help Unicoder learn the mappings among different languages from more perspectives. We also find that fine-tuning on multiple languages together can bring further improvement. Experiments are performed on two tasks: cross-lingual natural language inference (XNLI) and cross-lingual question answering (XQA), where XLM is our baseline. On XNLI, a 1.8% averaged accuracy improvement (on 15 languages) is obtained. On XQA, which is a new cross-lingual dataset built by us, a 5.5% averaged accuracy improvement (on French and German) is obtained.


M³P: Learning Universal Representations via Multitask Multilingual Multimodal Pre-training

Huang

Lin

et al. 2021


Unicoder: A Universal Language Encoder by Pre-training with Multiple Cross-lingual Tasks

Huang

Liang

Duan

et al. 2019

Preprint


XGPT: Cross-modal Generative Pre-Training for Image Captioning

Qin

Huang

Duan

et al. 2021


Hierarchical Context-aware Network for Dense Video Event Captioning

Ji

Guo

Huang

et al. 2021


Dense video event captioning aims to generate a sequence of descriptive captions for each event in a long untrimmed video. Video-level context provides important information and helps the model generate consistent and less redundant captions between events. In this paper, we introduce a novel Hierarchical Context-aware Network for dense video event captioning (HCN) to capture context from various aspects. In detail, the model leverages local and global context with different mechanisms to jointly learn to generate coherent captions. The local context module performs full interaction between neighboring frames, and the global context module selectively attends to previous or future events. According to our extensive experiments on both the Youcook2 and Activitynet Captioning datasets, the video-level HCN model outperforms the event-level context-agnostic model by a large margin. The code is available at https://github.com/KirkGuo/HCN.


