Yu Zhou scite author profile

NCLS: Neural Cross-Lingual Summarization

Cross-lingual summarization (CLS) is the task of producing a summary in one language for a source document in a different language. Existing methods simply divide this task into two steps, summarization and translation, which leads to error propagation. To address this, we present an end-to-end CLS framework, which we refer to as Neural Cross-Lingual Summarization (NCLS), for the first time. Moreover, we propose to further improve NCLS by incorporating two related tasks, monolingual summarization and machine translation, into the training process of CLS under multi-task learning. Due to the lack of supervised CLS data, we propose a round-trip translation strategy to acquire two high-quality large-scale CLS datasets based on existing monolingual summarization datasets. Experimental results have shown that our NCLS achieves remarkable improvement over traditional pipeline methods on both English-to-Chinese and Chinese-to-English CLS human-corrected test sets. In addition, NCLS with multi-task learning can further significantly improve the quality of generated summaries. We make our dataset and code publicly available.

[Figure: round-trip translation (RTT) example built from a monolingual summarization (MS) pair, showing an English article (input for MS or CLS), an English reference (output for MS), and a Chinese reference.]
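The round-trip translation strategy named in this abstract can be illustrated with a minimal sketch. It assumes that a summary is translated into the target language, translated back, and kept only if the back-translation still overlaps enough with the original. The function names, the unigram F1 score (a stand-in for whatever similarity measure the authors use), and the 0.6 threshold are all hypothetical, and the translators are passed in as plain callables rather than taken from any real library.

```python
from collections import Counter
from typing import Callable, Optional


def unigram_f1(reference: str, candidate: str) -> float:
    """Word-overlap F1 between two whitespace-tokenized strings (illustrative metric)."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # clipped count of shared words
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def round_trip_filter(
    summary: str,
    forward: Callable[[str], str],   # hypothetical source-to-target translator
    backward: Callable[[str], str],  # hypothetical target-to-source translator
    threshold: float = 0.6,          # assumed cut-off, not taken from the paper
) -> Optional[str]:
    """Return the translated summary if it survives the round trip, else None."""
    translated = forward(summary)
    restored = backward(translated)
    if unigram_f1(summary, restored) >= threshold:
        return translated
    return None
```

With real machine translation systems plugged in as `forward` and `backward`, the surviving translated summaries could be paired with the original source documents to form cross-lingual training examples.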

Video Playback Rate Perception for Self-Supervised Spatio-Temporal Representation Learning

Yao

et al. 2020

Video Cloze Procedure for Self-Supervised Spatio-Temporal Learning

Luo

Liu

Zhou

et al. 2020

AAAI

142

We propose a novel self-supervised method, referred to as Video Cloze Procedure (VCP), to learn rich spatial-temporal representations. VCP first generates “blanks” by withholding video clips and then creates “options” by applying spatio-temporal operations on the withheld clips. Finally, it fills the blanks with “options” and learns representations by predicting the categories of operations applied on the clips. VCP can act as either a proxy task or a target task in self-supervised learning. As a proxy task, it converts rich self-supervised representations into video clip operations (options), which enhances the flexibility and reduces the complexity of representation learning. As a target task, it can assess learned representation models in a uniform and interpretable manner. With VCP, we train spatial-temporal representation models (3D-CNNs) and apply such models on action recognition and video retrieval tasks. Experiments on commonly used benchmarks show that the trained models outperform the state-of-the-art self-supervised models with significant margins.
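The blank-and-option construction described in this abstract can be sketched as follows. This is a toy illustration, not the authors' implementation: clips are nested Python lists instead of tensors, and the three operations (identity, temporal reversal, horizontal flip) are assumptions chosen for simplicity, since the abstract does not list the operations used.

```python
import random
from typing import Callable, List, Tuple

Frame = List[List[int]]  # a frame as rows of pixel values
Clip = List[Frame]       # a clip as an ordered list of frames


def identity(clip: Clip) -> Clip:
    """Return an unchanged copy of the clip."""
    return [[row[:] for row in frame] for frame in clip]


def reverse_time(clip: Clip) -> Clip:
    """Temporal operation: play the frames in reverse order."""
    return identity(clip)[::-1]


def flip_horizontal(clip: Clip) -> Clip:
    """Spatial operation: mirror every frame left to right."""
    return [[row[::-1] for row in frame] for frame in clip]


# Hypothetical operation set; the index of an operation is its class label.
OPERATIONS: List[Tuple[str, Callable[[Clip], Clip]]] = [
    ("identity", identity),
    ("reverse_time", reverse_time),
    ("flip_horizontal", flip_horizontal),
]


def make_cloze_sample(
    clips: List[Clip], blank_index: int, rng: random.Random
) -> Tuple[List[Clip], int]:
    """Withhold one clip, fill the blank with a transformed option, return the label."""
    label = rng.randrange(len(OPERATIONS))
    _, operation = OPERATIONS[label]
    filled = list(clips)
    filled[blank_index] = operation(clips[blank_index])
    return filled, label
```

A representation model would then be trained to predict `label` from the filled clip sequence, which is the classification task the abstract describes.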

Multimodal Summarization with Guidance of Multimodal Reference

Zhu

Zhou

Zhang

et al. 2020

AAAI

Multimodal summarization with multimodal output (MSMO) is to generate a multimodal summary for a multimodal news report, which has been proven to effectively improve users' satisfaction. The existing MSMO methods are trained by the target of text modality, leading to the modality-bias problem that ignores the quality of model-selected image during training. To alleviate this problem, we propose a multimodal objective function with the guidance of multimodal reference to use the loss from the summary generation and the image selection. Due to the lack of multimodal reference data, we present two strategies, i.e., ROUGE-ranking and Order-ranking, to construct the multimodal reference by extending the text reference. Meanwhile, to better evaluate multimodal outputs, we propose a novel evaluation metric based on joint multimodal representation, projecting the model output and multimodal reference into a joint semantic space during evaluation. Experimental results have shown that our proposed model achieves the new state-of-the-art on both automatic and manual evaluation metrics. Besides, our proposed evaluation method can effectively improve the correlation with human judgments.

Abstractive Cross-Language Summarization via Translation Model Enhanced Predicate Argument Structure Fusing

Zhang

Zhou

Zong

2016

IEEE/ACM Trans. Audio Speech Lang. Process.

Matching User Photos to Online Products with Robust Deep Features

Wang

Sun

Zhang

et al. 2016

Attend, Translate and Summarize: An Efficient Method for Neural Cross-Lingual Summarization

Zhu

Zhou

Zhang

2020

Cross-lingual summarization aims at summarizing a document in one language (e.g., Chinese) into another language (e.g., English). In this paper, we propose a novel method inspired by the translation pattern in the process of obtaining a cross-lingual summary. We first attend to some words in the source text, then translate them into the target language, and summarize to get the final summary. Specifically, we first employ the encoder-decoder attention distribution to attend to the source words. Second, we present three strategies to acquire the translation probability, which helps obtain the translation candidates for each source word. Finally, each summary word is generated either from the neural distribution or from the translation candidates of source words. Experimental results on Chinese-to-English and English-to-Chinese summarization tasks have shown that our proposed method can significantly outperform the baselines, achieving comparable performance with the state-of-the-art.
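The final step in this abstract, generating each summary word either from the neural distribution or from translation candidates of attended source words, can be sketched as a weighted mixture. The formula below is an assumption in the style of pointer-generator models, not the authors' stated equation: the gate `p_translate`, the dictionary layout, and the function name are hypothetical.

```python
from typing import Dict


def mix_distributions(
    neural: Dict[str, float],                         # decoder distribution over target words
    attention: Dict[str, float],                      # attention weight per source word
    translation_probs: Dict[str, Dict[str, float]],   # source word -> {target word: probability}
    p_translate: float,                               # assumed gate: probability of translating
) -> Dict[str, float]:
    """Combine the neural distribution with attention-weighted translation candidates."""
    # Share of probability mass that stays with the neural distribution.
    mixed = {word: (1.0 - p_translate) * prob for word, prob in neural.items()}
    # Share routed through the translation candidates of each attended source word.
    for source_word, weight in attention.items():
        for target_word, prob in translation_probs.get(source_word, {}).items():
            mixed[target_word] = mixed.get(target_word, 0.0) + p_translate * weight * prob
    return mixed
```

When the attention weights and each candidate list sum to one, the mixed scores also sum to one, so the result can be read as a single distribution over summary words.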

Video Cloze Procedure for Self-Supervised Spatio-Temporal Learning

Luo

Liu

Zhou

et al. 2020

Preprint

We propose a novel self-supervised method, referred to as Video Cloze Procedure (VCP), to learn rich spatial-temporal representations. VCP first generates "blanks" by withholding video clips and then creates "options" by applying spatio-temporal operations on the withheld clips. Finally, it fills the blanks with "options" and learns representations by predicting the categories of operations applied on the clips. VCP can act as either a proxy task or a target task in self-supervised learning. As a proxy task, it converts rich self-supervised representations into video clip operations (options), which enhances the flexibility and reduces the complexity of representation learning. As a target task, it can assess learned representation models in a uniform and interpretable manner. With VCP, we train spatial-temporal representation models (3D-CNNs) and apply such models on action recognition and video retrieval tasks. Experiments on commonly used benchmarks show that the trained models outperform the state-of-the-art self-supervised models with significant margins.

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Part of the Research Solutions Family.
