Findings of the Association for Computational Linguistics: EMNLP 2020
DOI: 10.18653/v1/2020.findings-emnlp.425

ZEN: Pre-training Chinese Text Encoder Enhanced by N-gram Representations

Abstract: The pre-training of text encoders normally processes text as a sequence of tokens corresponding to small text units, such as word pieces in English and characters in Chinese. It omits information carried by larger text granularity, and thus the encoders cannot easily adapt to certain combinations of characters. This leads to a loss of important semantic information, which is especially problematic for Chinese because the language does not have explicit word boundaries. In this paper, we propose ZEN, a BERT-based…
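A minimal, hypothetical sketch of the kind of n-gram enhancement the abstract describes: n-grams from a lexicon are matched in the input and their representations are combined with the characters they cover. The lexicon, the random embedding tables, and the simple additive fusion are all illustrative assumptions; ZEN itself learns n-gram representations with a dedicated encoder during pre-training, which this sketch does not reproduce.

    import numpy as np

    # Toy, randomly initialised embedding tables -- purely illustrative.
    rng = np.random.default_rng(0)
    dim = 8
    sentence = "提高人民生活水平"
    ngram_lexicon = ["人民", "生活", "水平", "生活水平"]  # assumed pre-extracted n-gram lexicon

    char_emb = {c: rng.normal(size=dim) for c in set(sentence)}
    ngram_emb = {g: rng.normal(size=dim) for g in ngram_lexicon}

    def match_ngrams(text, lexicon):
        # Return (start, end, ngram) spans for every lexicon n-gram occurring in the text.
        spans = []
        for g in lexicon:
            start = text.find(g)
            while start != -1:
                spans.append((start, start + len(g), g))
                start = text.find(g, start + 1)
        return spans

    def enhance(text):
        # Character embeddings, enhanced by adding the representation of every
        # n-gram that covers a given character position.
        reps = np.stack([char_emb[c] for c in text])
        for start, end, g in match_ngrams(text, ngram_lexicon):
            reps[start:end] += ngram_emb[g]
        return reps

    print(enhance(sentence).shape)  # (8, 8): one enhanced vector per character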

Cited by 80 publications (53 citation statements)
References 35 publications
“…Extra knowledge (e.g., pre-trained embeddings (Song et al., 2017; Song and Shi, 2018) and pre-trained models (Devlin et al., 2019; Diao et al., 2019)) can provide useful information and thus enhance model performance for many NLP tasks (Tian et al., 2020a,b,c). Specifically, memory and memory-augmented neural networks (Zeng et al., 2018; Santoro et al., 2018; Diao et al., 2020; Tian et al., 2020d) are another line of related research, which can be traced back to memory networks, which were proposed to leverage extra information for question answering; Sukhbaatar et al. (2015) then improved them with an end-to-end design so that the model can be trained with less supervision.…”
Section: Base+rm+mcln (mentioning)
confidence: 99%
“…In our main experiments, we use two types of embeddings for each language: ELMo (Peters et al., 2018) and BERT-cased large (Devlin et al., 2019) for English, and Tencent Embedding (Song et al., 2018b) and ZEN (Diao et al., 2019) for Chinese. In Table 5, we report the results (F1 scores) of our model with the best setting (i.e.…”
Section: Discussion (mentioning)
confidence: 99%
“…From the results, it is found that our model with AU and GA consistently outperforms the baseline models under different settings of embeddings. In our main experiments, we use ZEN (Diao et al., 2019) instead of BERT (Devlin et al., 2019) as the embedding to represent the input for Chinese. The reason is that ZEN achieves better performance than BERT, which is confirmed by Table 6, whose results (F1 scores) show the performance of our approach with the best settings (i.e.…”
Section: Discussion (mentioning)
confidence: 99%
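As context for the excerpts above, a brief sketch of how a pre-trained Chinese encoder is typically used as an input representation, here with the public bert-base-chinese checkpoint via the HuggingFace Transformers API. ZEN is distributed with its own codebase, so loading it would follow that release instead; the checkpoint name and the downstream use are illustrative assumptions, not the cited papers' exact setups.

    import torch
    from transformers import BertModel, BertTokenizer

    # Public Chinese BERT checkpoint; a stand-in for the ZEN/BERT encoders in the excerpts.
    tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
    encoder = BertModel.from_pretrained("bert-base-chinese")

    sentence = "提高人民生活水平"
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = encoder(**inputs)

    # (1, sequence_length, 768): one contextual vector per token, consumed as
    # input features by a downstream model such as a sequence labeler.
    print(outputs.last_hidden_state.shape)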
“…In our experiments, we use BERT (Devlin et al., 2019) as the basic encoder for all three languages and use ZEN (Diao et al., 2019) and XLNet-large (Yang et al., 2019) for Chinese and English, respectively. For BERT, ZEN, and XLNet, we use the default hyper-parameter settings.…”
Section: Model Implementation (mentioning)
confidence: 99%
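The last excerpt pairs different pre-trained encoders with different languages and keeps the default hyper-parameters. Below is a hedged sketch of that per-language selection using the Transformers Auto classes and public checkpoints (bert-base-chinese, xlnet-large-cased); the dictionary and function names are assumptions for illustration, and ZEN would again be loaded from its own release rather than this API.

    from transformers import AutoModel, AutoTokenizer

    # Hypothetical per-language choice of pre-trained encoder, mirroring the excerpt:
    # BERT as the basic encoder, XLNet-large for English; ZEN comes from its own package.
    ENCODERS = {
        "zh": "bert-base-chinese",
        "en": "xlnet-large-cased",
    }

    def load_encoder(lang):
        # Default (hyper-)parameters: no config overrides are passed here.
        name = ENCODERS[lang]
        return AutoTokenizer.from_pretrained(name), AutoModel.from_pretrained(name)

    tok, enc = load_encoder("en")
    print(type(enc).__name__)  # XLNetModel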