2021
DOI: 10.1109/taslp.2021.3124365
Pre-Training With Whole Word Masking for Chinese BERT

Cited by 650 publications (172 citation statements)
References 16 publications
“…For the textual feature extraction, the Chinese BERT with whole word masking [36,37] is used, and the max length of text is set to 160. For efficient training, the feature-based approach is adopted on the pretrained language model, which means that the parameters of the pretrained language model are fixed.…”
Section: Settings (mentioning)
confidence: 99%
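The frozen, feature-based setup described in the quoted settings can be sketched roughly as follows. This is a minimal illustration only, assuming the HuggingFace Transformers library and the publicly released hfl/chinese-bert-wwm checkpoint; the citing paper does not specify its exact toolkit or checkpoint.

```python
# Minimal sketch of a feature-based (frozen) Chinese BERT-wwm extractor,
# assuming the HuggingFace Transformers library and the hfl/chinese-bert-wwm
# checkpoint; the cited work may use a different toolkit or checkpoint.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("hfl/chinese-bert-wwm")
model = BertModel.from_pretrained("hfl/chinese-bert-wwm")

# Feature-based approach: freeze every parameter of the pretrained model.
for param in model.parameters():
    param.requires_grad = False
model.eval()

def extract_text_features(text: str) -> torch.Tensor:
    """Return fixed (non-fine-tuned) contextual features for one Chinese text."""
    inputs = tokenizer(
        text,
        max_length=160,          # max text length used in the quoted settings
        truncation=True,
        padding="max_length",
        return_tensors="pt",
    )
    with torch.no_grad():
        outputs = model(**inputs)
    # Final-layer hidden states serve as frozen features for a downstream model.
    return outputs.last_hidden_state  # shape: (1, 160, 768)
```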
“…Further, a great deal of work has been done exploring the language-agnostic properties of BERT including the ability to perform well on Arabic [14] and Chinese [15] texts. Other transformer models with similar attention-based architectures show further improvements.…”
Section: A Deep Learning in NLP (mentioning)
confidence: 99%
“…To sufficiently compare last NLP Transformer models with other traditional NLP deep learning models we use Bidirectional Encoder Representations from Transformers(BERT) [8] as the representative due to its widespread popularity and bidirectional properties. Moreover, BERT has been shown to perform well at extracting information not only from English text but also languages with very different structure including Arabic [14] and Chinese [15]. These language and structure agnostic properties of BERT make it an attractive choice for applications outside of written language.…”
Section: BERT (mentioning)
confidence: 99%
“…Representative autoregressive language models are word2vec (Mikolov et al , 2013), Glove (Pennington et al , 2014), ELMO (Peters et al , 2018), GPT (Radford et al , 2018), GPT-2 (Radford et al , 2019) and XLNet (Yang et al , 2019), and they are more suitable for text generation task. Representative autoencoding language models are Bert (Devlin et al , 2018), Bert-wwm (Cui et al , 2019), RoBERTa (Liu et al , 2019), ALBERT (Lan et al , 2019), ERNIE (Sun et al , 2019a), ERNIE-2 (Sun et al , 2019b) and ELECTRA (Clark et al , 2020), and they are more suitable for entity and relation extraction.…”
Section: Related Work (mentioning)
confidence: 99%
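Bert-wwm (Cui et al., 2019) in the list above is the model introduced by the cited paper. The following is a minimal, hypothetical sketch of the whole-word-masking idea behind it, assuming hand-supplied word boundaries; the actual pre-training pipeline uses a Chinese word segmenter and the full BERT masking scheme.

```python
# Hypothetical illustration of whole word masking for Chinese: when a segmented
# word is selected, all of its characters are masked together, unlike the
# original character-level masking. Word boundaries are supplied by hand here.
import random

def whole_word_mask(words, mask_prob=0.15, mask_token="[MASK]"):
    """Mask whole segmented words; every character of a selected word is replaced."""
    masked = []
    for word in words:                               # e.g. ["使用", "语言", "模型"]
        if random.random() < mask_prob:
            masked.extend([mask_token] * len(word))  # mask all characters of the word
        else:
            masked.extend(list(word))                # keep each character as a token
    return masked

# Example with assumed word boundaries:
print(whole_word_mask(["使用", "语言", "模型"], mask_prob=0.5))
```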