Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2016
DOI: 10.18653/v1/p16-1101
End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF

Abstract: State-of-the-art sequence labeling systems traditionally require large amounts of task-specific knowledge in the form of hand-crafted features and data pre-processing. In this paper, we introduce a novel neural network architecture that automatically benefits from both word- and character-level representations by using a combination of bidirectional LSTM, CNN, and CRF. Our system is truly end-to-end, requiring no feature engineering or data preprocessing, thus making it applicable to a wide range of sequence label…
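For readers who want to see how the components named in the abstract fit together, here is a minimal PyTorch sketch of a character-level CNN feeding word-level representations into a bidirectional LSTM that produces per-tag emission scores (a CRF decoding sketch appears further below). All module names, dimensions, and hyperparameters are illustrative assumptions, not the configuration reported in the paper.

# Minimal sketch of the BiLSTM-CNN idea from the abstract (PyTorch).
# Dimensions and names are assumptions, not the authors' reported setup.
import torch
import torch.nn as nn

class BiLSTMCNNTagger(nn.Module):
    def __init__(self, word_vocab, char_vocab, num_tags,
                 word_dim=100, char_dim=30, char_filters=30, lstm_hidden=200):
        super().__init__()
        self.word_emb = nn.Embedding(word_vocab, word_dim, padding_idx=0)
        self.char_emb = nn.Embedding(char_vocab, char_dim, padding_idx=0)
        # Character-level CNN: convolve over each word's characters, then max-pool.
        self.char_cnn = nn.Conv1d(char_dim, char_filters, kernel_size=3, padding=1)
        # Word-level bidirectional LSTM over the concatenated representations.
        self.bilstm = nn.LSTM(word_dim + char_filters, lstm_hidden,
                              batch_first=True, bidirectional=True)
        # Per-token emission scores; a CRF layer would add tag-transition scores on top.
        self.emissions = nn.Linear(2 * lstm_hidden, num_tags)

    def forward(self, words, chars):
        # words: (batch, seq_len); chars: (batch, seq_len, max_word_len)
        b, s, w = chars.shape
        c = self.char_emb(chars).view(b * s, w, -1).transpose(1, 2)   # (b*s, char_dim, w)
        c = torch.relu(self.char_cnn(c)).max(dim=2).values.view(b, s, -1)
        x = torch.cat([self.word_emb(words), c], dim=-1)
        h, _ = self.bilstm(x)
        return self.emissions(h)   # (batch, seq_len, num_tags)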

Cited by 2,056 publications (1,935 citation statements) · References 40 publications
“…More recent work on sequence labeling tasks relies instead on deep learning techniques such as convolutional or recurrent neural network models (CNNs, LeCun et al., 1989, and RNNs, Rumelhart, 1986, respectively), without the need for any hand-crafted features (Kim, 2014; Huang et al., 2015; Zhang et al., 2015; Chiu and Nichols, 2016; Lample et al., 2016; Ma and Hovy, 2016; Yang et al., 2016; Strubell et al., 2017). RNNs in particular typically rely on a neural network architecture built from one or more bidirectional Long Short-Term Memory (BiLSTM) layers, as this type of neural cell provides variable-length memory, allowing the model to capture relationships within sequences of proximal words.…”
Section: Related Work
confidence: 99%
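As a hedged illustration of the variable-length memory point above, the sketch below (assuming PyTorch; vocabulary size, dimensions, and the toy batch are invented for the example) runs a BiLSTM over padded sentences of different lengths so that every token receives a contextual vector.

# BiLSTM over variable-length sentences (PyTorch); all values are illustrative.
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

emb = nn.Embedding(num_embeddings=1000, embedding_dim=50, padding_idx=0)
bilstm = nn.LSTM(input_size=50, hidden_size=64, batch_first=True, bidirectional=True)

# Two padded sentences of lengths 5 and 3 (token ids are arbitrary).
tokens = torch.tensor([[4, 8, 15, 16, 23],
                       [42, 7, 9, 0, 0]])
lengths = torch.tensor([5, 3])

# Packing lets the LSTM respect each sentence's true length instead of the padding.
packed = pack_padded_sequence(emb(tokens), lengths, batch_first=True, enforce_sorted=False)
out, _ = bilstm(packed)
hidden, _ = pad_packed_sequence(out, batch_first=True)
print(hidden.shape)   # torch.Size([2, 5, 128]): one 128-d contextual vector per token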
“…Such architectures have achieved state-of-the-art performance for both POS and NER tasks on popular datasets (Reimers and Gurevych, 2017b). Current state-of-the-art architectures for sequence labeling include the use of a CRF prediction layer (Huang et al., 2015) and the use of character-level word embeddings to complement word embeddings, trained either with CNNs (Ma and Hovy, 2016) or BiLSTM RNNs (Lample et al., 2016). Character-level word embeddings have indeed been shown to perform well on a variety of NLP tasks (Dos Santos and Gatti de Bayser, 2014; Kim et al., 2015; Zhang et al., 2015).…”
Section: Related Work
confidence: 99%
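To make the CRF prediction layer mentioned above concrete, here is a small sketch (PyTorch, single sentence) that combines per-token emission scores with a learned tag-transition matrix and decodes the best tag sequence with Viterbi. It is an assumption-laden illustration, not any cited system's implementation; training would additionally require the CRF forward algorithm for the partition function.

# Viterbi decoding over emission + transition scores (sketch; shapes are assumptions).
import torch

def viterbi_decode(emissions, transitions):
    # emissions: (seq_len, num_tags); transitions[i, j]: score of moving from tag i to tag j
    seq_len, num_tags = emissions.shape
    score = emissions[0]                      # best score of a path ending in each tag
    backpointers = []
    for t in range(1, seq_len):
        # total[i, j] = score of ending at tag i previously, then emitting tag j now
        total = score.unsqueeze(1) + transitions + emissions[t].unsqueeze(0)
        score, best_prev = total.max(dim=0)
        backpointers.append(best_prev)
    # Follow backpointers from the best final tag to recover the full path.
    best_tag = int(score.argmax())
    path = [best_tag]
    for bp in reversed(backpointers):
        best_tag = int(bp[best_tag])
        path.append(best_tag)
    return list(reversed(path))

emissions = torch.randn(6, 5)     # e.g. BiLSTM outputs for 6 tokens over 5 tags
transitions = torch.randn(5, 5)   # learned jointly with the network in a real model
print(viterbi_decode(emissions, transitions))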
“…Lample et al. [8] give a more specific implementation of a neural network model for NER tasks: their model introduces pretrained word embeddings from a large corpus and reaches an F1 score of 90.97% on the CoNLL 2003 dataset using no context or spelling features. Ma et al. [9] combine Chiu's CNN model [10] and Lample's bidirectional LSTM-CRF model [8] for end-to-end SLPs. These models use neural network strategies and/or pretrained word embeddings to achieve better generalization performance.…”
Section: Introduction
confidence: 99%