Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2016
DOI: 10.18653/v1/p16-1101
End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF

Abstract: State-of-the-art sequence labeling systems traditionally require large amounts of task-specific knowledge in the form of hand-crafted features and data pre-processing. In this paper, we introduce a novel neural network architecture that automatically benefits from both word- and character-level representations by using a combination of bidirectional LSTM, CNN, and CRF. Our system is truly end-to-end, requiring no feature engineering or data preprocessing, thus making it applicable to a wide range of sequence label…
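For readers who want to see how the components named in the abstract fit together, here is a minimal PyTorch sketch of a character-level CNN feeding word-level representations into a bidirectional LSTM that produces per-tag emission scores (a CRF decoding sketch appears further below). All module names, dimensions, and hyperparameters are illustrative assumptions, not the configuration reported in the paper.

# Minimal sketch of the BiLSTM-CNN idea from the abstract (PyTorch).
# Dimensions and names are assumptions, not the authors' reported setup.
import torch
import torch.nn as nn

class BiLSTMCNNTagger(nn.Module):
    def __init__(self, word_vocab, char_vocab, num_tags,
                 word_dim=100, char_dim=30, char_filters=30, lstm_hidden=200):
        super().__init__()
        self.word_emb = nn.Embedding(word_vocab, word_dim, padding_idx=0)
        self.char_emb = nn.Embedding(char_vocab, char_dim, padding_idx=0)
        # Character-level CNN: convolve over each word's characters, then max-pool.
        self.char_cnn = nn.Conv1d(char_dim, char_filters, kernel_size=3, padding=1)
        # Word-level bidirectional LSTM over the concatenated representations.
        self.bilstm = nn.LSTM(word_dim + char_filters, lstm_hidden,
                              batch_first=True, bidirectional=True)
        # Per-token emission scores; a CRF layer would add tag-transition scores on top.
        self.emissions = nn.Linear(2 * lstm_hidden, num_tags)

    def forward(self, words, chars):
        # words: (batch, seq_len); chars: (batch, seq_len, max_word_len)
        b, s, w = chars.shape
        c = self.char_emb(chars).view(b * s, w, -1).transpose(1, 2)   # (b*s, char_dim, w)
        c = torch.relu(self.char_cnn(c)).max(dim=2).values.view(b, s, -1)
        x = torch.cat([self.word_emb(words), c], dim=-1)
        h, _ = self.bilstm(x)
        return self.emissions(h)   # (batch, seq_len, num_tags)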

Cited by 2,056 publications (1,935 citation statements) · References 40 publications
“…More recent work on sequence labeling tasks relies instead on deep learning techniques such as convolutional or recurrent neural network models (CNNs, LeCun et al., 1989, and RNNs, Rumelhart, 1986, respectively), without the need for any hand-crafted features (Kim, 2014; Huang et al., 2015; Zhang et al., 2015; Chiu and Nichols, 2016; Lample et al., 2016; Ma and Hovy, 2016; Yang et al., 2016; Strubell et al., 2017). RNNs in particular typically rely on a neural network architecture built from one or more bidirectional Long Short-Term Memory (BiLSTM) layers, as this type of neural cell provides variable-length memory, allowing the model to capture relationships within sequences of proximal words.…”
Section: Related Work
confidence: 99%
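As a hedged illustration of the variable-length memory point above, the sketch below (assuming PyTorch; vocabulary size, dimensions, and the toy batch are invented for the example) runs a BiLSTM over padded sentences of different lengths so that every token receives a contextual vector.

# BiLSTM over variable-length sentences (PyTorch); all values are illustrative.
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

emb = nn.Embedding(num_embeddings=1000, embedding_dim=50, padding_idx=0)
bilstm = nn.LSTM(input_size=50, hidden_size=64, batch_first=True, bidirectional=True)

# Two padded sentences of lengths 5 and 3 (token ids are arbitrary).
tokens = torch.tensor([[4, 8, 15, 16, 23],
                       [42, 7, 9, 0, 0]])
lengths = torch.tensor([5, 3])

# Packing lets the LSTM respect each sentence's true length instead of the padding.
packed = pack_padded_sequence(emb(tokens), lengths, batch_first=True, enforce_sorted=False)
out, _ = bilstm(packed)
hidden, _ = pad_packed_sequence(out, batch_first=True)
print(hidden.shape)   # torch.Size([2, 5, 128]): one 128-d contextual vector per token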
“…Such architectures have achieved state-of-the-art performance for both POS and NER tasks on popular datasets (Reimers and Gurevych, 2017b). Current state-of-the-art architectures for sequence labeling include the use of a CRF prediction layer (Huang et al., 2015) and the use of character-level word embeddings to complement word embeddings, trained either with CNNs (Ma and Hovy, 2016) or BiLSTM RNNs (Lample et al., 2016). Character-level word embeddings have indeed been shown to perform well on a variety of NLP tasks (Dos Santos and Gatti de Bayser, 2014; Kim et al., 2015; Zhang et al., 2015).…”
Section: Related Work
confidence: 99%
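To make the CRF prediction layer mentioned above concrete, here is a small sketch (PyTorch, single sentence) that combines per-token emission scores with a learned tag-transition matrix and decodes the best tag sequence with Viterbi. It is an assumption-laden illustration, not any cited system's implementation; training would additionally require the CRF forward algorithm for the partition function.

# Viterbi decoding over emission + transition scores (sketch; shapes are assumptions).
import torch

def viterbi_decode(emissions, transitions):
    # emissions: (seq_len, num_tags); transitions[i, j]: score of moving from tag i to tag j
    seq_len, num_tags = emissions.shape
    score = emissions[0]                      # best score of a path ending in each tag
    backpointers = []
    for t in range(1, seq_len):
        # total[i, j] = score of ending at tag i previously, then emitting tag j now
        total = score.unsqueeze(1) + transitions + emissions[t].unsqueeze(0)
        score, best_prev = total.max(dim=0)
        backpointers.append(best_prev)
    # Follow backpointers from the best final tag to recover the full path.
    best_tag = int(score.argmax())
    path = [best_tag]
    for bp in reversed(backpointers):
        best_tag = int(bp[best_tag])
        path.append(best_tag)
    return list(reversed(path))

emissions = torch.randn(6, 5)     # e.g. BiLSTM outputs for 6 tokens over 5 tags
transitions = torch.randn(5, 5)   # learned jointly with the network in a real model
print(viterbi_decode(emissions, transitions))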
“…Lample et al. [8] give a more specific implementation of a neural network model for NER tasks: their model introduces pretrained word embeddings from a large corpus and reaches an F1 score of 90.97% on the CoNLL 2003 dataset using no context or spelling features. Ma et al. [9] combine Chiu's CNN model [10] and Lample's bidirectional LSTM-CRF model [8] for end-to-end SLPs. These models use neural network strategies and/or pretrained word embeddings to achieve better generalization performance.…”
Section: Introduction
confidence: 99%