Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, 2015
DOI: 10.3115/v1/p15-1033

Transition-Based Dependency Parsing with Stack Long Short-Term Memory

Abstract: We propose a technique for learning representations of parser states in transition-based dependency parsers. Our primary innovation is a new control structure for sequence-to-sequence neural networks: the stack LSTM. Like the conventional stack data structures used in transition-based parsing, elements can be pushed to or popped from the top of the stack in constant time, but, in addition, an LSTM maintains a continuous-space embedding of the stack contents. This lets us formulate an efficient parsing model that c…
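The abstract describes the core data structure: a stack whose push and pop operations run in constant time, paired with an LSTM whose state always summarizes the current stack contents. Below is a minimal sketch of that idea in PyTorch; it is my own illustration under those assumptions, not the authors' released code, and the class and method names (StackLSTM, push, pop, summary) are hypothetical.

```python
# Minimal stack-LSTM sketch: a stack with O(1) push/pop whose contents are
# also summarized by an LSTM state, so a parser can read a fixed-size
# embedding of the whole stack at any time.
import torch
import torch.nn as nn


class StackLSTM(nn.Module):
    def __init__(self, input_dim: int, hidden_dim: int):
        super().__init__()
        self.cell = nn.LSTMCell(input_dim, hidden_dim)
        # Each stack entry stores the LSTM (h, c) state *after* pushing that element.
        empty = (torch.zeros(1, hidden_dim), torch.zeros(1, hidden_dim))
        self.states = [empty]  # sentinel: state of the empty stack

    def push(self, x: torch.Tensor) -> None:
        # Advance the LSTM from the current top state; constant work per push.
        h, c = self.cell(x.unsqueeze(0), self.states[-1])
        self.states.append((h, c))

    def pop(self) -> None:
        # Popping discards the top state; the previous summary is restored in O(1).
        assert len(self.states) > 1, "pop from empty stack"
        self.states.pop()

    def summary(self) -> torch.Tensor:
        # Continuous-space embedding of the current stack contents.
        return self.states[-1][0].squeeze(0)


if __name__ == "__main__":
    stack = StackLSTM(input_dim=4, hidden_dim=8)
    stack.push(torch.randn(4))    # e.g. embed and push a word or subtree
    stack.push(torch.randn(4))
    stack.pop()                   # a reduce-style transition pops in constant time
    print(stack.summary().shape)  # torch.Size([8])
```

In a transition-based parser, a summary like this would be read from the stack (and, in the paper, also from the buffer and the action history) to score the next transition at each step.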

Cited by 511 publications (564 citation statements); references 20 publications.
“…Representing words or relations with continuous vectors (Mikolov et al., 2013; Ji and Eisenstein, 2014) embeds semantics in the same space, which helps alleviate the data sparseness problem and enables end-to-end and multi-task learning. Recurrent neural networks (RNNs) (Graves, 2012) and variants such as Long Short-Term Memory (LSTM) (Hochreiter and Schmidhuber, 1997) and Gated Recurrent Unit (GRU) (Cho et al., 2014) networks show good performance at capturing long-distance dependencies on tasks like Named Entity Recognition (NER) (Chiu and Nichols, 2016; Ma and Hovy, 2016), dependency parsing (Dyer et al., 2015), and semantic composition of documents (Tang et al., 2015). This work describes a hierarchical neural architecture with multiple label outputs for modeling the discourse mode sequence of sentences.…”
Section: Neural Sequence Modeling (mentioning)
confidence: 99%
“…Then DAG-GRNN automatically learns the complicated combinations of all the features, while traditional discrete-feature-based methods need to design them manually. Dyer et al. (2015) improved transition-based dependency parsing using a stack long short-term memory neural network and achieved significant performance improvements. They focused on exploiting long-distance dependencies and information, while we aim to automatically model complicated feature combinations.…”
Section: Related Work (mentioning)
confidence: 99%
“…In the latter, the task is based on dynamic conditional random fields and applied to a conversational speech domain. A more recent work [2] introduces a language-independent model with a transition-based algorithm using LSTMs [11], without any additional syntactic features.…”
Section: Related Work (mentioning)
confidence: 99%
“…Introduced as a simpler variant of long short-term memory (LSTM) units [11], GRUs make computation simpler by having fewer parameters. The number of gates in a hidden unit is reduced to two: (a) the reset gate determines whether the previous memory will be ignored, and (b) the update gate determines how much of the previous memory will be carried on.…”
Section: Our Model (mentioning)
confidence: 99%
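The excerpt above describes the two GRU gates informally. As a hedged illustration only (a from-scratch cell of my own following the standard GRU formulation of Cho et al. (2014), not code from the cited work; all names are hypothetical), the gates can be written as:

```python
# Minimal GRU cell showing the reset and update gates described above.
import torch
import torch.nn as nn


class MinimalGRUCell(nn.Module):
    def __init__(self, input_dim: int, hidden_dim: int):
        super().__init__()
        self.reset_gate = nn.Linear(input_dim + hidden_dim, hidden_dim)
        self.update_gate = nn.Linear(input_dim + hidden_dim, hidden_dim)
        self.candidate = nn.Linear(input_dim + hidden_dim, hidden_dim)

    def forward(self, x: torch.Tensor, h_prev: torch.Tensor) -> torch.Tensor:
        xh = torch.cat([x, h_prev], dim=-1)
        r = torch.sigmoid(self.reset_gate(xh))   # (a) reset: how much of the previous memory to ignore
        z = torch.sigmoid(self.update_gate(xh))  # (b) update: how much of the previous memory to carry on
        h_tilde = torch.tanh(self.candidate(torch.cat([x, r * h_prev], dim=-1)))
        return z * h_prev + (1.0 - z) * h_tilde  # blend old memory with the new candidate


if __name__ == "__main__":
    cell = MinimalGRUCell(input_dim=3, hidden_dim=5)
    h = torch.zeros(1, 5)
    for x in torch.randn(4, 1, 3):  # a toy sequence of 4 steps
        h = cell(x, h)
    print(h.shape)  # torch.Size([1, 5])
```

Compared with an LSTM cell, there is no separate memory cell and one fewer gate, which is what makes the computation lighter.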