Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d15-1041

Improved Transition-based Parsing by Modeling Characters instead of Words with LSTMs

Abstract: We present extensions to a continuous-state dependency parsing method that make it applicable to morphologically rich languages. Starting with a high-performance transition-based parser that uses long short-term memory (LSTM) recurrent neural networks to learn representations of the parser state, we replace lookup-based word representations with representations constructed from the orthographic representations of the words, also using LSTMs. This allows statistical sharing across word forms that are similar on …
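To make the abstract's core idea concrete, the following is a minimal sketch, in PyTorch, of composing a word representation from its characters with a bidirectional LSTM instead of a word-lookup table. It is not the authors' implementation; the module name CharWordEncoder, the dimensions, and the toy character vocabulary are all illustrative assumptions.

import torch
import torch.nn as nn

class CharWordEncoder(nn.Module):
    def __init__(self, n_chars, char_dim=32, hidden_dim=64):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        # One LSTM reads the characters left-to-right, a second reads them
        # right-to-left; bidirectional=True gives both in one module.
        self.lstm = nn.LSTM(char_dim, hidden_dim, bidirectional=True)

    def forward(self, char_ids):
        # char_ids: (word_length,) tensor of character indices for one word.
        embedded = self.char_emb(char_ids).unsqueeze(1)  # (len, 1, char_dim)
        _, (h_n, _) = self.lstm(embedded)
        # Concatenate the final forward and backward hidden states into a
        # single word vector; similar spellings yield similar vectors.
        return torch.cat([h_n[0], h_n[1]], dim=-1).squeeze(0)

char_vocab = {c: i for i, c in enumerate("abcdefghijklmnopqrstuvwxyz")}
encoder = CharWordEncoder(n_chars=len(char_vocab))
word = torch.tensor([char_vocab[c] for c in "parsing"])
vec = encoder(word)  # a (2 * hidden_dim,) representation of "parsing"

Because the word vector is built from spelling, morphologically related forms (e.g. "parse", "parsing", "parsed") receive related representations, which is the statistical sharing across word forms the abstract describes.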

Cited by 224 publications (186 citation statements). References 22 publications.
“…Ling et al. (2015a) used a bidirectional long short-term memory (LSTM) RNN on characters to embed arbitrary word types, showing strong performance for language modeling and POS tagging. Ballesteros et al. (2015) used this model to represent words for dependency parsing. Several have used character-level RNN architectures for machine translation, whether for representing source or target words (Ling et al., 2015b; Luong and Manning, 2016), or for generating entire translations character-by-character (Chung et al., 2016).…”
Section: Related Work
confidence: 99%
“…Generally, given a document and a fixed number of classes, the classification model has to predict the class that is most relevant to that document. Several recent studies have discovered that character-based representation provides straightforward and powerful models for relation extraction [28], sentiment classification [33], and transition-based parsing [5]. Lodhi et al.…”
Section: S:2 Connecting To Previous Studies
confidence: 99%
“…Several recent studies have discovered that character-based representation provides straightforward and powerful models for relation extraction [28], sentiment classification [33], and transition-based parsing [5]. We downloaded the processed WebKB datasets (removed stop/short words, stemming, etc.)…”
Section: S:32 19 Datasets Used In Evaluations
confidence: 99%
“…Based on Dyer et al. (2015), Ballesteros et al. (2015) further propose to use an LSTM to model the relations among characters, leading to better parsing performance on morphologically rich languages. By modeling more history, the parser gives significantly better accuracy compared to the greedy neural parser of Chen and Manning.…”
Section: Parsing By Neural Network
confidence: 99%
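The statements above all concern transition-based parsing, in which a parser builds a tree incrementally by applying actions to a stack and a buffer while an LSTM summarizes the action history and parser state. As a reference point, here is a minimal sketch of the arc-standard transition system alone, with the neural scoring omitted; the function name and the toy example are illustrative assumptions, not code from the cited papers.

# Minimal arc-standard transition system (illustrative only; the papers
# above additionally score each action with LSTMs over the parser state).
# `sentence` is a list of tokens; `actions` is a valid action sequence,
# e.g. produced by an oracle from a gold tree.
def parse(sentence, actions):
    stack, buffer = [], list(range(len(sentence)))  # token indices
    arcs = []  # (head, dependent) pairs
    for act in actions:
        if act == "SHIFT":
            stack.append(buffer.pop(0))
        elif act == "LEFT-ARC":   # second-from-top becomes dependent of top
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))
        elif act == "RIGHT-ARC":  # top becomes dependent of second-from-top
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    return arcs

# Example: "the cat sleeps", where "cat" heads "the" and "sleeps" heads
# "cat"; "sleeps" remains on the stack as the root.
print(parse(["the", "cat", "sleeps"],
            ["SHIFT", "SHIFT", "LEFT-ARC", "SHIFT", "LEFT-ARC"]))
# -> [(1, 0), (2, 1)]

In the cited parsers, the action at each step is not given in advance but predicted from LSTM encodings of the stack, the buffer, and the history of actions taken so far.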