Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d15-1180
Molding CNNs for text: non-linear, non-consecutive convolutions

Abstract: The success of deep learning often derives from well-chosen operational building blocks. In this work, we revise the temporal convolution operation in CNNs to better adapt it to text processing. Instead of concatenating word representations, we appeal to tensor algebra and use low-rank n-gram tensors to directly exploit interactions between words already at the convolution stage. Moreover, we extend the n-gram convolution to non-consecutive words to recognize patterns with intervening words. Through a combinat…
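
The non-consecutive, low-rank n-gram convolution sketched in the abstract can be pictured with a short recurrence. Below is a minimal, illustrative PyTorch sketch of a decayed dynamic-programming accumulation over (possibly non-consecutive) trigrams; the factor names W1, W2, W3, the decay lam, and the output non-linearity are assumptions for illustration, not the paper's exact parameterization.

```python
import torch

def nonconsecutive_trigram_features(x, W1, W2, W3, lam=0.5):
    """x: (seq_len, emb_dim) word vectors; W1, W2, W3: (emb_dim, d) low-rank factors.
    Returns (seq_len, d) features that aggregate all trigrams (consecutive or not)
    seen up to each position, with longer gaps down-weighted by the decay lam."""
    d = W1.shape[1]
    f1 = torch.zeros(d)  # decayed sum of unigram prefixes
    f2 = torch.zeros(d)  # decayed sum of bigram prefixes
    f3 = torch.zeros(d)  # decayed sum of completed trigrams
    feats = []
    for t in range(x.shape[0]):
        u1, u2, u3 = x[t] @ W1, x[t] @ W2, x[t] @ W3
        # Update in reverse order so f3 and f2 read the previous step's values.
        f3 = lam * f3 + f2 * u3   # close a trigram at position t
        f2 = lam * f2 + f1 * u2   # extend a stored unigram into a bigram
        f1 = lam * f1 + u1        # open a new n-gram at position t
        feats.append(torch.tanh(f3))
    return torch.stack(feats)

# Example with random data:
# x = torch.randn(20, 300)
# feats = nonconsecutive_trigram_features(x, *(torch.randn(300, 50) * 0.01 for _ in range(3)))
```
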

Cited by 89 publications (72 citation statements)
References 24 publications
“…One area of research involves incorporating word-level convolutions (i.e. n-gram filters) into recurrent computation (Lei et al., 2015; Bradbury et al., 2017; Lei et al., 2017). For example, Quasi-RNN (Bradbury et al., 2017) proposes to alternate convolutions and a minimalist recurrent pooling function and achieves significant speed-up over LSTM.…”
Section: Related Work
confidence: 99%
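
As a companion to the QRNN description in the statement above, here is a minimal PyTorch sketch of the "minimalist recurrent pooling" step (f-pooling); the convolutions that would produce z and f are only indicated in comments, and the function name is illustrative rather than taken from any library.

```python
import torch

def qrnn_f_pooling(z, f):
    """z, f: (seq_len, hidden) candidate and forget-gate sequences, typically obtained
    from width-k (masked) convolutions over the word vectors, e.g.
    z = tanh(conv_z(x)) and f = sigmoid(conv_f(x)).
    The recurrence below contains no matrix multiplications, which is where the
    reported speed-up over LSTM comes from."""
    h = torch.zeros(z.shape[1])
    outputs = []
    for t in range(z.shape[0]):
        h = f[t] * h + (1.0 - f[t]) * z[t]  # gated running average of candidates
        outputs.append(h)
    return torch.stack(outputs)
```
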
“…Table 1 presents the test-set accuracies obtained by different strategies. Results in Table 1 indicate that the AGT method achieved very competitive accuracy (50.5%) when compared to the state-of-the-art results obtained by the tree-LSTM (51.0%) (Tai et al., 2015; Zhu et al., 2015) and high-order CNN approaches (51.2%) (Lei et al., 2015).…”
Section: Results
confidence: 89%
“…We learned 15 layers with 200 dimensions each, which requires us to project the 300-dimensional word vectors; we implemented this using a linear transformation, whose weight matrix and bias term are shared across all words, followed by a tanh activation. For optimization, we used Adadelta (Zeiler, 2012), with a learning rate of 0.0005, mini-batch size of 50, transform gate bias of 1, and dropout (Srivastava et al., 2014) (Lei et al., 2015). Table 1 presents the test-set accuracies obtained by different strategies.…”
Section: Results
confidence: 99%
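
The training setup in the quote above can be summarized as a small configuration sketch. The numbers below mirror the quote (15 layers, 200 dimensions, a shared 300-to-200 projection with tanh, Adadelta with learning rate 0.0005, mini-batch 50); the layer blocks and the dropout rate are placeholders, since the quote does not specify the layer internals or the dropout probability.

```python
import torch
import torch.nn as nn

emb_dim, hidden, n_layers, batch_size = 300, 200, 15, 50

# Shared projection of the 300-dim word vectors, followed by tanh (as in the quote).
shared_proj = nn.Sequential(nn.Linear(emb_dim, hidden), nn.Tanh())

# Placeholder for the 15 stacked layers: the quote only states their count and width.
layers = nn.ModuleList([nn.Linear(hidden, hidden) for _ in range(n_layers)])

dropout = nn.Dropout(p=0.5)  # dropout is mentioned but its rate is not; 0.5 is a guess

params = list(shared_proj.parameters()) + list(layers.parameters())
optimizer = torch.optim.Adadelta(params, lr=0.0005)

# "Transform gate bias of 1" presumably refers to highway-style transform gates inside
# each layer: a bias of 1 starts the gates mostly open (sigmoid(1) is about 0.73).
```
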
“…(2)_t is used as output for onward computation. Different strategies for computing λ_t were explored (Lei et al., 2015, 2016). When λ_t is a constant, or depends only on x_t, e.g., λ_t = σ(W_λ v_t + b_λ), the ith dimension of Equations 14…”
Section: More Than Two States
confidence: 99%
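
The input-dependent decay mentioned in this statement is simple enough to show directly. Below is a minimal PyTorch sketch of λ_t = σ(W_λ v_t + b_λ); the weight names and the commented-out recurrence it might plug into are illustrative, not taken from either cited paper.

```python
import torch

def input_dependent_decay(v_t, W_lam, b_lam):
    """lambda_t = sigmoid(W_lam @ v_t + b_lam): a per-dimension decay value that
    depends only on the current input v_t (one of the strategies in the quote).
    v_t: (emb_dim,); W_lam: (hidden, emb_dim); b_lam: (hidden,)."""
    return torch.sigmoid(W_lam @ v_t + b_lam)

# One illustrative way such a gate enters a decayed accumulation (cf. the sketch after
# the abstract, with the constant lam replaced by a learned, input-dependent lambda_t):
# lam_t = input_dependent_decay(v[t], W_lam, b_lam)
# f = lam_t * f + (1.0 - lam_t) * u_t   # u_t: the current input's contribution
```
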