2019
DOI: 10.48550/arxiv.1904.04733
Preprint
Seq2Biseq: Bidirectional Output-wise Recurrent Neural Networks for Sequence Modelling

Abstract: Over the last couple of years, Recurrent Neural Networks (RNN) have reached state-of-the-art performance on most sequence modelling problems. In particular, the sequence-to-sequence model and the neural CRF have proved to be very effective in this domain. In this article, we propose a new RNN architecture for sequence labelling, leveraging gated recurrent layers to take arbitrarily long contexts into account, and using two decoders operating forward and backward. We compare several variants of the pr…
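The abstract only summarises the architecture, so here is a minimal PyTorch sketch of the general idea: a bidirectional encoder feeding two output-wise decoders, one running left-to-right and one right-to-left, whose states are combined for labelling. Hidden sizes, the absence of label-embedding feedback into the decoders, and the concatenation used to merge the two decoder states are illustrative assumptions, not the paper's exact Seq2Biseq model.

import torch
import torch.nn as nn


class BiOutputTagger(nn.Module):
    """Sketch of a sequence labeller with two output-wise decoders."""

    def __init__(self, vocab_size, num_labels, emb_dim=100, hidden_dim=200):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Bidirectional GRU encoder over the input tokens.
        self.encoder = nn.GRU(emb_dim, hidden_dim, batch_first=True,
                              bidirectional=True)
        # Two unidirectional GRU decoders over the encoder states:
        # one processes the sequence forward, the other backward.
        self.dec_fwd = nn.GRU(2 * hidden_dim, hidden_dim, batch_first=True)
        self.dec_bwd = nn.GRU(2 * hidden_dim, hidden_dim, batch_first=True)
        # Label scores are predicted from the concatenated decoder states.
        self.out = nn.Linear(2 * hidden_dim, num_labels)

    def forward(self, tokens):                      # tokens: (batch, seq_len)
        enc, _ = self.encoder(self.embed(tokens))   # (batch, seq_len, 2*hidden)
        fwd, _ = self.dec_fwd(enc)                  # left-to-right pass
        bwd_rev, _ = self.dec_bwd(enc.flip(1))      # right-to-left pass
        bwd = bwd_rev.flip(1)                       # realign with token positions
        return self.out(torch.cat([fwd, bwd], dim=-1))  # (batch, seq_len, labels)


if __name__ == "__main__":
    model = BiOutputTagger(vocab_size=1000, num_labels=17)
    scores = model(torch.randint(0, 1000, (2, 12)))
    print(scores.shape)  # torch.Size([2, 12, 17])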

Cited by 3 publications (3 citation statements)
References 38 publications
“…To evaluate our method on large language models, we conducted experiments on all linear layers of OPT-6.7B with a full sequence length of 2048, including the last lm_head layer. As shown in Table 3, our method achieved an acceleration rate of approximately 4.64× and produced the most nearly lossless results on wikitext2 (Merity et al 2016), ptb (Dinarelli and Grobol 2019), and c4 (Raffel et al 2019). Our acceleration rate is better than current state-of-the-art methods such as GPTQ (Frantar et al 2022) and sparseGPT (Frantar and Alistarh 2023), while our method does not have an advantage in parameter storage.…”
Section: Experiments on OPT
confidence: 91%
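For context, "evaluating on wikitext2 with a full sequence length of 2048" usually means perplexity over non-overlapping 2048-token windows of the test split. The sketch below (Hugging Face transformers/datasets) shows that protocol; it is an assumption about the cited evaluation setup, not their script, and the quantization step being evaluated is not shown.

import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-6.7b"   # swap in a smaller OPT checkpoint for a quick test
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto").eval()

# Concatenate the raw test split and cut it into non-overlapping 2048-token windows.
test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
ids = tokenizer("\n\n".join(test["text"]), return_tensors="pt").input_ids
seq_len = 2048
nlls = []
with torch.no_grad():
    for start in range(0, ids.size(1) - seq_len + 1, seq_len):
        chunk = ids[:, start:start + seq_len].to(model.device)
        # With labels == input_ids the model returns the mean token NLL for the chunk.
        nlls.append(model(chunk, labels=chunk).loss.float())
perplexity = torch.exp(torch.stack(nlls).mean())
print(f"wikitext2 perplexity @ {seq_len}: {perplexity.item():.2f}")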
“…PTB. The Penn Treebank (Marcus et al, 1993), in particular the sections of the corpus corresponding to Wall Street Journal (WSJ) articles, is a standard dataset for language modeling (Mikolov et al, 2012) and sequence labeling (Dinarelli and Grobol, 2019). Following the setting in Shen et al (2021), we use the preprocessing method proposed in Mikolov et al (2012).…”
Section: D.1 Masked Language Modeling
confidence: 99%
“…These gating mechanisms allow such RNN variants to be trained to keep necessary and relevant information from previous states for longer periods, or to discard less important information from previous states [5,6]. Recent works have shown the importance of RNNs with gating mechanisms in achieving improved results for classification and generation tasks with sequence modelling [2,10,35]. Recurrent neural networks' inherent capability of adequately modelling sequential data, supplemented by the advantages of gated features in GRUs, enables them to effectively model tasks that use short-term or long-term video sequences.…”
Section: Standard GRU
confidence: 99%
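As a point of reference for the gating behaviour described in that statement, the standard GRU cell (the common Cho et al. formulation; general background, not a construction specific to the cited works) computes:

\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1} + b_z) && \text{(update gate)}\\
r_t &= \sigma(W_r x_t + U_r h_{t-1} + b_r) && \text{(reset gate)}\\
\tilde h_t &= \tanh\!\big(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h\big) && \text{(candidate state)}\\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde h_t && \text{(keep vs. overwrite)}
\end{aligned}

When z_t is near 0 the cell carries h_{t-1} forward essentially unchanged, which is what lets these variants retain relevant information over long spans; when r_t is near 0 the candidate state ignores the previous state, which is how less important information gets discarded.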