Abstract Text Summarization with a Convolutional Seq2seq Model

Zhang, Yong; Li, Dan; Wang, Yuheng; Fang, Yang; Xiao, Weidong

doi:10.3390/app9081665

Cited by 47 publications

(22 citation statements)

References 24 publications


“…The sequence to sequence model (Seq2Seq) has been widely used in processing tasks of variable length input and output sequences, including speech recognition, machine translation and so on [27][28] [29]. Its core idea is to map a variable length input sequence to variable length output sequence using cyclic neural network.…”

Section: B Seq2seq Model (mentioning)

confidence: 99%

ST-Seq2Seq: A Spatio-Temporal Feature-Optimized Seq2Seq Model for Short-Term Vessel Trajectory Prediction

et al. 2020


Deep learning provides appropriate mechanisms to predict vessel trajectories for safer and more efficient shipping, but existing models are still mainly oriented to longer-term prediction trends and do not fully support real-time navigation needs. While most recent works have largely exploited Automatic Identification System (AIS) data, the complete semantics of these data have not so far been fully exploited. The research presented in this paper introduces an extended sequence-to-sequence model using AIS data. A Gated Recurrent Unit (GRU) network encodes historical spatio-temporal sequences as a context vector, which not only preserves the sequential relationships among trajectory locations, but also alleviates the vanishing gradient problem. A GRU network also acts as the decoder, outputting target trajectory location sequences. Real AIS data from the Chongqing and Wuhan sections of the Yangzi River were selected as typical experimental areas for evaluation purposes. The proposed ST-Seq2Seq model was tested against the LSTM-RNN and GRU-RNN baseline models in short-term trajectory prediction experiments. A 10-minute historical trajectory sequence was used to predict the trajectory sequence for the next five minutes. Overall, the findings show that when LSTM and GRU networks apply a recursive method to predict a sequence of continuous trajectory points, accuracy decreases as the number of predicted trajectory points increases. Conversely, the extended sequence-to-sequence model shows satisfactory stability on different ship channels.


“…The seq2seq framework introduced by Google was initially applied to NMT tasks [13,25]. Later, in the field of NLP, seq2seq models were also used for text summarization [26], parsing [27], or generative chatbots (as presented in Section 2). These models can address the challenge of a variable input and output length.…”

Section: Seq2seq Models (mentioning)

confidence: 99%

A Domain-Specific Generative Chatbot Trained from Little Data

Kapočiūtė-Dzikienė 2020

Applied Sciences


Accurate generative chatbots are usually trained on large datasets of question–answer pairs. Although such datasets do not exist for some languages, companies still need chatbot technology on their websites. However, companies usually own small domain-specific datasets (at least in the form of an FAQ) about their products, services, or used technologies. In this research, we seek effective solutions to create generative seq2seq-based chatbots from very small data. Since experiments are carried out in English and the morphologically complex Lithuanian language, we have an opportunity to compare results for languages with very different characteristics. We experimentally explore three encoder–decoder LSTM-based approaches (simple LSTM, stacked LSTM, and BiLSTM), three word embedding types (one-hot encoding, fastText, and BERT embeddings), and five encoder–decoder architectures based on different encoder and decoder vectorization units. Furthermore, all offered approaches are applied to the pre-processed datasets with removed and separated punctuation. The experimental investigation revealed the advantages of the stacked LSTM and BiLSTM encoder architectures and BERT embedding vectorization (especially for the encoder). The best achieved BLEU on English/Lithuanian datasets with removed and separated punctuation was ~0.513/~0.505 and ~0.488/~0.439, respectively. Better results were achieved with the English language, because generating different inflection forms for the morphologically complex Lithuanian is a harder task. The BLEU scores fell into the range defining the quality of the generated answers as good or very good for both languages. This research was performed with very small datasets having little variety in covered topics, which makes this research not only more difficult, but also more interesting. Moreover, to our knowledge, it is the first attempt to train generative chatbots for a morphologically complex language.


“…Zhou et al [9] proposed a joint learning model for sentence scoring and selection to lead the two tasks interact simultaneously, and multiple layer perceptron (MLP) is introduced to score sentences according to both the previously selected sentences and remains. Zhang et al [3] developed a hierarchical convolution model with an attention mechanism to extract keywords and key sentences simultaneously, and a copy mechanism was incorporated to resolve the problem of out of vocabulary (OOV) [22]. In addition, reinforcement learning (RL) has been proven to be effective in improving the performance of the summarization system [12,23] by allowing directly maximize the measure metric of summary quality, such as the ROUGE score between the generated summary and the ground truth.…”

Section: Related Work (mentioning)

confidence: 99%
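The statement above refers to the ROUGE score between a generated summary and the ground truth. As a rough illustration only, the sketch below computes a simplified ROUGE-1 (unigram overlap) in Python. It assumes lowercased whitespace tokenization and omits the stemming and other preprocessing that standard ROUGE tooling applies; the function name `rouge_1` is hypothetical and is not taken from any of the cited papers.

```python
from collections import Counter


def rouge_1(candidate: str, reference: str) -> dict:
    """Simplified ROUGE-1: clipped unigram overlap between two texts.

    Assumption: tokens are lowercased whitespace-separated words.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Counter intersection keeps the minimum count per token (clipping).
    overlap = sum((cand_counts & ref_counts).values())

    # Guard against empty inputs to avoid division by zero.
    recall = overlap / max(sum(ref_counts.values()), 1)
    precision = overlap / max(sum(cand_counts.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Example: five of the six reference unigrams are matched.
scores = rouge_1("the cat sat on the mat", "the cat lay on the mat")
print(scores)
```

A reinforcement learning setup of the kind described would use such a score as the reward signal; how the reward is fed back into training is outside the scope of this sketch.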

“…Automatic summarization systems have been made great progress in many applications, such as headline generation [1], single or multi-document summarization [2,3], opinion mining [4], text categorization, etc. The system aims to shorten the input and retain the salient information from the source document.…”

Section: Introduction (mentioning)

confidence: 99%

Comprehensive Document Summarization with Refined Self-Matching Mechanism

Zeng, Yang, et al. 2020

Applied Sciences


Under the constraints of the neural network's memory capacity and the document length, it is difficult to generate summaries with adequate salient information. In this work, the self-matching mechanism is incorporated into the extractive summarization system at the encoder side, which allows the encoder to optimize the encoding information at the global level and effectively improves the memory capacity of a conventional LSTM. Inspired by the human coarse-to-fine mode of understanding, localness is modeled by a Gaussian bias to improve contextualization for each sentence, and merged into the self-matching energy. The refined self-matching mechanism not only establishes global document attention but also perceives associations with neighboring signals. At the decoder side, the pointer network is utilized to perform a two-hop attention on context and extraction state. Evaluations on the CNN/Daily Mail dataset verify that the proposed model outperforms strong baseline models with statistical significance.
