Sequence-to-sequence models have been applied to the conversation response generation problem where the source sequence is the conversation history and the target sequence is the response. Unlike translation, conversation responding is inherently creative. The generation of long, informative, coherent, and diverse responses remains a hard task. In this work, we focus on the single turn setting. We add self-attention to the decoder to maintain coherence in longer responses, and we propose a practical approach, called the glimpse-model, for scaling to large datasets. We introduce a stochastic beam-search algorithm with segment-by-segment reranking which lets us inject diversity earlier in the generation process. We trained on a combined data set of over 2.3B conversation messages mined from the web. In human evaluation studies, our method produces longer responses overall, with a higher proportion rated as acceptable and excellent as length increases, compared to baseline sequenceto-sequence models with explicit lengthpromotion. A back-off strategy produces better responses overall, in the full spectrum of lengths.
We introduce a simple wrapper method that uses off-the-shelf word embedding algorithms to learn task-specific bilingual word embeddings. We use a small dictionary of easily-obtainable task-specific word equivalence classes to produce mixed context-target pairs that we use to train off-the-shelf embedding models. Our model has the advantage that it (a) is independent of the choice of embedding algorithm, (b) does not require parallel data, and (c) can be adapted to specific tasks by re-defining the equivalence classes. We show how our method outperforms off-the-shelf bilingual embeddings on the task of unsupervised cross-language partof-speech (POS) tagging, as well as on the task of semi-supervised cross-language super sense (SuS) tagging.
Image translation refers to the task of mapping images from a visual domain to another. Given two unpaired collections of images, we aim to learn a mapping between the corpus-level style of each collection, while preserving semantic content shared across the two domains. We introduce xgan, a dual adversarial auto-encoder, which captures a shared representation of the common domain semantic content in an unsupervised way, while jointly learning the domain-to-domain image translations in both directions. We exploit ideas from the domain adaptation literature and define a semantic consistency loss which encourages the learned embedding to preserve semantics shared across domains. We report promising qualitative results for the task of face-to-cartoon translation. The cartoon dataset we collected for this purpose, "CartoonSet", is also publicly available as a new benchmark for semantic style transfer at https://google.github.io/cartoonset/index.html.
Tensor2Tensor is a library for deep learning models that is well-suited for neural machine translation and includes the reference implementation of the state-of-the-art Transformer model. Neural Machine Translation BackgroundMachine translation using deep neural networks achieved great success with sequence-tosequence models (Sutskever et al., 2014;Bahdanau et al., 2014;Cho et al., 2014) that used recurrent neural networks (RNNs) with LSTM cells (Hochreiter and Schmidhuber, 1997). The basic sequence-to-sequence architecture is composed of an RNN encoder which reads the source sentence one token at a time and transforms it into a fixed-sized state vector. This is followed by an RNN decoder, which generates the target sentence, one token at a time, from the state vector.While a pure sequence-to-sequence recurrent neural network can already obtain good translation results (Sutskever et al., 2014;Cho et al., 2014), it suffers from the fact that the whole input sentence needs to be encoded into a single fixed-size vector. This clearly manifests itself in the degradation of translation quality on longer sentences and was partially overcome in Bahdanau et al. (2014) by using a neural model of attention.Convolutional architectures have been used to obtain good results in word-level neural machine translation starting from Kalchbrenner and Blunsom (2013) and later in Meng et al. (2015). These early models used a standard RNN on top of the convolution to generate the output, which creates a bottleneck and hurts performance.Fully convolutional neural machine translation without this bottleneck was first achieved in Kaiser and Bengio (2016) and Kalchbrenner et al. (2016). The Extended Neural GPU model (Kaiser and Bengio, 2016) used a recurrent stack of gated convolutional layers, while the ByteNet model (Kalchbrenner et al., 2016) did away with recursion and used left-padded convolutions in the decoder. This idea, introduced in WaveNet (van den Oord et al., 2016), significantly improves efficiency of the model. The same technique was improved in a number of neural translation models recently, including Gehring et al. (2017) and Kaiser et al. (2017). Self-AttentionInstead of convolutions, one can use stacked self-attention layers. This was introduced in the Transformer model (Vaswani et al., 2017) and has significantly improved state-of-the-art in machine translation and language modeling while also improving the speed of training. Research
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.