Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics 2019
DOI: 10.18653/v1/p19-1115

Self-Attentional Models for Lattice Inputs

Abstract: Lattices are an efficient and effective method to encode ambiguity of upstream systems in natural language processing tasks, for example to compactly capture multiple speech recognition hypotheses, or to represent multiple linguistic analyses. Previous work has extended recurrent neural networks to model lattice inputs and achieved improvements in various tasks, but these models suffer from very slow computation speeds. This paper extends the recently proposed paradigm of self-attention to handle lattice input…
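For orientation, the following is a minimal sketch of what self-attention over a lattice can look like: node embeddings attend to one another, with attention logits masked by a lattice reachability matrix. The mask-based formulation, tensor shapes, and module name are illustrative assumptions and may differ from the paper's exact model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatticeSelfAttention(nn.Module):
    """Single-head self-attention whose attention matrix is masked by
    lattice reachability, so each node only attends to nodes it can
    co-occur with on some path through the lattice (a sketch, not the
    paper's exact formulation)."""
    def __init__(self, d_model: int, d_k: int = 64):
        super().__init__()
        self.q = nn.Linear(d_model, d_k, bias=False)
        self.k = nn.Linear(d_model, d_k, bias=False)
        self.v = nn.Linear(d_model, d_k, bias=False)
        self.scale = d_k ** -0.5

    def forward(self, x: torch.Tensor, reach: torch.Tensor) -> torch.Tensor:
        # x: (n_nodes, d_model) node embeddings
        # reach: (n_nodes, n_nodes) boolean mask, True where attention is allowed
        # (assumes each node can at least attend to itself)
        scores = self.q(x) @ self.k(x).T * self.scale       # raw attention logits
        scores = scores.masked_fill(~reach, float("-inf"))  # forbid lattice-incompatible pairs
        return F.softmax(scores, dim=-1) @ self.v(x)        # contextualized node states
```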

Cited by 37 publications (29 citation statements)
References 27 publications
“…While the lattice RNNs perform similar to the GNN-based FTM, we found that they are often inconvenient due to increased training time as compared to the GNNs (∼8min/epoch for RNN training v/s ∼1.5min/epoch for GNN training in our experiments). As observed in [13], unique connections in lattices inhibit efficient batching of training examples during RNN training. On the other hand, in GNNs, recurrent computations are replaced by graph convolution operations and multiple lattices of different sizes and structures can be efficiently batched together using zero padding resulting in substantial training speed-up.…”
Section: Results and Analysis (citation type: mentioning; confidence: 99%)
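The batching advantage described in the statement above can be made concrete with a small sketch: lattices of different sizes are zero-padded to a common node count and processed with a single batched graph-convolution step instead of per-lattice recurrences. The shapes and the simple A·X·W propagation rule are illustrative assumptions, not the cited models.

```python
import torch

def pad_and_batch(adjs, feats, max_nodes):
    """adjs: list of (n_i, n_i) adjacency tensors; feats: list of (n_i, d) node features.
    Zero-pads every lattice to max_nodes so they share one batch."""
    b, d = len(adjs), feats[0].shape[1]
    A = torch.zeros(b, max_nodes, max_nodes)
    X = torch.zeros(b, max_nodes, d)
    for i, (a, x) in enumerate(zip(adjs, feats)):
        n = a.shape[0]
        A[i, :n, :n] = a
        X[i, :n, :] = x
    return A, X

def graph_conv_step(A, X, W):
    # One GCN-style propagation over the whole batch at once. Because the
    # padded rows and columns of A are zero, padded positions neither send
    # nor receive messages, so real nodes are unaffected by the padding.
    return torch.relu(torch.bmm(A, X) @ W)   # (b, max_nodes, d_out)
```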
“…Self-attention graph neural networks (SAGNN) [14,13,11,12] model the relationship between the nodes of a graph using the self-attention mechanism instead of the predefined edges in the graph. Instead of using a fixed adjacency matrix as in GCNs, this approach uses the inner-product of the feature vectors of the lattice arcs to compute their relevance to each other.…”
Section: Self-attention Based Graph Neural Network (citation type: mentioning; confidence: 99%)
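A compact sketch of the idea described above, in which a learned inner-product between arc feature vectors stands in for a fixed adjacency matrix; the single-head formulation and function names are illustrative simplifications of the cited SAGNN variants.

```python
import torch
import torch.nn.functional as F

def attention_adjacency(arc_feats, w_q, w_k):
    """arc_feats: (n_arcs, d); w_q, w_k: (d, d_k) projections.
    Returns a dense, data-dependent 'adjacency' from scaled inner-products."""
    logits = (arc_feats @ w_q) @ (arc_feats @ w_k).T / (w_q.shape[1] ** 0.5)
    return F.softmax(logits, dim=-1)               # (n_arcs, n_arcs), rows sum to 1

def sagnn_layer(arc_feats, w_q, w_k, w_v):
    A = attention_adjacency(arc_feats, w_q, w_k)   # replaces the fixed graph edges
    return torch.relu(A @ (arc_feats @ w_v))       # aggregate neighbors and transform
```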
“…Mihaylov and Frank (2019) proposed a discourse-aware self-attention encoder for reading comprehension on narrative texts, where event chains, discourse relations and coreference relations are used for connecting sentences. Self-attention can be also extended to 2d-dimensions for image processing (Parmar et al., 2018) and lattice inputs (Sperber et al., 2019).…”
Section: Self-attention Mechanism (citation type: mentioning; confidence: 99%)
“…In the cascade approach, an ASR system transcribes the input speech signal, and this is fed to a downstream MT system that carries out the translation. The provided input to the MT step can be the 1-best hypothesis, but also n-best lists (Ng et al., 2016) or even lattices (Matusov and Ney, 2011; Sperber et al., 2019). Additional techniques can also be used to improve the performance of the pipeline by better adapting the MT system to the expected input, such as training with transcribed text (Peitz et al., 2012) or chunking (Sperber et al., 2017).…”
Section: Introduction (citation type: mentioning; confidence: 99%)
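As a rough illustration of the cascade couplings mentioned in the statement above, the sketch below passes either a 1-best hypothesis, an n-best list, or a lattice from a hypothetical ASR component to a hypothetical MT component; all method names (best_hypothesis, nbest, lattice, translate, translate_lattice, score) are placeholders, not a real toolkit API.

```python
def cascade_translate(asr, mt, audio, mode="1best"):
    """Run a speech-translation cascade under three coupling schemes."""
    if mode == "1best":
        # Tightest coupling: only the single best transcript reaches MT.
        return mt.translate(asr.best_hypothesis(audio))
    if mode == "nbest":
        # Translate several hypotheses and keep the highest-scoring translation.
        candidates = [mt.translate(h) for h in asr.nbest(audio, n=5)]
        return max(candidates, key=lambda c: c.score)
    if mode == "lattice":
        # A lattice-aware MT model (e.g. with lattice self-attention) consumes
        # the full recognition lattice, preserving upstream ambiguity.
        return mt.translate_lattice(asr.lattice(audio))
    raise ValueError(f"unknown mode: {mode}")
```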