2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2018.8461371

Lexico-Acoustic Neural-Based Models for Dialog Act Classification

Abstract: Recent works have proposed neural models for dialog act classification in spoken dialogs. However, they have not explored the role and the usefulness of acoustic information. We propose a neural model that processes both lexical and acoustic features for classification. Our results on two benchmark datasets reveal that acoustic features are helpful in improving the overall accuracy. Finally, a deeper analysis shows that acoustic features are valuable in three cases: when a dialog act has sufficient data, when …
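
The abstract describes a model that combines lexical and acoustic features for dialog act classification. Below is a minimal sketch of such a late-fusion design, assuming PyTorch; the CNN lexical encoder, the layer sizes, and the concatenation-based fusion are illustrative assumptions, not the paper's reported architecture.

```python
# Minimal sketch of a lexico-acoustic dialog-act classifier (PyTorch).
# The specific layers and sizes are assumptions for illustration only.
import torch
import torch.nn as nn


class LexicoAcousticClassifier(nn.Module):
    def __init__(self, vocab_size, num_classes, emb_dim=100,
                 acoustic_dim=13, hidden_dim=128):
        super().__init__()
        # Lexical branch: word embeddings, a 1-D convolution over the
        # utterance, and max-pooling into a fixed-size vector.
        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.conv = nn.Conv1d(emb_dim, hidden_dim, kernel_size=3, padding=1)
        # Acoustic branch: a small feed-forward layer over per-utterance
        # acoustic features (e.g., pooled MFCCs).
        self.acoustic_fc = nn.Linear(acoustic_dim, hidden_dim)
        # Fusion: concatenate both representations, then classify.
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids, acoustic_feats):
        # token_ids: (batch, seq_len); acoustic_feats: (batch, acoustic_dim)
        emb = self.embedding(token_ids).transpose(1, 2)      # (batch, emb_dim, seq_len)
        lex = torch.relu(self.conv(emb)).max(dim=2).values   # (batch, hidden_dim)
        aco = torch.relu(self.acoustic_fc(acoustic_feats))   # (batch, hidden_dim)
        return self.classifier(torch.cat([lex, aco], dim=1)) # logits over DA labels
```

As a usage note, `token_ids` would hold padded word indices for one utterance and `acoustic_feats` a fixed-size vector such as frame-averaged MFCCs; both names are hypothetical. A common alternative to concatenation is to train the two branches separately and combine their scores; the sketch keeps the simplest fusion for clarity.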

Cited by 17 publications (20 citation statements)
References 18 publications
“…One of the main differences between MTs and ATs is that the latter has no punctuation. In [7], it was shown that punctuation provides strong lexical cues. Therefore, we retrained the model on MRDA's MTs without punctuation.…”
Section: Experiments On Automatic Transcriptions (mentioning)
confidence: 99%
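
The retraining step described in the excerpt above (removing punctuation from MRDA manual transcripts so they better match automatic transcriptions) amounts to a simple preprocessing pass. A minimal sketch, assuming utterances are plain strings; the helper name and the lowercasing choice are illustrative assumptions.

```python
# Sketch of the punctuation-stripping preprocessing mentioned above,
# assuming utterances are plain text strings. The helper name and the
# decision to lowercase are illustrative assumptions.
import string

_PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def strip_punctuation(utterance: str) -> str:
    """Remove punctuation so manual transcripts resemble ASR output."""
    return utterance.translate(_PUNCT_TABLE).lower()

print(strip_punctuation("Okay, so... shall we start the meeting?"))
# -> "okay so shall we start the meeting"
```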
“…Automatic DA classification is a crucial preprocessing step for language understanding and dialog systems. This task has been approached using traditional statistical algorithms, for instance hidden Markov models (HMMs) [3] and conditional random fields (CRFs) [4], and more recently with deep learning (DL) models such as convolutional neural networks (CNNs) [5], recurrent neural networks (RNNs) [6,7], and attention mechanisms (AMs) [8,7], which achieve state-of-the-art results.…”
Section: Introduction (mentioning)
confidence: 99%
“…Some authors found that considering the context explicitly in RNN models helps dialog act classification (Ortega and Vu, 2017; Liu et al., 2017a; Kumar et al., 2018; Raheja and Tetreault, 2019; Dai et al., 2020). It has also been shown that incorporating acoustic/prosodic features helps to some extent (Ortega and Vu, 2018; Si et al., 2020). Colombo et al. (2020) report the best result to date for SWDA classification: an accuracy of 85%, obtained by a sequence-to-sequence (seq2seq) GRU model with guided attention.…”
Section: Dialog Act Classification (mentioning)
confidence: 99%
“…The Switchboard annotators originally used the DAMSL labeling scheme (Core and Allen, 1997) with 220 dialog acts and clustered them after annotation into a reduced label set. There seems to be no consensus on the reduced label set size: some of the studies using a 42-label set are Quarteroni et al. (2011); Liu et al. (2017a); Ortega and Vu (2018); Kumar et al. (2018), while others use a 43-label set (Ortega and Vu, 2017; Raheja and Tetreault, 2019; Zhao and Kawahara, 2019; …”
Section: Switchboard Dialog Act (mentioning)
confidence: 99%
“…Tang et al. worked on question detection using 65 low-level descriptors (LLDs) with an RNN-based model [4]. Ortega and Vu classified dialog acts by combining lexical features with 13-dimensional Mel-frequency cepstral coefficients (MFCCs), and their results showed that acoustic features are helpful for recognizing questions [5]. Arsikere et al. proposed a number of new statistical acoustic features for dialog act classification [6].…”
Section: Introduction (mentioning)
confidence: 99%
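
The excerpt above mentions combining lexical features with 13-dimensional MFCCs. Below is a minimal sketch of extracting such a per-utterance acoustic vector, assuming librosa is available; mean-pooling the frame-level MFCCs into a single vector is an illustrative choice, and frame-level features could instead be fed to a sequence model.

```python
# Sketch of extracting 13-dimensional MFCC features for one utterance,
# assuming librosa. Mean-pooling frames into a single utterance-level
# vector is an illustrative assumption, not necessarily what the cited
# work uses.
import librosa
import numpy as np

def utterance_mfcc(wav_path: str, sr: int = 16000) -> np.ndarray:
    """Return a (13,) vector of frame-averaged MFCCs for one utterance."""
    audio, sr = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13)  # (13, n_frames)
    return mfcc.mean(axis=1)
```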