Interspeech 2018
DOI: 10.21437/interspeech.2018-2527

Conversational Analysis Using Utterance-level Attention-based Bidirectional Recurrent Neural Networks

Abstract: Recent approaches to dialogue act recognition have shown that context from preceding utterances is important for classifying the subsequent one, and that performance improves considerably when this context is taken into account. We propose an utterance-level attention-based bidirectional recurrent neural network (Utt-Att-BiRNN) model to analyze the importance of preceding utterances for classifying the current one. In our setup, the BiRNN is given the set of current and preceding utterances as input. Our model ou…
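As a rough illustration of the architecture the abstract describes, here is a minimal PyTorch sketch, with all dimensions, layer sizes, and the exact attention formulation being assumptions rather than details from the paper: a bidirectional RNN reads the encoded representations of the current utterance and its predecessors, and an utterance-level attention layer weights them before classification.

```python
import torch
import torch.nn as nn

class UttAttBiRNN(nn.Module):
    """Sketch of an utterance-level attention-based bidirectional RNN.

    Assumes each utterance has already been encoded into a fixed-size
    vector (e.g. by a sentence encoder); all hyperparameters are illustrative.
    """
    def __init__(self, utt_dim=128, hidden=64, n_classes=10):
        super().__init__()
        self.birnn = nn.GRU(utt_dim, hidden, bidirectional=True, batch_first=True)
        self.att_w = nn.Linear(2 * hidden, 1)            # one score per utterance
        self.classifier = nn.Linear(2 * hidden, n_classes)

    def forward(self, utts):                             # utts: (batch, n_utts, utt_dim)
        states, _ = self.birnn(utts)                     # (batch, n_utts, 2*hidden)
        scores = self.att_w(states).squeeze(-1)          # (batch, n_utts)
        alpha = torch.softmax(scores, dim=1)             # attention over utterances
        context = (alpha.unsqueeze(-1) * states).sum(1)  # weighted sum of states
        return self.classifier(context), alpha           # logits for the current utterance

# Usage: classify the current utterance given 4 preceding ones.
model = UttAttBiRNN()
batch = torch.randn(2, 5, 128)                           # 2 dialogues, 5 utterances each
logits, alpha = model(batch)
print(logits.shape, alpha.shape)                         # torch.Size([2, 10]) torch.Size([2, 5])
```

The returned attention weights `alpha` are what make the analysis in the paper's title possible: they indicate how much each preceding utterance contributed to the classification of the current one.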

Cited by 20 publications (40 citation statements). References: 25 publications.
“…Bi-directional and Multi-layer Recurrent Models. In addition to our baseline recurrent models (LSTM and GRU), we also test their bi-directional and multi-layer variants, both of which have previously been explored in DA classification studies (Kumar et al. 2017; Bothe et al. 2018a; Chen et al. 2018; Ribeiro et al. 2019). The bi-directional models (Bi-LSTM and Bi-GRU) process the input sequence in the forward and then the backward direction.…”
Section: Supervised Model Variants (mentioning; confidence: 99%)
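For concreteness, a minimal PyTorch sketch of the bi-directional variant the excerpt mentions; sizes, vocabulary, and the choice to pool by concatenating the final forward and backward hidden states are illustrative assumptions, not details from the cited studies.

```python
import torch
import torch.nn as nn

class BiLSTMClassifier(nn.Module):
    """Illustrative Bi-LSTM for utterance-level DA classification."""
    def __init__(self, vocab=5000, emb=100, hidden=64, n_classes=10):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.lstm = nn.LSTM(emb, hidden, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, n_classes)

    def forward(self, tokens):                   # tokens: (batch, seq_len)
        x = self.emb(tokens)
        _, (h, _) = self.lstm(x)                 # h: (2, batch, hidden)
        # h[0] is the final forward state, h[1] the final backward state
        return self.out(torch.cat([h[0], h[1]], dim=-1))

logits = BiLSTMClassifier()(torch.randint(0, 5000, (2, 12)))
print(logits.shape)                              # torch.Size([2, 10])
```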
“…In some cases, the contextual information may also include 'future' utterances or DA labels, that is, those that appear after the current utterance to be classified, though the utility of such future information is questionable for real-time applications such as dialogue systems. Within DA classification research, it has been widely shown that including such contextual information yields improved performance over single-sentence approaches (Lee and Dernoncourt 2016; Liu and Lane 2017; Bothe et al. 2018a; Ribeiro et al. 2019). The advantage of including contextual information is clear when considering the nature of dialogue as a sequence of utterances.…”
Section: Introduction (mentioning; confidence: 99%)
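A small sketch of how such preceding context might be assembled in practice; the window size, padding scheme, and data layout are assumptions made for illustration only.

```python
def context_windows(utterances, labels, n_context=3):
    """Pair each utterance with its n_context predecessors.

    Slots before the start of the dialogue are padded with an empty
    string, so the classifier always sees a fixed-size window that
    ends at the target utterance.
    """
    examples = []
    for i, (utt, label) in enumerate(zip(utterances, labels)):
        history = utterances[max(0, i - n_context):i]
        history = [""] * (n_context - len(history)) + history
        examples.append((history + [utt], label))
    return examples

dialogue = ["Hello.", "Hi, how can I help?", "My order is late.", "I am sorry to hear that."]
tags = ["greeting", "greeting", "statement", "apology"]
for window, tag in context_windows(dialogue, tags, n_context=2):
    print(window, "->", tag)
```

A real-time system can only use such backward-looking windows; including 'future' utterances, as the excerpt notes, is an offline luxury.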
“…[Figure 1: Attention mechanism (Bothe et al., 2018).] In the context of our work, the attention mechanism gives RNNs a view of the whole sentence, which would not be the case without it. This mechanism helps the network decide which words to give greater weight, as Bahdanau et al. (2014) showed.…”
Section: Attention Mechanism (mentioning; confidence: 99%)
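Below is a minimal sketch of the additive scoring function in the style of Bahdanau et al. (2014), applied over a sequence of RNN states. Note this is the query-free, self-attentive pooling variant commonly used for sentence classification rather than the full encoder-decoder attention of the original paper, and all dimension names are illustrative.

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Bahdanau-style additive attention pooling over RNN states."""
    def __init__(self, dim=128):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, 1, bias=False)

    def forward(self, states):                       # states: (batch, seq, dim)
        scores = self.v(torch.tanh(self.proj(states))).squeeze(-1)
        alpha = torch.softmax(scores, dim=1)         # one weight per word
        context = (alpha.unsqueeze(-1) * states).sum(dim=1)
        return context, alpha

ctx, alpha = AdditiveAttention()(torch.randn(2, 7, 128))
print(ctx.shape, alpha.shape)                        # (2, 128) and (2, 7)
```

The weights `alpha` are exactly the per-word emphasis the excerpt describes: the network learns which words in the sentence matter most for the task.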
“…Following the majority of previous works, we produce frame-level predictions in this work. As the attention mechanism has achieved excellent performance in various tasks such as emotion recognition [10], conversational analysis [11], and speech recognition [12], we assume it can also benefit AED, as it helps the model focus on when target events take place.…”
Section: Introduction (mentioning; confidence: 99%)
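As a rough sketch of what attention over frames can look like in acoustic event detection (AED); the shapes and the attention-pooling scheme are assumptions for illustration, not the cited paper's method: frame-wise scores are turned into per-frame weights so that the clip-level decision focuses on the frames where events actually occur.

```python
import torch
import torch.nn as nn

class FrameAttentionPooling(nn.Module):
    """Attention pooling over audio frames for clip-level event detection."""
    def __init__(self, feat_dim=64, n_events=10):
        super().__init__()
        self.frame_logits = nn.Linear(feat_dim, n_events)  # frame-level event evidence
        self.frame_score = nn.Linear(feat_dim, n_events)   # frame-level attention scores

    def forward(self, frames):                    # frames: (batch, n_frames, feat_dim)
        logits = self.frame_logits(frames)        # per-frame predictions
        alpha = torch.softmax(self.frame_score(frames), dim=1)  # weights over frames
        clip = (alpha * torch.sigmoid(logits)).sum(dim=1)       # weighted clip-level probability
        return clip, alpha                        # alpha shows *when* each event fires

clip_probs, alpha = FrameAttentionPooling()(torch.randn(2, 100, 64))
print(clip_probs.shape)                           # torch.Size([2, 10])
```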