Interspeech 2019
DOI: 10.21437/Interspeech.2019-2366

End-to-End Spoken Language Understanding: Bootstrapping in Low Resource Scenarios

Abstract: End-to-end Spoken Language Understanding (SLU) systems, which skip speech-to-text conversion, are more promising in low-resource scenarios. They can be more effective when there is not enough labeled data to train reliable speech recognition and language understanding systems, or when running SLU on the edge is preferred over cloud-based services. In this paper, we present an approach for bootstrapping end-to-end SLU in low-resource scenarios. We show that incorporating layers extracted from pre-trained acoustic mod…
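The truncated abstract already names the core idea: transplant lower layers from a pre-trained acoustic model into a speech-to-intent model and fine-tune on a small labeled set. The paper's exact architecture is not given here, so the following is only a minimal PyTorch sketch of that bootstrapping pattern; the `AcousticEncoder`/`SLUModel` classes, layer sizes, and checkpoint path are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class AcousticEncoder(nn.Module):
    """Stand-in for an acoustic model pre-trained on a large ASR corpus
    (e.g., to predict phoneme targets). Only its lower layers are reused."""
    def __init__(self, n_mels=80, hidden=256, n_phones=42):
        super().__init__()
        self.lower = nn.LSTM(n_mels, hidden, num_layers=2, batch_first=True)
        self.phone_head = nn.Linear(hidden, n_phones)  # discarded after pre-training

    def forward(self, feats):
        out, _ = self.lower(feats)
        return self.phone_head(out)

class SLUModel(nn.Module):
    """End-to-end speech-to-intent model bootstrapped from the pre-trained
    encoder's lower layers; only a small classifier is trained on top."""
    def __init__(self, encoder, hidden=256, n_intents=10, freeze=True):
        super().__init__()
        self.lower = encoder.lower            # transplanted pre-trained layers
        if freeze:                            # optional in the low-resource setting
            for p in self.lower.parameters():
                p.requires_grad = False
        self.intent_head = nn.Linear(hidden, n_intents)

    def forward(self, feats):
        out, _ = self.lower(feats)
        pooled = out.mean(dim=1)              # average over time frames
        return self.intent_head(pooled)

# Usage: pre-train the encoder on ASR data (not shown), then fine-tune
# the SLU model on a small intent-labeled set.
encoder = AcousticEncoder()
# encoder.load_state_dict(torch.load("acoustic_pretrained.pt"))  # hypothetical checkpoint
model = SLUModel(encoder)
logits = model(torch.randn(4, 120, 80))       # batch of 4 utterances, 120 frames
print(logits.shape)                           # torch.Size([4, 10])
```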

Cited by 32 publications (23 citation statements) · References 17 publications · Citing publications: 2019–2023
“…Go and colleagues [11] showed that the POS tags caused reduced performance, although POS tags can be strong indicators of emotions in text and serve as a helpful feature in opinion or sentiment analysis [18]. Moreover, bootstrapping approaches, which rely on a seed list of opinion or emotion words to find other such words in a large corpus, are becoming more popular and have proven effective [20, 21, 22, 23]. Mihalcea, Banea, and Wiebe [23] divided methods for bootstrapping subjectivity lexicons into two types: dictionary-based and corpus-based.…”
Section: Introduction (mentioning)
confidence: 99%
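To make the bootstrapping idea in this excerpt concrete: a seed list of opinion words is expanded by scanning a corpus for words that some cheap cue links to already-known opinion words, then iterating. The sketch below uses a simple "X and Y" conjunction cue; the toy corpus, seed list, and stopping rule are illustrative assumptions, not the specific methods of [20-23].

```python
def bootstrap_lexicon(sentences, seeds, rounds=3):
    """Corpus-based bootstrapping sketch: in 'X and Y', if X is a known
    opinion word, treat Y as a candidate opinion word (and vice versa).
    Repeat until no new words are found or the round budget is spent."""
    lexicon = set(seeds)
    for _ in range(rounds):
        added = set()
        for sent in sentences:
            toks = sent.lower().split()
            for i, tok in enumerate(toks):
                if tok == "and" and 0 < i < len(toks) - 1:
                    left, right = toks[i - 1], toks[i + 1]
                    if left in lexicon and right not in lexicon:
                        added.add(right)
                    if right in lexicon and left not in lexicon:
                        added.add(left)
        if not added:        # fixed point reached
            break
        lexicon |= added
    return lexicon

# Toy corpus (hypothetical) showing the expansion loop.
corpus = [
    "the film was happy and delightful",
    "a delightful and uplifting story",
    "sad and gloomy scenes throughout",
]
print(bootstrap_lexicon(corpus, seeds={"happy"}))
# round 1 adds 'delightful', round 2 adds 'uplifting'
```

Real systems add filters (e.g., POS constraints, frequency thresholds) to keep the expansion from drifting; the loop structure is the same.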
“…Using unaligned data provides the flexibility to infer slot labels from imperfect transcriptions. Hence, in this work, the NLU module was a seq2seq attention-based model.…”
Section: Pipeline SLU (mentioning)
confidence: 99%
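For readers unfamiliar with the setup, here is a minimal sketch of an attention-based seq2seq tagger of the kind the excerpt describes: slot labels are decoded as their own sequence, so they need not align one-to-one with the (possibly errorful) transcript tokens. This is an illustrative PyTorch model, not the cited system; all names and sizes are assumptions.

```python
import torch
import torch.nn as nn

class Seq2SeqSlotTagger(nn.Module):
    """Encoder over transcript tokens; decoder with dot-product attention
    emits a slot-label sequence of independent length."""
    def __init__(self, vocab, n_labels, emb=64, hid=128):
        super().__init__()
        self.src_emb = nn.Embedding(vocab, emb)
        self.tgt_emb = nn.Embedding(n_labels, emb)
        self.encoder = nn.LSTM(emb, hid, batch_first=True)
        self.decoder = nn.LSTMCell(emb + hid, hid)
        self.out = nn.Linear(2 * hid, n_labels)

    def forward(self, src, tgt_in):
        enc, (h, c) = self.encoder(self.src_emb(src))        # enc: (B, S, H)
        h, c = h.squeeze(0), c.squeeze(0)
        ctx = torch.zeros_like(h)                            # initial context
        logits = []
        for t in range(tgt_in.size(1)):                      # teacher forcing
            x = torch.cat([self.tgt_emb(tgt_in[:, t]), ctx], dim=-1)
            h, c = self.decoder(x, (h, c))
            scores = torch.bmm(enc, h.unsqueeze(-1)).squeeze(-1)  # dot attention
            attn = torch.softmax(scores, dim=-1)
            ctx = torch.bmm(attn.unsqueeze(1), enc).squeeze(1)    # weighted sum
            logits.append(self.out(torch.cat([h, ctx], dim=-1)))
        return torch.stack(logits, dim=1)                    # (B, T, n_labels)

model = Seq2SeqSlotTagger(vocab=1000, n_labels=20)
src = torch.randint(0, 1000, (2, 7))      # two 7-token transcripts
tgt = torch.randint(0, 20, (2, 5))        # teacher-forced label prefix
print(model(src, tgt).shape)              # torch.Size([2, 5, 20])
```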
“…of the pipeline approach [1, 2]. The main motivation for applying E2E SLU is that word-by-word recognition is not necessary to infer slots and intents, and that the ASR phoneme dictionary and language model (LM) become optional.…”
Section: Introduction (mentioning)
confidence: 99%
“…[7,11] address this problem using a curriculum and transfer learning approach whereby the model is gradually trained on increasingly relevant data until it is fine-tuned on the actual domain data. Similarly, [5,12] advocate pre-training an ASR model on a large amount of transcribed speech data to initialize a speech-to-intent model that is then trained on a much smaller training set with both transcripts and intent labels.…”
Section: Introduction (mentioning)
confidence: 99%
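A compact way to picture the curriculum described in [7,11]: the same model is trained in stages on datasets ordered from broadly relevant to in-domain, ending on the small labeled SLU set. The sketch below uses a toy linear model and random data as stand-ins; the stage schedule, learning rates, and loaders are placeholders, and an ASR pre-training stage as in [5,12] would use its own output head and loss (e.g., CTC) rather than the shared classification interface shown here.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def train_stage(model, loader, epochs, lr):
    """One curriculum stage: plain supervised training on one dataset."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for feats, labels in loader:
            opt.zero_grad()
            loss = loss_fn(model(feats), labels)
            loss.backward()
            opt.step()

# Toy stand-ins: pooled features -> intent logits, random "datasets"
# ordered from broad to in-domain (hypothetical; real loaders replace these).
model = nn.Linear(80, 10)
def toy_loader(n):
    return DataLoader(TensorDataset(torch.randn(n, 80),
                                    torch.randint(0, 10, (n,))), batch_size=8)

stages = [(toy_loader(64), 2, 1e-3),   # broad, loosely relevant data
          (toy_loader(32), 2, 5e-4),   # closer to the target domain
          (toy_loader(16), 4, 1e-4)]   # small in-domain SLU set
for loader, epochs, lr in stages:
    train_stage(model, loader, epochs, lr)
```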