2019 IEEE International Conference on Systems, Man and Cybernetics (SMC)
DOI: 10.1109/smc.2019.8914542
A Deep Learning Model with Data Enrichment for Intent Detection and Slot Filling

Cited by 6 publications (6 citation statements)
References 14 publications
“…Reference | Challenge | Approach
… | … | Vector representation of frame
[Ray et al 2018] | Rare, OOV words | Paraphrasing input utterances
[Liu et al 2019b] | Unidirectional information flow | Memory network
[Shen et al 2019a] | Poor generalisation in deployment | Sparse word embedding (prune useless words)
[Ray et al 2019] | Slots which take many values perform poorly | Delexicalisation
| Language knowledge base, history context | Attention over external knowledge base, multi-turn history
| Implicit knowledge sharing between tasks | BiLSTM, multi-task (DA)
[Gupta et al 2019a] | Speed | Non-recurrent and label recurrent networks
[Gupta et al 2019b] | Multi-turn dialogue, using context | Token attention, previous history
| Capturing intent-slot correlation | Multi-head self attention, masked intent
| Poor generalisation | BERT
[Bhasin et al 2019] | Learning joint distribution | CNN, BiLSTM, cross-fusion, masking
[Thi Do and Gaspers 2019] | Lack of annotated data, flexibility | Language transfer, multitasking, modularisation
| Key verb-slot correlation | Key verb in features, BiLSTM, attention
[Zhang and Wang 2019] | Learning joint distribution | Transformer architecture
[Daha and Hewavitharana 2019] | Efficient modelling of temporal dependency | Character embedding and RNN
[Dadas et al 2019] | Lack of annotated data, small data sets | Augmented data set
| Learning joint distribution | Word embedding attention
[E et al 2019] | Learning joint distribution | Bidirectional architecture, feedback
| Poor generalisation | BERT encoding, multi-head self attention
[Qin et al 2019] | Weak influence of intent on slot | Use intent prediction instead of summarised intent info in slot tagging
[Gangadharaiah and Narayanaswamy 2019] | Multi-intent samples | Multi-label classification methods
[Firdaus et al 2019] | Multi-turn dialogue history, learning joint distribution | RNN, CRF
[Pentyala et al 2019] | Optimal architecture | BiLSTM, different architectures
| Non-recurrent model, transfer learning | BERT, language transfer
[Schuster et al 2019] | Low resource languages | Transfer methods with SLU test case
[Okur et al 2019] | Natural language | Locate intent keywords, non-other slots
[Xu et al 2020] | Only good performance in one sub-task | Joint intent/slot tagging, length variable attention
[Bhasin et al 2020] | Learning joint distribution | Multimodal Low-rank Bilinear Attention Network
[Firdaus et al 2020] | Learning joint distribution | Stacked BiLSTM
[Zhang et al 2020b] | Limitations of sequential analysis | Graph representation of text
| Non-convex optimisation | Convex combination of ensemble of models
| BERT issues with logical d… | …”
Section: Hierarchical Models
confidence: 99%
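Many of the approaches catalogued in the statement above share one structural idea: a single sentence encoder whose states feed both an utterance-level intent classifier and a token-level slot tagger. The minimal PyTorch sketch below illustrates that shared joint setup; the `JointNLU` name, the dimensions and the vocabulary sizes are illustrative assumptions, not the architecture of any specific cited paper.

```python
import torch
import torch.nn as nn

class JointNLU(nn.Module):
    """Minimal joint model: one BiLSTM encoder, two heads
    (utterance-level intent, token-level slots). Illustrative only."""

    def __init__(self, vocab_size, emb_dim, hidden, n_intents, n_slots):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True,
                               bidirectional=True)
        self.intent_head = nn.Linear(2 * hidden, n_intents)
        self.slot_head = nn.Linear(2 * hidden, n_slots)

    def forward(self, token_ids):
        states, _ = self.encoder(self.embed(token_ids))       # (B, T, 2H)
        intent_logits = self.intent_head(states.mean(dim=1))  # one label per utterance
        slot_logits = self.slot_head(states)                  # one label per token
        return intent_logits, slot_logits

# Toy usage: a batch of 2 utterances, 6 token ids each.
model = JointNLU(vocab_size=1000, emb_dim=64, hidden=128, n_intents=7, n_slots=30)
intent_logits, slot_logits = model(torch.randint(1, 1000, (2, 6)))
print(intent_logits.shape, slot_logits.shape)  # torch.Size([2, 7]) torch.Size([2, 6, 30])
```

Training such a model typically sums an intent-classification loss and a slot-tagging loss, which is one concrete reading of the recurring "learning joint distribution" challenge in the table.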
“…The gamut of word embedding methods has been used, including word2vec ([Pan et al 2018; Wang et al 2018c]), fastText ([Firdaus et al 2020]), GloVe ([Bhasin et al 2019; Bhasin et al 2020; Dadas et al 2019; Liu et al 2019b; Okur et al 2019; Pentyala et al 2019; Thi Do and Gaspers 2019; Zhang and Wang 2016]), ELMo ([Zhang et al 2020b] and [Krone et al 2020] (pre-print only)), and BERT ([Ni et al 2020; Qin et al 2019], [Krone et al 2020] (pre-print only) and [Han et al 2020] (submitted for publication)). [Firdaus et al 2018a] and [Firdaus et al 2019] used concatenated GloVe and word2vec embeddings to capture more word information.…”
Section: Token Embedding
confidence: 99%
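As a rough illustration of the concatenation strategy attributed to [Firdaus et al 2018a] and [Firdaus et al 2019], the sketch below stacks two embedding lookups per token. The random vectors stand in for pre-trained GloVe and word2vec tables, and the function name and dimensions are assumptions made for this example only.

```python
import numpy as np

# Stand-ins for pre-trained lookup tables (GloVe: 300-d, word2vec: 300-d).
# In practice these would be loaded from the published embedding files.
rng = np.random.default_rng(0)
vocab = ["book", "a", "flight", "to", "boston"]
glove = {w: rng.normal(size=300) for w in vocab}
word2vec = {w: rng.normal(size=300) for w in vocab}

def embed_tokens(tokens, dim=300):
    """Concatenate the two embeddings per token -> (len(tokens), 600) matrix.
    Out-of-vocabulary tokens fall back to zero vectors."""
    rows = []
    for tok in tokens:
        g = glove.get(tok, np.zeros(dim))
        w = word2vec.get(tok, np.zeros(dim))
        rows.append(np.concatenate([g, w]))
    return np.stack(rows)

features = embed_tokens("book a flight to boston".split())
print(features.shape)  # (5, 600)
```

The concatenated matrix would then be fed to the encoder in place of a single-source embedding, the intent being that each source contributes complementary word information.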
“…Also, the sheer number of algorithms shows varying performance across benchmarks. Sławomir Dadas [9] measured the accuracy on the original data, and then on the data expanded by 50%, 100% and 200%. This provides a platform to randomly mutate the data for better learning.…”
Section: Roberto
confidence: 99%
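The expansion procedure described above (original data, then +50%, +100% and +200%) can be pictured with the hedged sketch below. The token-swap `mutate` step and the helper names are hypothetical illustrations, not the exact enrichment method of [9].

```python
import random

def mutate(tokens, slot_tags):
    """Toy mutation: swap two random positions, keeping the token-to-slot
    alignment intact by swapping the tags in parallel. Purely illustrative."""
    if len(tokens) < 2:
        return tokens, slot_tags
    i, j = random.sample(range(len(tokens)), 2)
    tokens, slot_tags = list(tokens), list(slot_tags)
    tokens[i], tokens[j] = tokens[j], tokens[i]
    slot_tags[i], slot_tags[j] = slot_tags[j], slot_tags[i]
    return tokens, slot_tags

def expand_dataset(samples, ratio):
    """Add `ratio` * len(samples) mutated copies (ratio = 0.5, 1.0 or 2.0
    for the +50%, +100% and +200% settings discussed above)."""
    extra = [mutate(t, s) for t, s in random.choices(samples, k=int(len(samples) * ratio))]
    return samples + extra

data = [("show flights from denver".split(), ["O", "O", "O", "B-fromloc"])] * 100
print(len(expand_dataset(data, ratio=0.5)))  # 150
```

Accuracy would then be compared between models trained on the original set and on each expanded set, mirroring the evaluation described in the citing text.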
“…But building a taxonomy with thousands of words becomes a very difficult and expensive task. Also, it does not show much difference between more general topics and more specific topics. S. Dadas [9] used a compound neural architecture that applies a layered approach on the ATIS dataset. But in this approach, accuracy decreases because of duplication in sentences, which is caused by high probabilities of words.…”
Section: III
confidence: 99%
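Because the criticism above centres on duplicated sentences produced by high-probability words, one simple mitigation is to filter generated utterances against those already seen. The snippet below is a hypothetical post-processing step, not something proposed in [9].

```python
def drop_duplicates(generated, seen):
    """Keep only generated utterances whose normalised form is new.
    `generated` and `seen` are iterables of token lists; illustrative only."""
    seen_keys = {" ".join(s).lower() for s in seen}
    unique = []
    for utterance in generated:
        key = " ".join(utterance).lower()
        if key not in seen_keys:
            seen_keys.add(key)
            unique.append(utterance)
    return unique

original = [["book", "a", "flight"], ["show", "fares"]]
augmented = [["book", "a", "flight"], ["book", "a", "ticket"], ["show", "fares"]]
print(drop_duplicates(augmented, original))  # [['book', 'a', 'ticket']]
```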