Ruhi Sarikaya scite author profile

Convolutional neural network based triangular CRF for joint intent detection and slot filling

We describe a joint model for intent detection and slot filling based on convolutional neural networks (CNN). The proposed architecture can be perceived as a neural network (NN) version of the triangular CRF model (TriCRF), in which the intent label and the slot sequence are modeled jointly and their dependencies are exploited. Our slot filling component is a globally normalized CRF-style model, as opposed to the left-to-right models in recent NN-based slot taggers. Its features are automatically extracted through CNN layers and shared by the intent model. We show that our slot model component generates state-of-the-art results, outperforming CRF significantly. Our joint model outperforms the standard TriCRF by 1% absolute for both intent and slot. On a number of other domains, our joint model achieves 0.7-1% and 0.9-2.1% absolute gains over the independent modeling approach for intent and slot, respectively.


Application of Deep Belief Networks for Natural Language Understanding

Sarikaya

Hinton

Deoras

2014

IEEE/ACM Trans. Audio Speech Lang. Process.

404

155


Applications of deep belief nets (DBN) to various problems have been the subject of a number of recent studies ranging from image classification and speech recognition to audio classification. In this study we apply DBNs to a natural language understanding problem. The recent surge of activity in this area was largely spurred by the development of a greedy layer-wise pretraining method that uses an efficient learning algorithm called contrastive divergence (CD). CD allows DBNs to learn a multi-layer generative model from unlabeled data and the features discovered by this model are then used to initialize a feed-forward neural network which is fine-tuned with backpropagation. We compare a DBN-initialized neural network to three widely used text classification algorithms: support vector machines (SVM), boosting and maximum entropy (MaxEnt). The plain DBN-based model gives a call-routing classification accuracy that is equal to the best of the other models. However, using additional unlabeled data for DBN pretraining and combining DBN-based learned features with the original features provides significant gains over SVMs, which, in turn, performed better than both MaxEnt and Boosting.


The Technology Behind Personal Digital Assistants: An overview of the system architecture and key components

Sarikaya

2017

IEEE Signal Process. Mag.

149

102


Cross-Lingual Transfer Learning for POS Tagging without Cross-Lingual Resources

Kim

Kim

Sarikaya

et al. 2017

121


Training a POS tagging model with cross-lingual transfer learning usually requires linguistic knowledge and resources about the relation between the source language and the target language. In this paper, we introduce a cross-lingual transfer learning model for POS tagging without ancillary resources such as parallel corpora. The proposed cross-lingual model utilizes a common BLSTM that enables knowledge transfer from other languages, and private BLSTMs for language-specific representations. The cross-lingual model is trained with language-adversarial training and bidirectional language modeling as auxiliary objectives to better represent language-general information while not losing the information about a specific target language. Evaluating on POS datasets from 14 languages in the Universal Dependencies corpus, we show that the proposed transfer learning model improves the POS tagging performance of the target languages without exploiting any linguistic knowledge between the source language and the target language.


Maximum entropy based restoration of Arabic diacritics

Zitouni

Sorensen

Sarikaya

2006


Short vowels and other diacritics are not part of written Arabic scripts. Exceptions are made for important political and religious texts and in scripts for beginning students of Arabic. Scripts without diacritics have considerable ambiguity because many words with different diacritic patterns appear identical in a diacritic-less setting. We propose in this paper a maximum entropy approach for restoring diacritics in a document. The approach can easily integrate and make effective use of diverse types of information; the model we propose integrates a wide array of lexical, segment-based and part-of-speech tag features. The combination of these feature types leads to a state-of-the-art diacritization model. Using a publicly available corpus (LDC's Arabic Treebank Part 3), we achieve a diacritic error rate of 5.1%, a segment error rate of 8.5%, and a word error rate of 17.3%. In the case-ending-less setting, we obtain a diacritic error rate of 2.2%, a segment error rate of 4.0%, and a word error rate of 7.2%.


Deep belief nets for natural language call-routing

Sarikaya

Hinton

Ramabhadran

2011

113


New Transfer Learning Techniques for Disparate Label Sets

Kim

Stratos

Sarikaya

et al. 2015


In natural language understanding (NLU), a user utterance can be labeled differently depending on the domain or application (e.g., weather vs. calendar). Standard domain adaptation techniques are not directly applicable to take advantage of the existing annotations because they assume that the label set is invariant. We propose a solution based on label embeddings induced from canonical correlation analysis (CCA) that reduces the problem to a standard domain adaptation task and allows use of a number of transfer learning techniques. We also introduce a new transfer learning technique based on pretraining of hidden-unit CRFs (HUCRFs). We perform extensive experiments on slot tagging on eight personal digital assistant domains and demonstrate that the proposed methods are superior to strong baselines.


Contextual domain classification in spoken language understanding systems using recurrent neural network

Sarikaya

2014


In a multi-domain, multi-turn spoken language understanding session, information from the history often greatly reduces the ambiguity of the current turn. In this paper, we apply the recurrent neural network (RNN) to exploit contextual information for query domain classification. The Jordan-type RNN directly sends the vector of output distribution to the next query turn as additional input features to the convolutional neural network (CNN). We evaluate our approach against SVM with and without contextual features. On our contextually labeled dataset, we observe a 1.4% absolute (8.3% relative) improvement in classification error rate over the non-contextual SVM, and 0.9% absolute (5.5% relative) improvement over the contextual SVM.



Convolutional neural network based triangular CRF for joint intent detection and slot filling
