Ayushman Dash scite author profile

Ayushman Dash

4 Publications

25 Citation Statements Received

88 Citation Statements Given

Affiliations

University of Kaiserslautern

Publications

Subword Semantic Hashing for Intent Classification on Small Datasets

Shridhar, Dash, Sahu et al. 2019

In this paper, we introduce the use of Semantic Hashing as embedding for the task of Intent Classification and achieve state-of-the-art performance on three frequently used benchmarks. Intent Classification on a small dataset is a challenging task for data-hungry state-of-the-art Deep Learning based systems. Semantic Hashing is an attempt to overcome such a challenge and learn robust text classification. Current word embedding based methods [11], [13], [14] are dependent on vocabularies. One of the major drawbacks of such methods is out-of-vocabulary terms, especially when having small training datasets and using a wider vocabulary. This is the case in Intent Classification for chatbots, where typically small datasets are extracted from internet communication. Two problems arise with the use of internet communication. First, such datasets lack many of the vocabulary terms needed to use word embeddings efficiently. Second, users frequently make spelling errors. Typically, the models for intent classification are not trained with spelling errors, and it is difficult to anticipate the ways in which users will make mistakes. Models depending on a word vocabulary will always face such issues. An ideal classifier should handle spelling errors inherently. With Semantic Hashing, we overcome these challenges and achieve state-of-the-art results on three datasets: Chatbot, Ask Ubuntu, and Web Applications [3]. Our benchmarks are available online.
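
The abstract does not spell out the hashing step, so the following is only a minimal sketch of why subword features tolerate spelling errors. It assumes each word is padded with `#` and split into character trigrams; the function names `subword_trigrams` and `jaccard` are hypothetical and chosen for illustration, not taken from the paper.

```python
from collections import Counter


def subword_trigrams(text: str) -> Counter:
    """Count character trigrams of each whitespace token, padded with '#'.

    The '#' padding and the trigram size are assumptions for this sketch.
    """
    counts = Counter()
    for token in text.lower().split():
        padded = f"#{token}#"
        for i in range(len(padded) - 2):
            counts[padded[i:i + 3]] += 1
    return counts


def jaccard(a: Counter, b: Counter) -> float:
    """Overlap of the distinct trigrams in two feature bags."""
    keys_a, keys_b = set(a), set(b)
    union = keys_a | keys_b
    return len(keys_a & keys_b) / len(union) if union else 0.0


correct = subword_trigrams("weather")
typo = subword_trigrams("wether")
overlap = jaccard(correct, typo)
```

A word-level vocabulary would treat "wether" as an unknown term with no relation to "weather". Under the assumptions above, the two spellings still share four of their nine distinct trigrams, so a classifier trained on such features sees a related input.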

Indic-Transformers: An Analysis of Transformer Language Models for Indian Languages

Jain, Deshpande, Shridhar et al. 2020

Preprint

Language models based on the Transformer architecture [1] have achieved state-of-the-art performance on a wide range of natural language processing (NLP) tasks such as text classification, question-answering, and token classification. However, this performance is usually tested and reported on high-resource languages, like English, French, Spanish, and German. Indian languages, on the other hand, are underrepresented in such benchmarks. Despite some Indian languages being included in training multilingual Transformer models, they have not been the primary focus of such work. In order to evaluate the performance on Indian languages specifically, we analyze these language models through extensive experiments on multiple downstream tasks in Hindi, Bengali, and Telugu. Here, we compare the efficacy of fine-tuning model parameters of pre-trained models against that of training a language model from scratch. Moreover, we empirically argue against a strict dependency between dataset size and model performance, and instead encourage task-specific model and method selection. We achieve state-of-the-art performance on Hindi and Bengali for the text classification task. Finally, we present effective strategies for modeling Indian languages, and we release our model checkpoints for the community: https://huggingface.co/neuralspace-reverie.

AirScript - Creating Documents in Air

Dash, Sahu, Shringi et al. 2017

This paper presents a novel approach, called AirScript, for creating, recognizing and visualizing documents in air. We present a novel algorithm, called 2-DifViz, that converts the hand movements in air (captured by a Myo-armband worn by a user) into a sequence of x, y coordinates on a 2D Cartesian plane, and visualizes them on a canvas. Existing sensor-based approaches either do not provide visual feedback or represent the recognized characters using prefixed templates. In contrast, AirScript stands out by giving freedom of movement to the user, as well as by providing real-time visual feedback of the written characters, making the interaction natural. AirScript provides a recognition module to predict the content of the document created in air. To do so, we present a novel approach based on deep learning, which uses the sensor data and the visualizations created by 2-DifViz. The recognition module consists of a Convolutional Neural Network (CNN) and two Gated Recurrent Unit (GRU) Networks. The output from these three networks is fused to get the final prediction about the characters written in air. AirScript can be used in highly sophisticated environments like a smart classroom, a smart factory or a smart laboratory, where it would enable people to annotate pieces of text wherever they want without any reference surface. We have evaluated AirScript against various well-known learning models (HMM, KNN, SVM, etc.) on the data of 12 participants. Evaluation results show that the recognition module of AirScript largely outperforms all of these models by achieving an accuracy of 91.7% in a person-independent evaluation and a 96.7% accuracy in a person-dependent evaluation.
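
The abstract does not say how the outputs of the CNN and the two GRU networks are fused. As one possible reading, the sketch below averages per-class probabilities from three models and picks the highest-scoring class. The averaging rule, the function name `fuse_by_averaging`, and the probability values are all assumptions made for illustration, not the authors' method or data.

```python
from typing import List, Tuple


def fuse_by_averaging(prob_lists: List[List[float]]) -> Tuple[int, List[float]]:
    """Average class probabilities across models and return (best class, fused scores).

    Averaging is an assumed fusion rule for this sketch only.
    """
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    fused = [
        sum(probs[c] for probs in prob_lists) / n_models
        for c in range(n_classes)
    ]
    best = max(range(n_classes), key=fused.__getitem__)
    return best, fused


# Made-up probabilities over three character classes, one list per network.
cnn_probs = [0.6, 0.3, 0.1]
gru_a_probs = [0.2, 0.5, 0.3]
gru_b_probs = [0.1, 0.7, 0.2]

best_class, fused_probs = fuse_by_averaging([cnn_probs, gru_a_probs, gru_b_probs])
```

In this made-up case the CNN alone would pick class 0, while the fused scores favor class 1, which shows how combining the networks can overrule a single model.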

AirScript - Creating Documents in Air

Dash, Sahu, Shringi et al. 2017

Preprint
