Devang Naik scite author profile

A novel word clustering algorithm based on latent semantic analysis

A new approach is proposed for the clustering of words in a given vocabulary. The method is based on a paradigm first formulated in the context of information retrieval, called latent semantic analysis. This paradigm leads to a parsimonious vector representation of each word in a suitable vector space, where familiar clustering techniques can be applied. The distance measure selected in this space arises naturally from the problem formulation. Preliminary experiments indicate that the clusters produced are intuitively satisfactory. Because these clusters are semantic in nature, this approach may prove useful as a complement to conventional class-based statistical language modeling techniques.
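
The pipeline the abstract outlines (a latent semantic analysis vector per word, then a standard clustering technique in that space) can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's implementation: it uses raw word-document counts with no term weighting, a rank-2 truncated SVD from NumPy, cosine similarity as the closeness measure, and spherical k-means with a deterministic farthest-point start standing in for the "familiar clustering techniques". The toy documents and every function name are hypothetical.

```python
import numpy as np

def word_document_matrix(docs):
    """Count matrix W (M words x N documents) from whitespace-tokenized documents."""
    vocab = sorted({w for doc in docs for w in doc.split()})
    index = {w: i for i, w in enumerate(vocab)}
    W = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for w in doc.split():
            W[index[w], j] += 1.0
    return vocab, W

def lsa_word_vectors(W, rank):
    """Truncated SVD W ~ U_R S_R V_R^T; word i is represented by row i of U_R S_R."""
    U, s, _ = np.linalg.svd(W, full_matrices=False)
    return U[:, :rank] * s[:rank]

def normalize(X):
    """Scale each row to unit length so dot products become cosine similarities."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms == 0, 1.0, norms)

def cluster_words(vectors, k, iters=20):
    """Spherical k-means: assign by cosine similarity, recenter on normalized means."""
    X = normalize(vectors)
    # Deterministic farthest-point start (an assumption, chosen for reproducibility).
    centers = [X[0]]
    while len(centers) < k:
        closest = np.max(X @ np.array(centers).T, axis=1)
        centers.append(X[np.argmin(closest)])
    C = np.array(centers)
    for _ in range(iters):
        labels = np.argmax(X @ C.T, axis=1)
        for c in range(k):
            members = X[labels == c]
            if len(members):
                C[c] = normalize(members.sum(axis=0, keepdims=True))[0]
    return labels

# Hypothetical toy corpus with two unrelated topics.
docs = [
    "stock market price trade",
    "market trade stock",
    "price stock trade",
    "goal match team score",
    "team score goal",
    "match team goal",
]
vocab, W = word_document_matrix(docs)
vectors = lsa_word_vectors(W, rank=2)
labels = cluster_words(vectors, k=2)

clusters = {}
for word, c in zip(vocab, labels):
    clusters.setdefault(int(c), []).append(word)
print(sorted(clusters.values()))
# → [['goal', 'match', 'score', 'team'], ['market', 'price', 'stock', 'trade']]
```

Because the two topics never share a document, the rank-2 space places the finance words and the sports words along orthogonal directions, so the cosine-based clustering separates them cleanly. A real vocabulary would need a much higher rank and would likely benefit from weighting the raw counts before the SVD.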

Multi-Task Learning for Speaker Verification and Voice Trigger Detection

Sigtia, Marchi, Kajarekar et al. 2020

Automatic speech transcription and speaker recognition are usually treated as separate tasks even though they are interdependent. In this study, we investigate training a single network to perform both tasks jointly. We train the network in a supervised multi-task learning setup, where the speech transcription branch of the network is trained to minimise a phonetic connectionist temporal classification (CTC) loss while the speaker recognition branch of the network is trained to label the input sequence with the correct label for the speaker. We present a large-scale empirical study where the model is trained using several thousand hours of labelled training data for each task. We evaluate the speech transcription branch of the network on a voice trigger detection task while the speaker recognition branch is evaluated on a speaker verification task. Results demonstrate that the network is able to encode both phonetic and speaker information in its learnt representations while yielding accuracies at least as good as the baseline models for each task, with the same number of parameters as the independent models.
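
The joint objective described above (one shared network, a phonetic branch trained with CTC and a speaker branch trained as a classifier) can be illustrated with a forward pass and loss computation. This is a minimal NumPy sketch under stated assumptions, not the authors' model: the single tanh layer, the mean pooling for the speaker branch, the toy sizes, the random weights, and the loss weight `weight` are all hypothetical, and no training step is shown (in practice a framework with automatic differentiation, such as PyTorch's `torch.nn.CTCLoss`, would be used).

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def ctc_loss(log_probs, target, blank=0):
    """Negative log-likelihood of `target` under CTC (forward algorithm, log space).
    log_probs: (T, C) per-frame log-probabilities; target: label ids, none equal to blank."""
    ext = [blank]
    for p in target:
        ext += [p, blank]                                    # blank-interleaved labels
    T, S = log_probs.shape[0], len(ext)
    alpha = np.full((T, S), -np.inf)
    alpha[0, 0] = log_probs[0, blank]
    if S > 1:
        alpha[0, 1] = log_probs[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1, s]                              # stay on the same symbol
            if s >= 1:
                a = np.logaddexp(a, alpha[t - 1, s - 1])     # advance by one
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a = np.logaddexp(a, alpha[t - 1, s - 2])     # skip a blank
            alpha[t, s] = a + log_probs[t, ext[s]]
    # Valid paths end on the last label or on the trailing blank.
    if S > 1:
        return -np.logaddexp(alpha[T - 1, S - 1], alpha[T - 1, S - 2])
    return -alpha[T - 1, 0]

# Hypothetical toy sizes and untrained random weights.
rng = np.random.default_rng(0)
FEAT, HIDDEN, PHONES, SPEAKERS = 8, 16, 5, 3    # PHONES includes the blank at index 0
W_shared = rng.normal(scale=0.3, size=(FEAT, HIDDEN))
W_phone = rng.normal(scale=0.3, size=(HIDDEN, PHONES))
W_speaker = rng.normal(scale=0.3, size=(HIDDEN, SPEAKERS))

def forward(x):
    h = np.tanh(x @ W_shared)                    # shared representation, (T, HIDDEN)
    phone_logits = h @ W_phone                   # phonetic branch, one row per frame
    speaker_logits = h.mean(axis=0) @ W_speaker  # speaker branch, pooled over time
    return phone_logits, speaker_logits

def multitask_loss(x, phones, speaker, weight=1.0):
    """Joint objective: CTC on the phonetic branch + weight * speaker cross-entropy."""
    phone_logits, speaker_logits = forward(x)
    l_ctc = ctc_loss(log_softmax(phone_logits), phones)
    l_spk = -log_softmax(speaker_logits)[speaker]
    return l_ctc + weight * l_spk, l_ctc, l_spk

x = rng.normal(size=(10, FEAT))                  # 10 frames of made-up features
total, l_ctc, l_spk = multitask_loss(x, phones=[1, 2, 2, 3], speaker=1)
print(round(float(total), 3), round(float(l_ctc), 3), round(float(l_spk), 3))
```

Both losses are computed from the same hidden representation `h`, so gradients from either task would update `W_shared`; this sharing is what lets one network carry both phonetic and speaker information.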

Neural Text-to-Speech Adaptation from Low Quality Public Recordings

Hu, Marchi, Winarsky et al. 2019

Leveraging Acoustic Cues and Paralinguistic Embeddings to Detect Expression from Voice

Mitra, Booker, Marchi et al. 2019

Millions of people reach out to digital assistants such as Siri every day, asking for information, making phone calls, seeking assistance, and much more. The expectation is that such assistants should understand the intent of the user's query. Detecting the intent of a query from a short, isolated utterance is a difficult task, and intent cannot always be obtained from speech-recognized transcriptions. A transcription-driven approach can interpret what has been said but not how it has been said, and as a consequence may ignore the expression present in the voice. Our work investigates whether a system can reliably detect vocal expression in queries using acoustic and paralinguistic embeddings. Results show that the proposed method offers a 60% relative decrease in equal error rate (EER) compared to a bag-of-words-based system, corroborating that expression is significantly represented by vocal attributes rather than being purely lexical. Adding emotion embeddings reduced the EER by 30% relative to the acoustic embedding, demonstrating the relevance of emotion in expressive voice.
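
The results above are stated as relative EER decreases. The EER is the error rate at the decision threshold where the false-acceptance rate (FAR) equals the false-rejection rate (FRR), and a relative decrease expresses the change as a fraction of the baseline EER. The following is a minimal NumPy sketch under stated assumptions, not the evaluation code used in this work: it assumes binary labels, approximates the EER by scanning the observed scores as thresholds (interpolating the ROC curve is a common alternative), and uses made-up scores and hypothetical EER values, since the abstract reports no absolute numbers.

```python
import numpy as np

def equal_error_rate(scores, labels):
    """Approximate EER by scanning thresholds for the point where FAR ~= FRR.
    labels: 1 = target class (e.g. expressive query), 0 = non-target."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    best_gap, eer = np.inf, None
    for thr in np.unique(scores):
        far = np.mean(neg >= thr)   # non-targets wrongly accepted
        frr = np.mean(pos < thr)    # targets wrongly rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return float(eer)

def relative_reduction(baseline, new):
    """Relative decrease of `new` with respect to `baseline` (0.6 means 60%)."""
    return (baseline - new) / baseline

# Made-up detector scores: three expressive (1) and three neutral (0) queries.
eer = equal_error_rate([0.9, 0.6, 0.4, 0.5, 0.3, 0.1], [1, 1, 1, 0, 0, 0])
print(eer)

# Hypothetical EERs: a 60% relative decrease means new = 0.4 * baseline.
print(relative_reduction(0.20, 0.08))
```

In the toy data the scan settles at threshold 0.5, where one neutral query is accepted and one expressive query is rejected, giving an EER of 1/3. Note that the 60% and 30% figures in the abstract are measured against different baselines (the bag-of-words system and the acoustic embedding, respectively), so they should not simply be added.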

Devang Naik

Meta-neural networks that learn by learning

A novel word clustering algorithm based on latent semantic analysis

Multi-Task Learning for Speaker Verification and Voice Trigger Detection

Neural Text-to-Speech Adaptation from Low Quality Public Recordings

Leveraging Acoustic Cues and Paralinguistic Embeddings to Detect Expression from Voice