Ignacio López-Moreno scite author profile

This is an author-produced version of a published conference paper. Abstract: This work studies the use of deep neural networks (DNNs) to address automatic language identification (LID). Motivated by their recent success in acoustic modelling, we adapt DNNs to the problem of identifying the language of a given spoken utterance from short-term acoustic features. The proposed approach is compared to state-of-the-art i-vector based acoustic systems on two different datasets: Google 5M LID corpus and NIST LRE 2009. Results show that LID can benefit substantially from using DNNs, especially when a large amount of training data is available. We found relative improvements of up to 70% in C_avg over the baseline system.


Improving DNN speaker independence with I-vector inputs

Senior

López-Moreno

2014

179

110


VoiceFilter-Lite: Streaming Targeted Voice Separation for On-Device Speech Recognition

Wang

López-Moreno

Sağlam

et al. 2020


Locally-connected and convolutional neural networks for small footprint speaker recognition

Chen

López-Moreno

Sainath

et al. 2015


Automatic language identification using long short-term memory recurrent neural networks

González-Domínguez

López-Moreno

Sak

et al. 2014


This work explores the use of Long Short-Term Memory (LSTM) recurrent neural networks (RNNs) for automatic language identification (LID). The use of RNNs is motivated by their better ability to model sequences compared with the feed-forward networks used in previous work. We show that LSTM RNNs can effectively exploit temporal dependencies in acoustic data, learning relevant features for language discrimination. The proposed approach is compared to baseline i-vector and feed-forward Deep Neural Network (DNN) systems on the NIST Language Recognition Evaluation 2009 dataset. We show that LSTM RNNs achieve better performance than our best DNN system with an order of magnitude fewer parameters. Further, the combination of the different systems leads to significant performance improvements (up to 28%).


Frame-by-frame language identification in short utterances using deep neural networks

et al. 2015


This is an author-produced version of an article published in: Neural Networks 64 (2015). Abstract: This work addresses the use of deep neural networks (DNNs) in automatic language identification (LID) focused on short test utterances. Motivated by their recent success in acoustic modelling for speech recognition, we adapt DNNs to the problem of identifying the language in a given utterance from the short-term acoustic features. We show how DNNs are particularly suitable for LID in real-time applications, due to their capacity to emit a language identification posterior at each new frame of the test utterance. We then analyse different aspects of the system, such as the amount of required training data, the number of hidden layers, the relevance of contextual information and the effect of the test utterance duration. Finally, we propose several methods to combine frame-by-frame posteriors. Experiments are conducted on two different datasets: the public NIST Language Recognition Evaluation 2009 (3-second task) and a much larger corpus (of 5 million utterances) known as Google 5M LID, obtained from different Google services. Reported results show relative improvements of DNNs over the i-vector system of 40% on the LRE09 3-second task and 76% on Google 5M LID.
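As a rough illustration of what combining frame-by-frame posteriors can look like, the sketch below turns a sequence of per-frame language posteriors into a single utterance-level decision. The abstract does not say which combination methods the paper proposes, so the three rules shown here (averaging log posteriors, averaging posteriors, and majority voting) are assumptions chosen because they are common choices. The function names and the toy posteriors are hypothetical.

```python
import math


def combine_frame_posteriors(frame_posteriors, method="log_avg"):
    """Combine per-frame language posteriors into utterance-level scores.

    frame_posteriors: list of frames, each a list of probabilities
    (one per language). The three methods are illustrative assumptions,
    not necessarily the ones used in the paper.
    """
    if not frame_posteriors:
        raise ValueError("need at least one frame")
    n_frames = len(frame_posteriors)
    n_langs = len(frame_posteriors[0])

    if method == "log_avg":
        # Average of log posteriors; eps guards against log(0).
        eps = 1e-12
        return [
            sum(math.log(max(frame[lang], eps)) for frame in frame_posteriors) / n_frames
            for lang in range(n_langs)
        ]
    if method == "avg":
        # Plain average of the posteriors.
        return [
            sum(frame[lang] for frame in frame_posteriors) / n_frames
            for lang in range(n_langs)
        ]
    if method == "vote":
        # Each frame votes for its most likely language.
        counts = [0] * n_langs
        for frame in frame_posteriors:
            counts[max(range(n_langs), key=frame.__getitem__)] += 1
        return [count / n_frames for count in counts]
    raise ValueError(f"unknown method: {method}")


def identify(frame_posteriors, languages, method="log_avg"):
    """Return the language label with the highest combined score."""
    scores = combine_frame_posteriors(frame_posteriors, method)
    best = max(range(len(scores)), key=scores.__getitem__)
    return languages[best]


# Toy example: three frames, two hypothetical languages.
frames = [[0.6, 0.4], [0.7, 0.3], [0.1, 0.9]]
labels = ["en", "es"]
for rule in ("log_avg", "avg", "vote"):
    print(rule, identify(frames, labels, rule))
```

In this toy input, two frames weakly favour the first language and one frame strongly favours the second, so the voting rule and the two averaging rules disagree. This shows why the choice of combination rule can matter for short utterances.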


On the use of deep feedforward neural networks for automatic language identification

López-Moreno

González-Domínguez

Martínez

et al. 2016

Computer Speech & Language


A Real-Time End-to-End Multilingual Speech Recognition Architecture

González-Domínguez

Eustis

López-Moreno

et al. 2015

IEEE J. Sel. Top. Signal Process.


Automatic speech recognition (ASR) systems are used daily by millions of people worldwide to dictate messages, control devices, initiate searches or facilitate data input on small devices. The user experience in these scenarios depends on the quality of the speech transcriptions and on the responsiveness of the system. For multilingual users, a further obstacle to natural interaction is the monolingual character of many ASR systems, in which users are constrained to a single preset language. In this work, we present an end-to-end multi-language ASR architecture, developed and deployed at Google, that allows users to select arbitrary combinations of spoken languages. We leverage recent advances in language identification and a novel method of real-time language selection to achieve recognition accuracy similar to, and latency nearly identical to, that of a monolingual system.

