Deep neural networks for small footprint text-dependent speaker verification

Variani, Ehsan; Lei, Xin; McDermott, Erik; Moreno, Ignacio López; González-Domínguez, Javier

doi:10.1109/icassp.2014.6854363

Cited by 888 publications (588 citation statements, 580 of them classified as mentioning)

References 14 publications

“…[8][9][10] Meanwhile, it has deep network structure and nonlinear activation function, which makes all kinds of deep learning models be appropriate for big data model, especially for the ones with higher dimensions and which are nonlinear. However, the number of samples is relatively small in spectral analysis, and direct application of deep learning model may result in overfitting problem.…”

Section: Introduction (mentioning, confidence: 99%)

Deep belief network-based drug identification using near infrared spectroscopy

Yang, Pan, et al. (2017). J. Innov. Opt. Health Sci.


Near infrared spectroscopy (NIRS) analysis technology, combined with chemometrics, can be effectively used in quick and nondestructive analysis of quality and category. In this paper, an effective drug identification method by using deep belief network (DBN) with dropout mechanism (dropout-DBN) to model NIRS is introduced, in which dropout is employed to overcome the overfitting problem coming from the small sample. This paper tests proposed method under datasets of different sizes with the example of near infrared diffuse reflectance spectroscopy of erythromycin ethylsuccinate drugs and other drugs, aluminum and nonaluminum packaged. Meanwhile, it gives experiments to compare the proposed method's performance with back propagation (BP) neural network, support vector machines (SVMs) and sparse denoising autoencoder (SDAE). The results show that for both binary classification and multi-classification, dropout mechanism can improve the classification accuracy, and dropout-DBN can achieve best classification accuracy in almost all cases. SDAE is similar to dropout-DBN in the aspects of classification accuracy and algorithm stability, which are higher than that of BP neural network and SVM methods. In terms of training time, dropout-DBN model is superior to SDAE model, but inferior to BP neural network and SVM methods. Therefore, dropout-DBN can be used as a …


“…However, recent success of Deep Neural Networks in different areas of speech processing (Hinton et al, 2012;Lopez-Moreno et al, 2014) promise for the near future exciting developments in speaker recognition, as those advanced in Vasilakakis, Cumani, and Laface (2013), and Variani, Lei, McDermott, Lopez-Moreno, and Gonzalez-Dominguez (2014).…”

Section: Factor Analysis and I-vectors (mentioning, confidence: 99%)

Evaluating Automatic Speaker Recognition systems: An overview of the NIST Speaker Recognition Evaluations (1996-2014)

González-Rodríguez (2014). Loquens.


Automatic Speaker Recognition systems show interesting properties, such as speed of processing or repeatability of results, in contrast to speaker recognition by humans. But they will be usable only if they are reliable. Testability, or the ability to extensively evaluate the goodness of the speaker detector decisions, then becomes critical. In the last 20 years, the US National Institute of Standards and Technology (NIST) has organized, providing the proper speech data and evaluation protocols, a series of text-independent Speaker Recognition Evaluations (SRE). Those evaluations have become not just a periodical benchmark test, but also a meeting point of a collaborative community of scientists that have been deeply involved in the cycle of evaluations, allowing tremendous progress in an especially complex task where the speaker information is spread across different information levels (acoustic, prosodic, linguistic…) and is strongly affected by speaker intrinsic and extrinsic variability factors. In this paper, we outline how the evaluations progressively challenged the technology including new speaking conditions and sources of variability, and how the scientific community gave answers to those demands. Finally, NIST SREs will be shown to be not free of inconveniences, and future challenges to speaker recognition assessment will also be discussed.

Automatic speaker recognition systems are critical for organizing, labeling, managing, and making decisions about large databases of voices from different speakers. To process such amounts of speech information efficiently, we need systems that are very fast and, since they are not error-free, sufficiently reliable. Current systems are orders of magnitude faster than real time, allowing instant automatic decisions on huge amounts of conversations. But perhaps the most interesting characteristic of an automatic system is that it can be analyzed in detail, since its performance and reliability can be evaluated blindly on huge amounts of data in a wide variety of conditions. In the last 20 years, the US National Institute of Standards and Technology (NIST) has organized, providing the appropriate speech data and evaluation protocols, a series of text-independent speaker recognition evaluations. These evaluations have become not only a periodic benchmark test but also a meeting point for a collaborative community of scientists who have been deeply involved in the cycle of evaluations. This has allowed enormous progress in an especially complex task in which the speaker-individualizing information is spread across different levels of information (acoustic, prosodic, linguistic...) and is strongly affected by variability factors intrinsic and extrinsic to the ...


“…Recently, many deep learning methods have been applied in the speech recognition and speaker verification systems [41,[165][166][167], and published results show that speech processing methods driven by MBD and deep learning can obviously improve the performance of the existing speech recognition and speaker verification system [40,168,169]. In the IoV systems, millions of sensors collect abundant vehicles and environmental noises from engines and streets will significantly reduce the accuracy of speech processing system, while the traditional speech enhancement methods, for example, Wiener filtering [170] and minimum mean-square error estimation (MMSE) [171] which focus on advancing signal noise ratio (SNR), do not take full advantage of a priori distribution of noises around vehicles.…”

Section: Speech Recognition and Verification For The Internet Of (mentioning, confidence: 99%)

A Survey on Machine Learning-Based Mobile Big Data Analysis: Challenges and Applications

Xie, Song, et al. (2018). Wireless Communications and Mobile Computing.


This paper attempts to identify the requirement and the development of machine learning-based mobile big data (MBD) analysis through discussing the insights of challenges in the mobile big data. Furthermore, it reviews the state-of-the-art applications of data analysis in the area of MBD. Firstly, we introduce the development of MBD. Secondly, the frequently applied data analysis methods are reviewed. Three typical applications of MBD analysis, namely, wireless channel modeling, human online and offline behavior analysis, and speech recognition in the Internet of Vehicles, are introduced, respectively. Finally, we summarize the main challenges and future development directions of mobile big data analysis.
