This paper proposes a novel front-end for automatic spoken language recognition, based on the spectrogram representation of the speech signal and on the ability of the Fourier spectrum to detect global periodicity in an image. The Local Phase Quantization (LPQ) texture descriptor is used to capture the spectrogram content. Results obtained for 30-second test signals show that this method is very promising for low-cost language identification. The best performance is achieved when the proposed method is fused with the i-vector representation.
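As a rough illustration of the descriptor involved, the sketch below computes an LPQ code histogram for a spectrogram treated as a grayscale image. It follows the common LPQ formulation (four low STFT frequencies, sign quantization of real and imaginary parts, 8-bit codes); the window size and frequency parameter are assumptions, not the authors' exact configuration:

```python
import numpy as np
from scipy.signal import fftconvolve

def lpq_histogram(img, win=7):
    """Minimal LPQ sketch: 8-bit codes from the signs of four local
    STFT coefficients, pooled into a 256-bin normalized histogram."""
    x = np.arange(win) - win // 2
    a = 1.0 / win                      # lowest non-zero frequency (assumed)
    w0 = np.ones(win)                  # DC along one axis
    w1 = np.exp(-2j * np.pi * a * x)   # complex exponential at frequency a
    # Separable 2-D filters for frequencies (a,0), (0,a), (a,a), (a,-a)
    freqs = [(w1, w0), (w0, w1), (w1, w1), (w1, np.conj(w1))]
    out_shape = (img.shape[0] - win + 1, img.shape[1] - win + 1)
    code = np.zeros(out_shape, dtype=int)
    for k, (wr, wc) in enumerate(freqs):
        resp = fftconvolve(img, np.outer(wr, wc), mode='valid')
        # Quantize the signs of real and imaginary parts into two bits
        code |= (resp.real > 0).astype(int) << (2 * k)
        code |= (resp.imag > 0).astype(int) << (2 * k + 1)
    hist = np.bincount(code.ravel(), minlength=256).astype(float)
    return hist / hist.sum()
```

The resulting 256-dimensional histogram is the fixed-length texture feature that a back-end classifier would consume.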
The analysis of the speech signal using wavelet packet trees (WPT) is a very flexible tool, capable of effectively manipulating frequency subbands thanks to the orthonormal bases it provides. Here, dimension reduction becomes very important, since the number of subbands grows exponentially with the decomposition level and the subbands differ in discriminative relevance, which leads to a different resolution for each one. A method based on mutual information is proposed in order to keep as much discriminative information as possible while retaining the least amount of redundant information.
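A selection criterion of this kind can be sketched with a greedy mRMR-style search: rank subband features by their mutual information with the class label and penalize mutual information with already-selected features. The histogram-based MI estimator and the additive relevance-minus-redundancy score below are illustrative assumptions, not the paper's exact criterion:

```python
import numpy as np

def mutual_info(x, y, bins=8):
    """Plug-in MI estimate (in nats) from a joint histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def select_subbands(features, labels, k):
    """Greedy selection: maximize relevance I(f; label), penalize the
    mean redundancy I(f; f_s) over the already-selected subbands."""
    n = features.shape[1]
    relevance = [mutual_info(features[:, j], labels) for j in range(n)]
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in range(n):
            if j in selected:
                continue
            red = np.mean([mutual_info(features[:, j], features[:, s])
                           for s in selected])
            if relevance[j] - red > best_score:
                best, best_score = j, relevance[j] - red
        selected.append(best)
    return selected
```

On subband energy features, this keeps discriminative subbands while dropping a subband whose content duplicates one already chosen.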
This paper presents the system employed by the Voice Group of CENATAV in the Albayzin 2018 "Search on Speech" Evaluation. The system used in the Spoken Term Detection (STD) task consists of an Automatic Speech Recognizer (ASR) and a module to detect the terms; the open-source Kaldi toolkit is used to build both modules. The ASR acoustic models are based on DNN-HMM, S-GMM or GMM-HMM, trained with audio data provided by the organizers and additional data obtained from ELDA. The lexicon and the trigram language model are obtained from the text associated with the audio. The ASR generates the lattices and the word alignments required to detect the terms. Results on development data show that the DNN-HMM model achieves behavior better than or similar to that obtained in previous challenges.
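The detection step can be pictured, in simplified form, as scanning a time-aligned word sequence for occurrences of each query term. The sketch below works only on a 1-best alignment and uses a minimum-confidence score, both simplifying assumptions; the actual system searches Kaldi lattices, which also cover competing hypotheses:

```python
def detect_terms(alignment, terms):
    """Find term occurrences in a 1-best word alignment.

    alignment: list of (word, start_time, duration, confidence) tuples.
    terms: list of tuples of words (multi-word terms allowed).
    Returns (term_string, start, end, score) hits; the score is the
    minimum word confidence over the matched span (an assumed choice).
    """
    hits = []
    words = [w for w, *_ in alignment]
    for term in terms:
        n = len(term)
        for i in range(len(words) - n + 1):
            if tuple(words[i:i + n]) == term:
                start = alignment[i][1]
                end = alignment[i + n - 1][1] + alignment[i + n - 1][2]
                score = min(a[3] for a in alignment[i:i + n])
                hits.append((" ".join(term), start, end, score))
    return hits
```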
Most common approaches to phonotactic language recognition use phone decoders as tokenizers. However, units that are not tied to phonetic definitions can be more universal, and therefore conceptually easier to adopt. It is assumed that the overall sound characteristics of all spoken languages can be covered by a broad collection of acoustic units, which can be characterized by acoustic segments. In this paper, such acoustic units, highly desirable for a more general language characterization, are delimited and clustered using a Gaussian Mixture Model. A new method for segmenting the speech into acoustic units is proposed for subsequent Gaussian modeling, aiming to replace the phonetic recognizer. This tokenizer is trained on untranscribed data, and it precedes the statistical language modeling phase.
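The idea of a GMM-based tokenizer can be sketched as follows: fit a GMM on untranscribed frame features, treat each mixture component as one acoustic unit, label frames with their most likely component, and collapse consecutive repeats into segments. The number of units and the diagonal covariances are illustrative assumptions, and the segmentation here is the simplest possible variant, not the paper's method:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_tokenizer(frames, n_units=16, seed=0):
    """Fit a GMM on untranscribed frame features; each mixture
    component plays the role of one acoustic unit (assumed setup)."""
    gmm = GaussianMixture(n_components=n_units,
                          covariance_type='diag', random_state=seed)
    gmm.fit(frames)
    return gmm

def tokenize(gmm, frames):
    """Label each frame with its most likely component, then collapse
    consecutive repeats into a sequence of acoustic-unit tokens."""
    labels = gmm.predict(frames)
    tokens = [int(labels[0])]
    for lab in labels[1:]:
        if lab != tokens[-1]:
            tokens.append(int(lab))
    return tokens
```

The resulting token sequences would then feed the statistical (e.g. n-gram) language modeling phase in place of phone strings.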