Jorge Llombart scite author profile

Jorge Llombart

5 Publications

41 Citation Statements Received

166 Citation Statements Given

Affiliations

Universidad de Zaragoza

Publications

Progressive Speech Enhancement with Residual Connections

Llombart, Ribas, Miguel, et al. 2019

This paper studies speech enhancement based on deep neural networks. The proposed architecture tracks the signal transformation step by step during enhancement by means of a visualization probe at each network block. Along the way, the enhancement performance is visually inspected and evaluated in terms of regression cost. This progressive scheme is based on residual networks. During the process, we investigate a residual connection with a constant number of channels, the inclusion of an internal state between blocks, and the addition of progressive supervision. The insights provided by interpreting the network's enhancement process lead us to design an improved architecture for the enhancement task. Following this strategy, we obtain speech enhancement results beyond the state of the art, achieving a favorable trade-off between dereverberation and the amount of spectral distortion.
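To make the idea of a per-block probe with progressive supervision concrete, here is a minimal sketch in plain Python. It assumes each block is a residual step (output = input + correction) that keeps the signal size unchanged, that the probe is a mean squared error against the clean target, and that progressive supervision means a weighted sum of the per-block costs. The names `residual_step`, `progressive_forward`, `progressive_loss` and the toy `halve` block are hypothetical illustrations, not the paper's code.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Signal = List[float]
Block = Callable[[Signal], Signal]  # hypothetical non-linear correction path


def residual_step(h: Signal, correction: Block) -> Signal:
    """Linear shortcut plus correction; the size of h is left unchanged."""
    delta = correction(h)
    return [hi + di for hi, di in zip(h, delta)]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Regression cost used as the per-block probe."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)


def progressive_forward(
    x: Signal, blocks: Sequence[Block], target: Signal
) -> Tuple[Signal, List[float]]:
    """Apply the blocks in order and record the probe cost after each one."""
    costs: List[float] = []
    h = x
    for block in blocks:
        h = residual_step(h, block)
        costs.append(mse(h, target))  # how close is this intermediate output?
    return h, costs


def progressive_loss(
    costs: Sequence[float], weights: Optional[Sequence[float]] = None
) -> float:
    """Progressive supervision: weighted sum of the per-block costs."""
    if weights is None:
        weights = [1.0] * len(costs)
    return sum(w * c for w, c in zip(weights, costs))


def halve(h: Signal) -> Signal:
    """Toy block: its correction removes half of whatever it receives."""
    return [-0.5 * v for v in h]


# Toy usage: the input is pure noise and the clean target is silence,
# so the probe cost should shrink after every block.
noisy = [1.0, -1.0, 1.0, -1.0]
silence = [0.0] * len(noisy)
out, costs = progressive_forward(noisy, [halve, halve, halve], silence)
print(costs)                    # [0.25, 0.0625, 0.015625]
print(progressive_loss(costs))  # 0.328125
```

Printing the per-block costs is the numeric counterpart of the visual inspection described above: a block whose cost does not drop contributes little to the enhancement.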

Phonetically-Aware Embeddings, Wide Residual Networks with Time-Delay Neural Networks and Self Attention Models for the 2018 NIST Speaker Recognition Evaluation

Viñals, Ribas, Mingote, et al. 2019

ALBAYZIN 2016 spoken term detection evaluation: an international open competitive evaluation in Spanish

Tejedor, Toledano, López-Otero, et al. 2017

EURASIP Journal on Audio, Speech, and Music Processing

Within search on speech, Spoken Term Detection (STD) aims to retrieve data from a speech repository given a textual representation of a search term. This paper presents an international open evaluation of search on speech based on STD in Spanish, together with an analysis of the results. The evaluation has been carefully designed so that several analyses of the main results can be carried out. The task consists of retrieving the speech files that contain the search terms, providing their start and end times, and a score that reflects the confidence of each detection. Two different Spanish speech databases were used in the evaluation: the MAVIR database, which comprises a set of talks from workshops, and the EPIC database, which comprises a set of European Parliament sessions in Spanish. We present the evaluation itself, both databases, the evaluation metric, the submitted systems, the results, and a detailed discussion. Five research groups took part in the evaluation, submitting ten systems in total. We compare the submitted systems and analyze the results in depth according to several search term properties (term length, in-vocabulary/out-of-vocabulary terms, single-word/multi-word terms, and native (Spanish)/foreign terms).
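The abstract mentions the evaluation metric without naming it. STD evaluations commonly score systems with NIST's term-weighted value, so the sketch below assumes that metric with the NIST STD 2006 conventions: β = 999.9, terms absent from the reference are skipped, and the number of non-target trials is approximated by one per second of speech minus the true occurrences. Scoring at the threshold each system actually chose gives the "actual" TWV (ATWV). Matching detections to reference occurrences within a time tolerance is assumed to have been done already and is not shown; `TermCounts`, `term_weighted_value` and the term names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class TermCounts:
    n_true: int     # reference occurrences of the term
    n_correct: int  # detections above the threshold that hit a reference occurrence
    n_false: int    # detections above the threshold that hit nothing


def term_weighted_value(
    counts: Dict[str, TermCounts], speech_seconds: float, beta: float = 999.9
) -> float:
    """TWV at a fixed decision threshold (assumed NIST STD 2006 conventions)."""
    penalty = 0.0
    scored = 0
    for c in counts.values():
        if c.n_true == 0:
            continue  # terms that never occur in the reference are not scored
        p_miss = 1.0 - c.n_correct / c.n_true
        # Non-target trials approximated as one per second of speech.
        p_fa = c.n_false / (speech_seconds - c.n_true)
        penalty += p_miss + beta * p_fa
        scored += 1
    if scored == 0:
        raise ValueError("no term with reference occurrences")
    return 1.0 - penalty / scored


# One term fully found, one term half found, no false alarms.
print(term_weighted_value(
    {"term_a": TermCounts(4, 4, 0), "term_b": TermCounts(2, 1, 0)}, 3600.0
))  # 0.75
# A single false alarm in 1001 s of speech wipes out almost all the value.
print(term_weighted_value({"term_c": TermCounts(1, 1, 1)}, 1001.0))  # roughly 0.0001
```

The large β is what makes the confidence score matter: a system must pick a threshold that keeps false alarms very rare, which is why per-term analyses such as in-vocabulary versus out-of-vocabulary behaviour can differ so much between systems.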

Tied Hidden Factors in Neural Networks for End-to-End Speaker Recognition

Miguel, Llombart, Giménez, et al. 2017

In this paper, we propose a method that models speaker and session variability and is able to generate likelihood ratios using neural networks in an end-to-end, phrase-dependent speaker verification system. As in Joint Factor Analysis, the model uses tied hidden variables to model speaker and session variability, together with MAP adaptation of some of the model parameters. During training, our method jointly estimates the network parameters and the values of the speaker and channel hidden variables. This is done with a two-step backpropagation algorithm: first, the network weights and factor loading matrices are updated; then the hidden variables are updated, with their gradients computed by aggregating the corresponding speaker or session frames, since these hidden variables are tied. The last layer of the network is defined as a probabilistic linear regression model whose inputs are the outputs of the previous layer. This choice has the advantage that it produces likelihoods and, additionally, it can be adapted during enrolment using MAP without the need for gradient optimization. Decisions are made based on the ratio of the output likelihoods of two neural network models: the speaker-adapted model and the universal background model. The method was evaluated on the RSR2015 database.
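Why a probabilistic linear-regression output layer can be adapted "without gradient optimization" is easiest to see in a reduced setting. The sketch below assumes a single input and a single output, Gaussian noise with variance `sigma2`, and a Gaussian prior on the weight centred on the background-model weight with variance `sigma2 / lam`; under those assumptions the MAP weight has a closed form, and the verification score is the log of the likelihood ratio between the adapted and background models. The function names and the prior-strength parameter `lam` are hypothetical; this is not the paper's formulation, which operates on full network outputs.

```python
import math
from typing import Sequence


def map_adapt_weight(
    h: Sequence[float], y: Sequence[float], w_ubm: float, lam: float
) -> float:
    """Closed-form MAP estimate of w in the model y = w * h + noise.

    Assumes noise ~ N(0, sigma2) and prior w ~ N(w_ubm, sigma2 / lam), so
    sigma2 cancels out and no gradient steps are required.
    """
    num = sum(hi * yi for hi, yi in zip(h, y)) + lam * w_ubm
    den = sum(hi * hi for hi in h) + lam
    return num / den


def log_likelihood(
    h: Sequence[float], y: Sequence[float], w: float, sigma2: float
) -> float:
    """Gaussian log-likelihood of the observed outputs y given inputs h."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * sigma2) - (yi - w * hi) ** 2 / (2.0 * sigma2)
        for hi, yi in zip(h, y)
    )


def log_likelihood_ratio(
    h: Sequence[float], y: Sequence[float], w_spk: float, w_ubm: float, sigma2: float
) -> float:
    """Verification score: speaker-adapted model versus background model."""
    return log_likelihood(h, y, w_spk, sigma2) - log_likelihood(h, y, w_ubm, sigma2)


# Enrolment: pull the background weight towards the speaker's data.
w_spk = map_adapt_weight([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], w_ubm=1.0, lam=1.0)
print(w_spk)  # about 1.933 (= 29/15)
# Trial: data from the same speaker should favour the adapted model.
score = log_likelihood_ratio([1.0, 2.0], [2.0, 4.0], w_spk, 1.0, sigma2=1.0)
print(score > 0.0)  # True
```

The prior strength behaves like the relevance factor in classic MAP adaptation: with `lam = 0` the estimate is plain least squares on the enrolment data, and as `lam` grows the adapted weight stays close to the background weight.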

Speech Enhancement with Wide Residual Networks in Reverberant Environments

Llombart, Ribas, Miguel, et al. 2019

This paper proposes a speech enhancement method that exploits the potential of residual connections in a Wide Residual Network architecture. The architecture relies on one-dimensional convolutions computed along the time domain, a powerful approach for processing representations that are contextually correlated in time, such as speech feature sequences. We find the residual mechanism extremely useful for the enhancement task, since the signal always has a linear shortcut while the non-linear path enhances it in several steps by adding or subtracting corrections. The enhancement capability of the proposal is assessed with objective quality metrics on simulated and real samples of reverberant speech. Results show that the proposal outperforms WPE, a state-of-the-art method known to effectively reduce reverberation and greatly enhance the signal. The proposed model, trained with artificially synthesized reverberation data, was able to generalize to real room impulse responses under a variety of conditions (e.g., different room sizes, RT60 values, near and far field). Furthermore, it also proves accurate on real reverberant speech from two different datasets.
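The combination described above, one-dimensional convolutions that slide only along time plus a linear shortcut that the non-linear path corrects, can be sketched in plain Python. Assumptions: features are stored as a list of frames, each frame a list of channel values; the block is conv, ReLU, conv with zero "same" padding and odd kernel lengths; the second convolution maps back to the input channel count so the two paths can be added. Normalization layers, the widening factor and the actual layer configuration of the paper are not shown, and all names and weights below are hypothetical.

```python
from typing import List

Frames = List[List[float]]        # [time][channel], one feature vector per frame
Kernel = List[List[List[float]]]  # [out_channel][in_channel][tap]


def conv1d_time(x: Frames, w: Kernel, bias: List[float]) -> Frames:
    """Convolve along the time axis only, with zero 'same' padding.

    As in most deep learning toolkits this is a cross-correlation; the kernel
    length is assumed to be odd so the output keeps the input length.
    """
    taps = len(w[0][0])
    half = taps // 2
    n_in = len(x[0])
    out: Frames = []
    for t in range(len(x)):
        frame = []
        for c_out, w_out in enumerate(w):
            acc = bias[c_out]
            for c_in in range(n_in):
                for k in range(taps):
                    src = t + k - half  # neighbouring frame in time
                    if 0 <= src < len(x):
                        acc += w_out[c_in][k] * x[src][c_in]
            frame.append(acc)
        out.append(frame)
    return out


def relu(x: Frames) -> Frames:
    return [[max(0.0, v) for v in frame] for frame in x]


def residual_block(
    x: Frames, w1: Kernel, b1: List[float], w2: Kernel, b2: List[float]
) -> Frames:
    """Output = input (linear shortcut) + non-linear correction."""
    correction = conv1d_time(relu(conv1d_time(x, w1, b1)), w2, b2)
    return [[xi + ci for xi, ci in zip(xf, cf)] for xf, cf in zip(x, correction)]


# Toy usage with one channel and three frames (hypothetical weights).
x = [[1.0], [2.0], [3.0]]
smooth = conv1d_time(x, [[[1.0, 1.0, 1.0]]], [0.0])
print(smooth)  # [[3.0], [6.0], [5.0]]
y = residual_block(x, [[[0.0, 1.0, 0.0]]], [0.0], [[[0.0, -0.5, 0.0]]], [0.0])
print(y)       # [[0.5], [1.0], [1.5]]
```

With all correction weights at zero the block returns its input unchanged, which is the property the abstract relies on: each block starts from the signal itself and only has to learn what to add or subtract.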

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Part of the Research Solutions Family.
