In recent years, the task of Query-by-Example Spoken Term Detection (QbE-STD), which aims to find occurrences of a spoken query in a set of audio documents, has gained the interest of the research community for its versatility in settings where untranscribed, multilingual and acoustically unconstrained spoken resources, or spoken resources in low-resource languages, must be searched. This paper describes and reports experimental results for a QbE-STD system that achieved the best performance in the recent Spoken Web Search (SWS) evaluation, held as part of MediaEval 2013. Though not optimized for speed, the system operates faster than real-time. The system exploits high-performance phone decoders to extract frame-level phone posteriors (a common representation in QbE-STD tasks). Then, given a query and an audio document, a distance matrix is computed between their phone posterior representations, followed by a newly introduced distance normalization technique and an iterative Dynamic Time Warping (DTW) matching procedure with some heuristic prunings. Results show that remarkable performance improvements can be achieved by using multiple examples per query and, especially, through the late (score-level) fusion of different subsystems, each based on a different set of phone posteriors.
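The distance-matrix and DTW matching steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes cosine distance between posterior vectors and the classic unconstrained DTW recursion, without the paper's distance normalization or heuristic prunings.

```python
import numpy as np

def distance_matrix(query, doc):
    """Pairwise cosine distance between frame-level phone posteriors.

    query: (Tq, P) array, doc: (Td, P) array, one posterior vector per frame.
    Cosine distance is an illustrative choice; the paper's metric may differ.
    """
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    d = doc / np.linalg.norm(doc, axis=1, keepdims=True)
    return 1.0 - q @ d.T

def dtw_cost(D):
    """Cumulative DTW cost over a (Tq, Td) distance matrix.

    Returns the length-normalized cost of the best warping path from
    (0, 0) to (Tq-1, Td-1); lower cost suggests a better query match.
    """
    Tq, Td = D.shape
    C = np.full((Tq + 1, Td + 1), np.inf)
    C[0, 0] = 0.0
    for i in range(1, Tq + 1):
        for j in range(1, Td + 1):
            C[i, j] = D[i - 1, j - 1] + min(
                C[i - 1, j],      # insertion
                C[i, j - 1],      # deletion
                C[i - 1, j - 1],  # match
            )
    return C[Tq, Td] / (Tq + Td)
```

In a full QbE-STD system this matching would be run iteratively over candidate regions of each document, with the per-region normalized cost serving as the detection score.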
This paper evaluates the performance of the twelve primary systems submitted to the evaluation on speaker verification in the context of a mobile environment using the MOBIO database. The mobile environment provides a challenging and realistic test-bed for current state-of-the-art speaker verification techniques. Results in terms of equal error rate (EER), half total error rate (HTER) and detection error trade-off (DET) confirm that the best performing systems are based on total variability modeling, and are the fusion of several sub-systems. Nevertheless, classical UBM-GMM based systems remain competitive. The results also show that the use of additional data for training as well as gender-dependent features can be helpful.
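The EER metric cited above can be computed from target (genuine) and non-target (impostor) trial scores by sweeping a decision threshold until the false acceptance and false rejection rates meet. A minimal sketch, assuming raw verification scores where higher means more likely genuine:

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by sweeping thresholds over the observed scores.

    target_scores: scores of genuine trials; nontarget_scores: impostor trials.
    Returns the average of FAR and FRR at the threshold where they are closest
    (HTER, by contrast, averages FAR and FRR at a threshold fixed on dev data).
    """
    target_scores = np.asarray(target_scores, dtype=float)
    nontarget_scores = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))
    best_gap, eer = np.inf, 1.0
    for t in thresholds:
        frr = np.mean(target_scores < t)      # genuine trials rejected
        far = np.mean(nontarget_scores >= t)  # impostor trials accepted
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2.0
    return eer
```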
Phonotactic language recognizers are based on the ability of phone decoders to produce phone sequences containing acoustic, phonetic and phonological information, which is partially dependent on the language. Input utterances are decoded and then scored by means of models for the target languages. Commonly, various decoders are applied in parallel and fused at the score level. A complementarity effect is expected when fusing scores, since each decoder is assumed to extract different (and complementary) information from the input utterance. This assumption is supported by the performance improvements attained when fusing systems. However, decodings are processed in a fully uncoupled way, so their time alignment (and the information that may be extracted from it) is completely lost. In this paper, a simple approach is proposed that takes time alignment information into account by considering cross-decoder phone co-occurrences at the frame level. To evaluate the approach, a selection of open-source software (BUT front-end and phone decoders, the SRI-LM toolkit, libSVM, FoCal) is used, and experiments are carried out on the NIST LRE2007 database. Adding phone co-occurrences to the baseline phonotactic systems provides slight performance improvements, revealing the potential benefit of using cross-decoder dependencies for language modeling.
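The frame-level co-occurrence counting described above can be illustrated with a short sketch. It is an assumption-laden simplification: it takes already frame-aligned phone label sequences from each decoder (one label per frame, same frame rate) and counts, for every decoder pair, how often each pair of labels occurs in the same frame.

```python
from collections import Counter

def frame_cooccurrences(decodings):
    """Count cross-decoder phone co-occurrences at the frame level.

    decodings: list of per-decoder phone label sequences, one label per
    frame, all assumed aligned to a common frame rate. Returns a Counter
    keyed by (decoder_i, phone_i, decoder_j, phone_j) for i < j; such
    counts could then feed a language model or an SVM feature vector.
    """
    counts = Counter()
    n = len(decodings)
    for frame_labels in zip(*decodings):
        for i in range(n):
            for j in range(i + 1, n):
                counts[(i, frame_labels[i], j, frame_labels[j])] += 1
    return counts
```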