Automatic speech recognition is complementary to language recognition. The language recognition systems exploit this complementarity by using frame-level bottleneck features extracted from neural networks trained with a phone recognition task. Recent methods apply frame-level bottleneck features extracted from an end-to-end sequence-to-sequence speech recognition model. In this work, we study an integrated approach of the training of the speech recognition feature extractor and language recognition modules. We show that for both classical phone recognition and end-to-end sequence-to-sequence features, sequential training of the two modules is not the optimal strategy. The feature extractor can be improved by supervision with the language identification loss, either in a fine-tuning step or in a multi-task training framework. Besides, we notice that end-to-end sequence-to-sequence bottleneck features are on par with classical phone recognition bottleneck features without requiring a forced alignment of the signal with target tokens. However, for sequence-to-sequence, the architecture of the model seems to play an important role; the Conformer architectures leads to much better results than the conventional stacked DNNs approach; and can even be trained directly with the LID module in an end-to-end approach.
State-of-the-art spoken language identification systems are constituted of three modules: a frame-level feature extractor, a segment-level embedding extractor and a final classifier. The performance of these systems degrades when facing mismatch between training and testing data. Most domain adaptation methods focus on adaptation of the final classifier. In this article, we propose a model-based unsupervised domain adaptation of the segment-level embedding extractor. The approach consists in a modification of the loss function used for training the embedding extractor. We introduce a regularization term based on the maximum mean discrepancy loss. Experiments were performed on the RATS corpus with transmission channel mismatch between telephone and radio channels. We obtained the same language identification performance as supervised training on the target domains but without using labeled data from these domains.
State-of-the-art language recognition systems are based on discriminative embeddings called x-vectors. Channel and gender distortions produce mismatch in such x-vector space where embeddings corresponding to the same language are not grouped in an unique cluster. To control this mismatch, we propose to train the x-vector DNN with metric learning objective functions. Combining a classification loss with the metric learning n-pair loss allows to improve the language recognition performance. Such a system achieves a robustness comparable to a system trained with a domain adaptation loss function but without using the domain information. We also analyze the mismatch due to channel and gender, in comparison to language proximity, in the x-vector space. This is achieved using the Maximum Mean Discrepancy divergence measure between groups of x-vectors. Our analysis shows that using the metric learning loss function reduces gender and channel mismatch in the x-vector space, even for languages only observed on one channel in the train set.
International audienceThe Solar irradiance and Earth Radiation Budget (SERB) mission is an innovative proof-of-concept nano-satellite, with three ambitious scientific objectives. The nano-satellite aims at measuring on the same platform the absolute value of the total solar irradiance (TSI) and its variability, the ultraviolet (UV) solar spectral variability, and the different components of the Earth radiation budget. SERB is a joint project between CNES (Centre National d'Etudes Spatiales), Ecole polytechnique, and LATMOS (Laboratoire Atmospheres, Milieux, Observations Spatiales) scheduled for a launch in 2020–2021. It is a three-unit CubeSat (X-CubeSat II), developed by students from ´Ecole polytechnique. Critical components of instrumental payloads of future large missions (coatings, UV filters, etc.) can acquire the technical maturity by flying in a CubeSat. Nano-satellites also represent an excellent alternative for instrumentation testing, allowing for longer flights than rockets. More-over, specific scientific experiments can be performed by nano-satellites. This paper is intended to present the SERB mission and its scientific objectives. © (2016) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.