Abstract-Transfer learning is a vital technique that generalizes models trained for one setting or task to other settings or tasks. For example in speech recognition, an acoustic model trained for one language can be used to recognize speech in another language, with little or no re-training data. Transfer learning is closely related to multi-task learning (cross-lingual vs. multilingual), and is traditionally studied in the name of 'model adaptation'. Recent advance in deep learning shows that transfer learning becomes much easier and more effective with high-level abstract features learned by deep models, and the 'transfer' can be conducted not only between data distributions and data types, but also between model structures (e.g., shallow nets and deep nets) or even model types (e.g., Bayesian models and neural models). This review paper summarizes some recent prominent research towards this direction, particularly for speech and language processing. We also report some results from our group and highlight the potential of this very interesting research field 1 .
Deep neural networks (DNNs) have gained remarkable success in speech recognition, partially attributed to the flexibility of DNN models in learning complex patterns of speech signals. This flexibility, however, may lead to serious over-fitting and hence miserable performance degradation in adverse acoustic conditions such as those with high ambient noises. We propose a noisy training approach to tackle this problem: by injecting moderate noises into the training data intentionally and randomly, more generalizable DNN models can be learned. This 'noise injection' technique, although known to the neural computation community already, has not been studied with DNNs which involve a highly complex objective function. The experiments presented in this paper confirm that the noisy training approach works well for the DNN model and can provide substantial performance improvement for DNN-based speech recognition.
Genetically programmed circuits allowing bifunctional dynamic regulation of enzyme expression have far-reaching significances for various bio-manufactural purposes. However, building a bio-switch with a post log-phase response and reversibility during scale-up bioprocesses is still a challenge in metabolic engineering due to the lack of robustness. Here, we report a robust thermosensitive bio-switch that enables stringent bidirectional control of gene expression over time and levels in living cells. Based on the bio-switch, we obtain tree ring-like colonies with spatially distributed patterns and transformer cells shifting among spherical-, rod- and fiber-shapes of the engineered Escherichia coli. Moreover, fed-batch fermentations of recombinant E. coli are conducted to obtain ordered assembly of tailor-made biopolymers polyhydroxyalkanoates including diblock- and random-copolymer, composed of 3-hydroxybutyrate and 4-hydroxybutyrate with controllable monomer molar fraction. This study demonstrates the possibility of well-organized, chemosynthesis-like block polymerization on a molecular scale by reprogrammed microbes, exemplifying the versatility of thermo-response control for various practical uses.
For practical automatic speaker verification (ASV) systems, replay attack poses a true risk. By replaying a pre-recorded speech signal of the genuine speaker, ASV systems tend to be easily fooled. An effective replay detection method is therefore highly desirable. In this study, we investigate a major difficulty in replay detection: the over-fitting problem caused by variability factors in speech signal. An F-ratio probing tool is proposed and three variability factors are investigated using this tool: speaker identity, speech content and playback & recording device. The analysis shows that device is the most influential factor that contributes the highest over-fitting risk. A frequency warping approach is studied to alleviate the over-fitting problem, as verified on the ASV-spoof 2017 database.
The essence of distantly supervised relation extraction is that it is an incomplete multi-label classification problem with sparse and noisy features. To tackle the sparsity and noise challenges, we propose solving the classification problem using matrix completion on factorized matrix of minimized rank. We formulate relation classification as completing the unknown labels of testing items (entity pairs) in a sparse matrix that concatenates training and testing textual features with training labels. Our algorithmic framework is based on the assumption that the rank of item-byfeature and item-by-label joint matrix is low. We apply two optimization models to recover the underlying low-rank matrix leveraging the sparsity of feature-label matrix. The matrix completion problem is then solved by the fixed point continuation (FPC) algorithm, which can find the global optimum. Experiments on two widely used datasets with different dimensions of textual features demonstrate that our low-rank matrix completion approach significantly outperforms the baseline and the state-of-the-art methods.
In authentication scenarios, applications of practical speaker verification systems usually require a person to read a dynamic authentication text. Previous studies played an audio adversarial example as a digital signal to perform physical attacks, which would be easily rejected by audio replay detection modules. This work shows that by playing our crafted adversarial perturbation as a separate source when the adversary is speaking, the practical speaker verification system will misjudge the adversary as a target speaker. A two-step algorithm is proposed to optimize the universal adversarial perturbation to be text-independent and has little effect on the authentication text recognition. We also estimated room impulse response (RIR) in the algorithm which allowed the perturbation to be effective after being played over the air. In the physical experiment, we achieved targeted attacks with success rate of 100%, while the word error rate (WER) on speech recognition was only increased by 3.55%. And recorded audios could pass replay detection for the live person speaking.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.