Deep neural networks (DNNs) have recently been widely used in speaker recognition systems, achieving state-of-the-art performance on various benchmarks. The x-vector architecture is especially popular in this research community due to its excellent performance and manageable computational complexity. In this paper, we present the lrx-vector system, a low-rank factorized version of the x-vector embedding network. The primary objective of this topology is to further reduce the memory requirements of the speaker recognition system. We discuss the use of knowledge distillation for training the lrx-vector system and compare it against low-rank factorization with SVD. On the VOiCES 2019 far-field corpus we were able to reduce the number of weights by 28% compared to the full-rank x-vector system while keeping the recognition rate constant (1.83% EER).
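The SVD-based low-rank factorization that the abstract compares against can be sketched as follows. This is a minimal illustration of the general technique, not the lrx-vector system itself; the layer size (512) and rank (64) are arbitrary example values:

```python
import numpy as np

def factorize_layer(W, rank):
    """Approximate a dense weight matrix W (m x n) by two low-rank
    factors U_r (m x r) and V_r (r x n) via truncated SVD, so the
    matrix product W @ x can be replaced by U_r @ (V_r @ x)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * s[:rank]  # fold singular values into the left factor
    V_r = Vt[:rank, :]
    return U_r, V_r

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512))   # example dense layer
U_r, V_r = factorize_layer(W, rank=64)

# Parameter count drops from 512*512 = 262144 to 2*512*64 = 65536,
# i.e. a 75% reduction for this (illustrative) rank choice.
print(W.size, U_r.size + V_r.size)
```

In practice the factorized layers are typically fine-tuned (or, as in the paper, trained with knowledge distillation) to recover accuracy lost by the truncation.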
In this paper, a novel technique is proposed that recognizes speech on a server while all private knowledge is processed on the client. Private knowledge could be address book entries, calendar entries, or medical patient data. The technique combines the advantage of a powerful server with almost unlimited memory and the advantage of using locally available, user-dependent knowledge. A dynamic language model is used to recognize speech on the server with the help of content-dependent acoustic fillers. The result is then recognized on a client, e.g., a smartphone, with the help of user-dependent knowledge. We achieved a word error rate reduction of 17% on the Wall Street Journal corpus.
This paper describes an approach for intent classification and tagging on embedded devices, such as smart watches. We describe a technique to train neural networks whose final weights are binary. This enables memory-bandwidth-optimized inference and efficient computation even on constrained/embedded platforms. The approach works as follows: a tf-idf word selection method reduces the number of overall weights; bag-of-words features are then used with a feedforward and a recurrent neural network for intent classification and tagging, respectively. A novel double-Gaussian-based regularization term is used to train the network. Finally, the weights are clipped almost losslessly to −1 or 1, which results in a tiny binary neural network for intent classification and tagging. Our technique is evaluated using a text corpus of transcribed and annotated voice queries. The test domain is "lights control". We compare the intent and tagging accuracy of the ultra-compact binary neural network with our baseline system. The novel approach yields comparable accuracy but reduces the model size by a factor of 16: from 160kB to 10kB.
In this study we address the question of to what extent syntactic word-order traits of different languages have evolved under correlation, and whether such dependencies can be found universally across all languages or are restricted to specific language families. To do so, we use logistic Brownian motion under a Bayesian framework to model the trait evolution of 768 languages from 34 language families. We test for trait correlations both in single families and universally over all families. Separate models reveal no universal correlation patterns, and Bayes factor analysis of models over all covered families also strongly indicates lineage-specific correlation patterns instead of universal dependencies.
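The core idea of logistic Brownian motion for a binary trait can be sketched as a latent random walk mapped through the logistic function, which keeps the trait probability in (0, 1). This is a generic illustration under assumed parameters, not the paper's exact Bayesian parameterization:

```python
import numpy as np

def simulate_logistic_bm(x0, sigma, n_steps, dt=1.0, rng=None):
    """Simulate a latent Brownian motion x_t with diffusion rate sigma,
    then map it through the logistic function to a trait probability
    p_t in (0, 1).  Illustrative sketch, not the paper's exact model."""
    rng = rng or np.random.default_rng(0)
    x = np.empty(n_steps + 1)
    x[0] = x0
    increments = rng.normal(0.0, sigma * np.sqrt(dt), size=n_steps)
    x[1:] = x0 + np.cumsum(increments)
    p = 1.0 / (1.0 + np.exp(-x))  # logistic link: latent value -> probability
    return x, p

# One trajectory: start neutral (p = 0.5) and drift over 100 steps.
x, p = simulate_logistic_bm(x0=0.0, sigma=0.1, n_steps=100)
print(p[0], p[-1])
```

Correlated evolution of two traits would extend this by giving the two latent walks correlated increments; comparing models with and without that correlation (e.g., via Bayes factors) is the kind of test the study performs.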