Modeling is a common but important technique for signal characterization. With the advent of computational power, many problems that were considered to be unsolvable in the past can now be tackled with ease. Successful applications in this area include time-delay estimation modeled as a finite impulse response (FIR) filter [1] for sonar and radar systems; speech coding using linear predictive coding [2-4]; wavelets for speech and image coding and recognition [5-8]; fractals for image compression and recognition systems [9-11]; and the delayed-X filter [12-14] for active noise control [15], to name but a few. An efficient model for signal processing is not easy to come by and is often obtained with the aid of an optimization scheme. The accuracy of the model is generally governed by a set of variables or parameters that is optimized in the
IEEE SIGNAL PROCESSING MAGAZINE
This paper proposes a channel pattern noise based approach to guard speaker recognition systems against playback attacks. For each recording under investigation, the channel pattern noise serves as a unique channel identification fingerprint. A denoising filter and statistical frames are applied to extract the channel pattern noise, from which 6 Legendre coefficients and 6 statistical features are extracted. An SVM is trained on these features as a channel noise model to judge whether the input speech is an authentic or a playback recording. The experimental results indicate that, with the designed playback detector, the equal error rate of the speaker recognition system is reduced by 30%.
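The abstract does not state which curve the 6 Legendre coefficients describe or how they are computed. As a rough sketch only, assuming the channel pattern noise has been summarized as a sequence of uniformly spaced values rescaled to the interval [-1, 1], the coefficients of a degree-5 Legendre expansion could be obtained by projection. The function names `legendre_basis` and `legendre_coefficients` are hypothetical and are not taken from the paper.

```python
def legendre_basis(x, degree):
    """Return [P_0(x), ..., P_degree(x)] using Bonnet's recurrence."""
    values = [1.0, x]
    for n in range(1, degree):
        # (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x)
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values[: degree + 1]


def legendre_coefficients(samples, degree=5):
    """Project uniformly spaced samples onto P_0..P_degree.

    Assumption: the samples cover [-1, 1] after rescaling the index axis.
    The integral c_k = (2k + 1) / 2 * integral of f(x) P_k(x) dx is
    approximated with the trapezoidal rule.
    """
    count = len(samples)
    if count < 2:
        raise ValueError("need at least two samples")
    step = 2.0 / (count - 1)
    coeffs = [0.0] * (degree + 1)
    for i, value in enumerate(samples):
        x = -1.0 + i * step
        # Trapezoidal rule: endpoints get half weight.
        weight = step / 2.0 if i in (0, count - 1) else step
        for k, p in enumerate(legendre_basis(x, degree)):
            coeffs[k] += (2 * k + 1) / 2.0 * weight * value * p
    return coeffs
```

With the default `degree=5` the function returns 6 coefficients, matching the feature count in the abstract. How these are combined with the 6 statistical features before SVM training is not specified there and is left open here.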
We present a direct speech-to-speech translation (S2ST) model that translates speech from one language to speech in another language without relying on intermediate text generation. We tackle the problem by first applying a self-supervised discrete speech encoder on the target speech and then training a sequence-to-sequence speech-to-unit translation (S2UT) model to predict the discrete representations of the target speech. When target text transcripts are available, we design a joint speech and text training framework that enables the model to generate dual modality output (speech and text) simultaneously in the same inference pass. Experiments on the Fisher Spanish-English dataset show that the proposed framework yields an improvement of 6.7 BLEU compared with a baseline direct S2ST model that predicts spectrogram features. When trained without any text transcripts, our model performance is comparable to models that predict spectrograms and are trained with text supervision, showing the potential of our system for translation between unwritten languages.
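To make "discrete representations of the target speech" concrete, here is a minimal sketch of one way frame-level features can be turned into unit IDs: each frame is assigned the index of its nearest centroid. The abstract does not specify the encoder or the quantization method, so nearest-centroid assignment, the function name `quantize_frames`, and the toy vectors are all assumptions made for illustration.

```python
def quantize_frames(frames, centroids):
    """Map each feature frame to the index of its nearest centroid.

    Assumption: frames and centroids are equal-length lists of floats, and
    squared Euclidean distance decides the assignment.
    """
    units = []
    for frame in frames:
        distances = [
            sum((a - b) ** 2 for a, b in zip(frame, centroid))
            for centroid in centroids
        ]
        units.append(distances.index(min(distances)))
    return units


# Toy example: three 2-dimensional frames and two centroids.
toy_frames = [[0.1, 0.0], [0.9, 1.1], [0.2, 0.1]]
toy_centroids = [[0.0, 0.0], [1.0, 1.0]]
toy_units = quantize_frames(toy_frames, toy_centroids)
```

Under this assumption, an S2UT model would be trained to predict a sequence such as `toy_units` from source speech instead of predicting spectrogram frames.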
Recent pretraining models in Chinese neglect two important aspects specific to the Chinese language: glyph and pinyin, which carry significant syntactic and semantic information for language understanding. In this work, we propose ChineseBERT, which incorporates both the glyph and pinyin information of Chinese characters into language model pretraining. The glyph embedding is obtained from different fonts of a Chinese character and captures character semantics from visual features, while the pinyin embedding characterizes the pronunciation of Chinese characters and handles the highly prevalent heteronym phenomenon in Chinese (the same character has different pronunciations with different meanings). Pretrained on a large-scale unlabeled Chinese corpus, the proposed ChineseBERT model yields a significant performance boost over baseline models with fewer training steps. The proposed model achieves new SOTA performance on a wide range of Chinese NLP tasks, including machine reading comprehension, natural language inference, text classification, and sentence pair matching, and competitive performance in named entity recognition and word segmentation.
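The abstract does not say how the character, glyph, and pinyin embeddings are combined. The sketch below assumes one simple option, concatenation followed by a linear projection, to show why a pinyin embedding separates heteronyms. The function name `fuse_embeddings`, the projection matrix, and all vectors are hypothetical toy values, not the model's parameters.

```python
def fuse_embeddings(char_vec, glyph_vec, pinyin_vec, projection):
    """Concatenate three embeddings and apply a linear projection.

    Assumption: `projection` holds one row of weights per output dimension,
    and each row is as long as the concatenated vector.
    """
    joined = list(char_vec) + list(glyph_vec) + list(pinyin_vec)
    for row in projection:
        if len(row) != len(joined):
            raise ValueError("projection row length must match joined vector")
    return [sum(w * v for w, v in zip(row, joined)) for row in projection]


# The character 乐 is a heteronym: "le4" (happy) and "yue4" (music).
# Character and glyph vectors are shared; only the pinyin vector differs.
toy_projection = [[1, 0, 1, 0, 1, 0], [0, 1, 0, 1, 0, 1]]
toy_char, toy_glyph = [0.5, -0.5], [0.1, 0.2]
fused_le = fuse_embeddings(toy_char, toy_glyph, [1.0, 0.0], toy_projection)
fused_yue = fuse_embeddings(toy_char, toy_glyph, [0.0, 1.0], toy_projection)
```

Because the two readings share the character and glyph vectors, the fused vectors differ only through the pinyin component, which is the signal the abstract credits with handling heteronyms.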