ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp39728.2021.9414487

Contrastive Predictive Coding Supported Factorized Variational Autoencoder For Unsupervised Learning Of Disentangled Speech Representations

Abstract: In this work we address the disentanglement of style and content in speech signals. We propose a fully convolutional variational autoencoder employing two encoders: a content encoder and a style encoder. To foster disentanglement, we propose adversarial contrastive predictive coding. This new disentanglement method requires neither parallel data nor any supervision. We show that the proposed technique is capable of separating speaker and content traits into the two different representations and show competitive s…
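To make the two-encoder layout concrete, below is a minimal sketch of a factorized VAE with a frame-wise content encoder and an utterance-level style encoder. This is not the authors' implementation: all layer sizes and module names are hypothetical, and the adversarial CPC regularizer the paper adds on top is omitted here.

```python
# Illustrative sketch only; dimensions and architecture are assumptions.
import torch
import torch.nn as nn

class FactorizedVAE(nn.Module):
    """Two-encoder VAE: a per-frame content encoder and an utterance-level
    style encoder feed a shared decoder that reconstructs the input features."""
    def __init__(self, feat_dim=80, content_dim=64, style_dim=64):
        super().__init__()
        self.content_enc = nn.Sequential(   # 1-D convs over time, per-frame output
            nn.Conv1d(feat_dim, 256, 5, padding=2), nn.ReLU(),
            nn.Conv1d(256, 2 * content_dim, 5, padding=2),  # mean and log-variance
        )
        self.style_enc = nn.Sequential(
            nn.Conv1d(feat_dim, 256, 5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),         # collapse time into one style vector
            nn.Conv1d(256, 2 * style_dim, 1),
        )
        self.decoder = nn.Sequential(
            nn.Conv1d(content_dim + style_dim, 256, 5, padding=2), nn.ReLU(),
            nn.Conv1d(256, feat_dim, 5, padding=2),
        )

    @staticmethod
    def reparameterize(stats):
        mu, logvar = stats.chunk(2, dim=1)
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp()

    def forward(self, x):                    # x: (batch, feat_dim, time)
        z_c = self.reparameterize(self.content_enc(x))   # (B, C, T) content codes
        z_s = self.reparameterize(self.style_enc(x))     # (B, S, 1) style code
        z = torch.cat([z_c, z_s.expand(-1, -1, z_c.size(-1))], dim=1)
        return self.decoder(z), z_c, z_s
```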

Cited by 13 publications (8 citation statements)
References 10 publications
“…So, if the reconstruction is good enough, the procedure can detach the network from local minima and may be repeated numerous times. The most significant novelty of the suggested method is the simple confirmation of the results of assigning classes to an unknown collection of values using quantifiable criteria [64, 76].…”
Section: Proposed S3IDS
Mentioning confidence: 99%
“…In wav2vec [3], the CPC [1] loss is used to pre-train speech representations for speech recognition, and experimental results show that self-supervised pre-training improves supervised speech recognition. The CPC loss can also be used to regularize adversarial training [2], and it has been extended and applied to bidirectional context networks [6].…”
Section: Related Work
Mentioning confidence: 99%
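Since several of these citing works build on the CPC objective, a generic InfoNCE-style formulation is sketched below. This is a textbook version with in-batch negatives, not the exact loss used in wav2vec or the cited papers; the function and argument names are hypothetical.

```python
# Generic InfoNCE-style CPC loss sketch; details (e.g. negative sampling)
# differ across the cited papers.
import torch
import torch.nn.functional as F

def cpc_loss(context, future, projection):
    """context:    (batch, dim) summary c_t from the context network
       future:     (batch, dim) encoding z_{t+k} of a future step
       projection: a step-specific linear map from context to encoding space.
    Each item's true future is the positive; other batch items are negatives."""
    pred = projection(context)            # (B, dim) prediction of z_{t+k}
    logits = pred @ future.t()            # (B, B) similarity matrix
    labels = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, labels)  # classify the true future per row
```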
“…One successful application of APC is as a pre-training method for learning transferable speech representations [11,13]. [14] proposes another approach, using contrastive predictive coding to support factorized disentangled representation learning for speech signals.…”
Section: Introduction
Mentioning confidence: 99%
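For contrast with the CPC loss above, autoregressive predictive coding (APC) as mentioned in this statement uses a regression target rather than a contrastive one. The sketch below follows the generic APC recipe of predicting a frame a few steps ahead; the network sizes and the time shift are assumptions, not values from the cited works.

```python
# Minimal APC pre-training sketch; hyperparameters are illustrative.
import torch
import torch.nn as nn

class APC(nn.Module):
    def __init__(self, feat_dim=80, hidden=512, shift=3):
        super().__init__()
        self.shift = shift
        self.rnn = nn.GRU(feat_dim, hidden, num_layers=3, batch_first=True)
        self.proj = nn.Linear(hidden, feat_dim)

    def loss(self, x):                       # x: (batch, time, feat_dim)
        h, _ = self.rnn(x[:, :-self.shift])  # encode only past frames
        pred = self.proj(h)                  # predict frames `shift` steps ahead
        target = x[:, self.shift:]
        return nn.functional.l1_loss(pred, target)  # regression, not contrastive
```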