Sampling Strategies in Siamese Networks for Unsupervised Speech Representation Learning

Riad, Rachid; Dancette, Corentin; Karadayi, Julien; Zeghidour, Neil; Schatz, Thomas; Dupoux, Emmanuel

doi:10.21437/interspeech.2018-2384

Cited by 22 publications

(23 citation statements)

References 18 publications

“…The BNFs are in any case competitive with the higher dimensional features, and have the advantage that they can be built using standard Kaldi scripts and do not require any training on the target language, so can easily be deployed to new languages. The competitive result of [43] also shows that in general a system trained on word pairs discovered from a UTD system can perform very well.…”

Section: Evaluation Using ZRSC Data and Measures (mentioning)

confidence: 85%

Multilingual and unsupervised subword modeling for zero-resource languages

Hermann

Kamper

Goldwater

2021

Computer Speech & Language

Unsupervised subword modeling aims to learn low-level representations of speech audio in "zero-resource" settings: that is, without using transcriptions or other resources from the target language (such as text corpora or pronunciation dictionaries). A good representation should capture phonetic content and abstract away from other types of variability, such as speaker differences and channel noise. Previous work in this area has primarily focused on learning from target language data only, and has been evaluated only intrinsically. Here we directly compare multiple methods, including some that use only target language speech data and some that use transcribed speech from other (non-target) languages, and we evaluate using two intrinsic measures as well as on a downstream unsupervised word segmentation and clustering task. We find that combining two existing target-language-only methods yields better features than either method alone. Nevertheless, even better results are obtained by extracting target language bottleneck features using a model trained on other languages. Cross-lingual training using just one other language is enough to provide this benefit, but multilingual training helps even more. In addition to these results, which hold across both intrinsic measures and the extrinsic task, we discuss the qualitative differences between the different types of learned features.

“…One type of neural approach that has received particular attention is Siamese networks [18]- [22]. A Siamese network consists of two identical sub-networks with tied weights taking in a pair of inputs [23].…”

Section: Introduction (mentioning)

confidence: 99%
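The tied-weight idea in the statement above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the cited papers: the names `encode`, `cosine_distance`, and `siamese_distance`, the single linear layer with tanh, the cosine distance, and the toy weights `W`.

```python
import math

def encode(x, weights):
    """Shared encoder: one linear layer followed by tanh (illustrative choice)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]

def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def siamese_distance(x1, x2, weights):
    """Both branches use the same weights object; that sharing is the weight tying."""
    return cosine_distance(encode(x1, weights), encode(x2, weights))

# Hypothetical toy weights (2 outputs, 3 inputs) and toy inputs.
W = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
same = siamese_distance([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], W)
diff = siamese_distance([1.0, 2.0, 3.0], [-1.0, 0.5, 2.0], W)
```

Because the two branches share one set of parameters, identical inputs map to identical embeddings, so `same` is (numerically) zero while `diff` is larger.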

Unsupervised Feature Learning for Speech Using Correspondence and Siamese Networks

Engelbrecht

Kamper

2020

IEEE Signal Process. Lett.

In zero-resource settings where transcribed speech audio is unavailable, unsupervised feature learning is essential for downstream speech processing tasks. Here we compare two recent methods for frame-level acoustic feature learning. For both methods, unsupervised term discovery is used to find pairs of word examples of the same unknown type. Dynamic programming is then used to align the feature frames between each word pair, serving as weak top-down supervision for the two models. For the correspondence autoencoder (CAE), matching frames are presented as input-output pairs. The Triamese network uses a contrastive loss to reduce the distance between frames of the same predicted word type while increasing the distance between negative examples. For the first time, these feature extractors are compared on the same discrimination tasks using the same weak supervision pairs. We find that, on the two datasets considered here, the CAE outperforms the Triamese network. However, we show that a new hybrid correspondence-Triamese approach (CTriamese), consistently outperforms both the CAE and Triamese models in terms of average precision and ABX error rates on both English and Xitsonga evaluation data.

“…The probability of selecting a node is a function of its degree, utilizing the function proposed by Riad et al. [29]. The sampling compression function is chosen to be the square-root function, which retains the power-law degree distribution while keeping the frequency ranking of each node. When a batch of nodes S is sampled without replacement from this distribution, each node i has a set of positive edges, P_i.…”

Section: Model Optimization With Batch Sampling Strategy (mentioning)

confidence: 99%
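A minimal sketch of the square-root compressed sampling described in the statement above, assuming node degrees are given as a plain dictionary. The function names `sqrt_compressed_probs` and `sample_batch`, the toy degrees, and the draw-one-then-renormalise scheme for sampling without replacement are illustrative assumptions, not the citing authors' implementation.

```python
import math
import random

def sqrt_compressed_probs(degrees):
    """Sampling probabilities proportional to sqrt(degree)."""
    weights = {node: math.sqrt(d) for node, d in degrees.items()}
    total = sum(weights.values())
    return {node: w / total for node, w in weights.items()}

def sample_batch(degrees, batch_size, rng):
    """Draw distinct nodes one at a time, renormalising over the remaining nodes."""
    remaining = {node: math.sqrt(d) for node, d in degrees.items() if d > 0}
    batch = []
    for _ in range(min(batch_size, len(remaining))):
        nodes = list(remaining)
        picked = rng.choices(nodes, weights=[remaining[n] for n in nodes], k=1)[0]
        batch.append(picked)
        del remaining[picked]  # without replacement
    return batch

# Hypothetical toy degrees: raw ratio 100:25:4:1 is compressed to 10:5:2:1.
degrees = {"a": 100, "b": 25, "c": 4, "d": 1}
probs = sqrt_compressed_probs(degrees)
batch = sample_batch(degrees, 3, random.Random(0))
```

The square root flattens the gap between high- and low-degree nodes but is monotonic, so the ranking of nodes by probability matches their ranking by degree.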

“…For a test to differentiate between positive interactions and random noise interactions, we also uniformly sample a number of interactions from the set of all possible pairwise interactions to consider as negative interaction using the random node sampling distribution specified in Riad et al. [29]. This set is denoted as E_n, and the number of negative interactions is sampled such that the ratio of negative to positive interactions is 1.0. At evaluation time, the set of ground truth validation edges E_d and random noise E_n edges is used to calculate the precision and recall rates.…”

Section: Novel Link Predictions (mentioning)

confidence: 99%
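The evaluation setup in the statement above can be sketched as follows. This is a simplified illustration under stated assumptions: edges are treated as undirected, negatives are drawn uniformly over all non-positive node pairs (rather than through a node sampling distribution), and the names `sample_negative_edges` and `precision_recall`, the toy node labels, and the toy prediction list are hypothetical.

```python
import itertools
import random

def sample_negative_edges(nodes, positive_edges, ratio, rng):
    """Uniformly sample node pairs that are not known positive edges."""
    positives = {frozenset(e) for e in positive_edges}
    candidates = [pair for pair in itertools.combinations(sorted(nodes), 2)
                  if frozenset(pair) not in positives]
    n_neg = min(len(candidates), round(ratio * len(positives)))
    return rng.sample(candidates, n_neg)

def precision_recall(predicted, true_edges):
    """Precision and recall of predicted edges against ground-truth edges."""
    predicted = {frozenset(e) for e in predicted}
    truth = {frozenset(e) for e in true_edges}
    tp = len(predicted & truth)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall

nodes = ["r1", "r2", "r3", "r4", "r5"]
E_d = [("r1", "r2"), ("r2", "r3"), ("r4", "r5")]  # toy ground-truth edges
E_n = sample_negative_edges(nodes, E_d, ratio=1.0, rng=random.Random(0))
# Toy prediction: two true edges plus one sampled noise edge.
predicted = [("r1", "r2"), ("r2", "r3"), E_n[0]]
precision, recall = precision_recall(predicted, E_d)
```

With a ratio of 1.0 the noise set has as many edges as the ground-truth set, so a predictor that cannot tell the two apart would sit near 50% precision on the combined pool.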

Network Representation of Large-Scale Heterogeneous RNA Sequences with Integration of Diverse Multi-omics, Interactions, and Annotations Data

Tran

Gao

2019

Biocomputing 2020

Long non-coding RNA (lncRNA), microRNA (miRNA), and messenger RNA (mRNA) enable key regulations of various biological processes through a variety of diverse interaction mechanisms. Identifying the interactions and cross-talk between these heterogeneous RNA classes is essential in order to uncover the functional role of individual RNA transcripts, especially for unannotated and sparsely discovered RNA sequences with no known interactions. Recently, sequence-based deep learning and network embedding methods are gaining traction as high-performing and flexible approaches that can either predict RNA-RNA interactions from a sequence or infer likely/missing interactions from patterns that may exist in the network topology. However, the majority of these current methods have several limitations, e.g., the inability to perform inductive predictions, to distinguish the directionality of interactions, or to integrate various sequence, interaction, expression, and genomic annotation datasets. We proposed a novel deep learning-based framework, rna2rna, which learns from RNA sequences to produce a low-dimensional embedding that preserves the proximities in both the interactions topology and the functional affinity topology. In this proposed embedding space, we have designated a two-part "source and target contexts" to capture the receptive fields of each RNA transcript, while encapsulating the heterogenous crosstalk interactions between lncRNAs and miRNAs. The proximity between RNAs in this embedding space also uncovers the second-order relationships that allow to accurately infer a novel directed interaction or functional similarity between any two RNA sequences. From experimental results, our method exhibits superior performance in measured AUPR rates compared to state-of-art approaches at predicting missing interactions in different RNA-RNA interaction databases. 
Additional results suggest that our proposed framework can capture a manifold for heterogeneous RNA sequences to discover novel functional annotations.

functional interaction mechanisms, lncRNAs are known to act as miRNA decoys, derepress gene expression by competing with miRNAs for shared mRNA targets, or directly regulate gene expression [1]. Determining the biological functions of the individual lncRNAs remains a challenge as most of these RNA transcripts are currently unannotated, and their known interactions are sparse. Recent advances in RNA sequencing (RNA-Seq), deep sequencing (CLIP-seq, LIGR-Seq), and computational methods allow for an unprecedented analysis of such transcripts and have enabled researchers to generate large-scale interaction and functional annotation databases. However, the interaction networks generated from such data are often scant and incomplete in the number of lncRNAs covered. Furthermore, although a large number of lncRNAs have been identified, only a few hundreds have had functional and molecular mechanisms determined to date, as observed in lncRNAdb [2]. In other avenues, a growing number of lncRNAs are being assigned biological f...
