2021
DOI: 10.7717/peerj-cs.650
Spatial position constraint for unsupervised learning of speech representations

Abstract: The success of supervised learning techniques for automatic speech processing does not always extend to problems with limited annotated speech. Unsupervised representation learning aims at utilizing unlabelled data to learn a transformation that makes speech easily distinguishable for classification tasks, whereby deep auto-encoder variants have been most successful in finding such representations. This paper proposes a novel mechanism to incorporate geometric position of speech samples within the global struc…

Cited by 3 publications (2 citation statements)
References 29 publications
“…Another study also analyzed the sliding window's effectiveness for extracting spectral or cepstral features [7], whilst Paliwal et al. measured the impact of time window duration on speech recognition [8]. Research shows that unsupervised compression of cepstral speech features further enhances classification accuracy for certain tasks [9]. Acoustic features of speech vary significantly for various speaker accents.…”
Section: Literature Review (mentioning)
confidence: 99%
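The statement above describes a two-step pipeline: sliding-window cepstral (MFCC) feature extraction followed by unsupervised compression of those features. The sketch below is purely illustrative, assuming librosa for MFCC extraction and PCA as a stand-in unsupervised compressor; the file path, window sizes, and dimensionalities are assumptions, not values from the cited papers.

```python
# Illustrative sketch: sliding-window MFCC extraction followed by
# unsupervised compression. PCA stands in for an auto-encoder-style
# compressor; all parameters below are assumed, not from the papers.
import librosa
from sklearn.decomposition import PCA

# Placeholder path to a speech recording (assumption).
y, sr = librosa.load("speech.wav", sr=16000)

# Sliding-window cepstral features: 25 ms frames with a 10 ms hop.
frame_len = int(0.025 * sr)
hop_len = int(0.010 * sr)
mfcc = librosa.feature.mfcc(
    y=y, sr=sr, n_mfcc=13, n_fft=frame_len, hop_length=hop_len
)  # shape: (13, n_frames)

# Unsupervised compression of the per-frame cepstral vectors.
frames = mfcc.T                                          # (n_frames, 13)
compressed = PCA(n_components=6).fit_transform(frames)   # (n_frames, 6)

print(frames.shape, "->", compressed.shape)
```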
“…The pre-training helps the model learn the text structure utilising massive unlabelled text data scraped from the web. The unsupervised pre-training has been proven very effective for a wide range of classification tasks [6].…”
Section: Introduction (mentioning)
confidence: 99%