2019
DOI: 10.48550/arxiv.1904.05862
Preprint

wav2vec: Unsupervised Pre-training for Speech Recognition

Cited by 151 publications (230 citation statements)
References 0 publications
“…However, the performance of the acoustic model can be further improved by deploying more robust input features than MFCC. In the final section, we evaluate the proposed method trained on noise-invariant Wav2Vec features [34]. The Wav2Vec representation has been trained on large amounts of unlabeled audio data in an unsupervised manner.…”
Section: Experiments and Results
Citation type: mentioning, confidence: 99%
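
The statement above describes replacing MFCC inputs with representations extracted from a pre-trained wav2vec model. A minimal sketch of that feature-extraction step follows; it uses torchaudio's pretrained wav2vec 2.0 bundle as a stand-in (the original wav2vec checkpoints were distributed with fairseq), and the input file name is a placeholder assumption, not the cited paper's setup.

```python
# Minimal sketch: extracting pretrained speech representations to replace
# MFCC features. The bundle choice (wav2vec 2.0 via torchaudio) and the
# audio path are illustrative assumptions.
import torch
import torchaudio

bundle = torchaudio.pipelines.WAV2VEC2_BASE  # pretrained, no fine-tuning
model = bundle.get_model().eval()

waveform, sr = torchaudio.load("utterance.wav")  # hypothetical input file
waveform = waveform.mean(dim=0, keepdim=True)    # mix down to mono
if sr != bundle.sample_rate:
    waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)

with torch.inference_mode():
    features, _ = model.extract_features(waveform)

# `features` is a list of per-layer tensors of shape (batch, frames, dim);
# a chosen layer (often the last) can feed a downstream acoustic model
# in place of MFCCs.
print(len(features), features[-1].shape)
```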
“…recognition (Chen et al., 2020), and automatic speech recognition (Schneider et al., 2019; Baevski et al., 2019). The Wav2vec 2.0 model (Baevski et al., 2020) is an end-to-end self-supervised learning framework for automatic speech recognition (ASR), and it has recently been presented as an effective pre-training method for learning speech representations.…”
Section: Introduction
Citation type: mentioning, confidence: 99%
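
The statement above presents wav2vec 2.0 as a self-supervised pre-training method for end-to-end ASR. Below is a minimal sketch of inference with a fine-tuned wav2vec 2.0 model and greedy CTC decoding; the torchaudio bundle and the audio path are illustrative assumptions, not the cited papers' exact configuration.

```python
# Minimal sketch: ASR with a wav2vec 2.0 model fine-tuned for CTC,
# followed by greedy CTC decoding. Bundle and input path are assumptions.
import torch
import torchaudio

bundle = torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H  # fine-tuned for ASR
model = bundle.get_model().eval()
labels = bundle.get_labels()  # first label "-" is the CTC blank, "|" is space

waveform, sr = torchaudio.load("utterance.wav")  # hypothetical input file
waveform = waveform.mean(dim=0, keepdim=True)    # mix down to mono
if sr != bundle.sample_rate:
    waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)

with torch.inference_mode():
    emissions, _ = model(waveform)  # (batch, frames, vocab) log-probs

# Greedy CTC decode: argmax per frame, collapse repeats, drop blanks.
ids = emissions[0].argmax(dim=-1).tolist()
tokens, prev = [], None
for i in ids:
    if i != prev and labels[i] != "-":
        tokens.append(labels[i])
    prev = i
print("".join(tokens).replace("|", " ").strip())
```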
“…Considering the complex ATC environment, handcrafted feature engineering may not be an optimal option for ASR tasks. Therefore, learning mechanisms such as SincNet and wav2vec [7], [8] were proposed to learn informative and discriminative features from raw waveforms, achieving the desired performance improvements for common ASR applications.…”
Section: Introduction
Citation type: mentioning, confidence: 99%