2021
DOI: 10.48550/arxiv.2110.15018
Preprint

TorchAudio: Building Blocks for Audio and Speech Processing

Cited by 7 publications (6 citation statements)
References 0 publications
“…LM Beam-Search Decoding In all our experimental results, we report WER and CER, both with greedy and LM-beam search decoding. We rely on the lexicon-based beam-search decoder (with a word-based LM) from the flashlight framework [21], ported in torchaudio [40]. The same beam-search decoder is used to generate PLs in cross-lingual PL 5 .…”
Section: Monolingual Language Models (mentioning)
confidence: 99%
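The statement above reports WER and CER. As a minimal sketch of how these two metrics are commonly computed (Levenshtein edit distance over words or characters, divided by the reference length), the following standalone Python may help. It is not the citing paper's evaluation code and does not use torchaudio or flashlight; the helper names `edit_distance`, `wer`, and `cer` are hypothetical and introduced only for illustration.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (substitution, insertion, deletion all cost 1)."""
    # prev[j] holds the distance between the first i-1 items of ref and the first j items of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from the first i items of ref to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must contain at least one character")
    return edit_distance(reference, hypothesis) / len(reference)
```

For example, `wer("the cat sat on the mat", "the cat sat on mat")` counts one deleted word against six reference words. Whitespace tokenization and counting spaces as characters in `cer` are assumptions of this sketch; published evaluations may normalize text differently.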
“…First, feature extraction is performed on the raw audio. For MFCC calculation, we use the implementation by torchaudio [16] with the default parameters and a sample rate of 16 kHz. The XLS-R feature extraction is based on the facebook/wav2vec2-xls-r-300m model available at the HuggingFace [17] model hub.…”
Section: Feature Extraction (mentioning)
confidence: 99%
“…Audio file pre-processing operations have been conducted with the Python libraries NumPy [64] for operations on arrays and LibRosa [65] for audio file loading, resampling, normalizing and writing. Feature extraction procedures have also been performed with LibRosa and Torchaudio [66] libraries.…”
Section: A3siren-recordings (mentioning)
confidence: 99%