2020
DOI: 10.48550/arxiv.2010.13956
Preprint

Recent Developments on ESPnet Toolkit Boosted by Conformer

Citations: cited by 31 publications (36 citation statements)
References: 22 publications
“…As output labels, 256-word pieces based on Jamo (Korean alphabet) were used. The other model specifications and training strategy can be found in [28].…”
Section: A Experimental Setup (mentioning)
confidence: 99%
“…where β is the weight that balances the CTC and the CE loss. In the decoding stage, only the probabilities of the decoder and WPCTC loss are combined to obtain the final output [14,23,24]:…”
Section: Pm-mmut: Multi-modeling Unit Training Fusion With Pm Training (mentioning)
confidence: 99%
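The statement above elides the loss formula itself, so the following is only a minimal sketch of how a weight β commonly interpolates a CTC loss and a cross-entropy (CE) loss in joint CTC/attention training. The function name `joint_loss`, the convention that β multiplies the CTC term, and the constraint 0 ≤ β ≤ 1 are assumptions, not taken from the citing paper.

```python
def joint_loss(ctc_loss: float, ce_loss: float, beta: float) -> float:
    """Interpolate two scalar losses with a single trade-off weight.

    Assumption: beta weights the CTC term and (1 - beta) weights the CE term.
    The citing paper's exact formula is not shown in the statement.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta is assumed to lie in [0, 1]")
    return beta * ctc_loss + (1.0 - beta) * ce_loss


# Illustrative values only; these are not results from any paper.
example = joint_loss(ctc_loss=2.0, ce_loss=1.0, beta=0.3)
```

With β closer to 0 the CE term dominates; with β closer to 1 the CTC term dominates.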
“…For Uyghur speech recognition task, following our previous setups [14], the experiments use 40 Mel Frequency Cepstral Coefficients (MFCCs) over 25 ms frames with 10 ms stride to each of which cepstral mean and variance normalization (CMVN) is applied. In English tasks, following [24], we use 80-dimensional logmel spectral energies plus 3 extra features for pitch information as acoustic features input. Following [14,24], the trade off weight β was set to 0.3 over all the tasks.…”
Section: Experiments Setup (mentioning)
confidence: 99%
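As a rough illustration of the front-end described in the statement above, the sketch below counts how many 25 ms frames fit at a 10 ms stride and applies cepstral mean and variance normalization (CMVN) to a feature matrix. The helper names `num_frames` and `cmvn`, the absence of edge padding, and the choice of per-utterance (rather than per-speaker or global) statistics are assumptions; the statement does not specify them.

```python
import math
from typing import List


def num_frames(duration_ms: int, frame_ms: int = 25, stride_ms: int = 10) -> int:
    """Number of full frames in a signal, assuming no padding at the edges."""
    if duration_ms < frame_ms:
        return 0
    return 1 + (duration_ms - frame_ms) // stride_ms


def cmvn(features: List[List[float]], eps: float = 1e-8) -> List[List[float]]:
    """Normalize each feature dimension to zero mean and unit variance.

    Assumption: statistics are computed over the frames of one utterance.
    """
    n = len(features)
    dim = len(features[0])
    means = [sum(f[d] for f in features) / n for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in features) / n) + eps
        for d in range(dim)
    ]
    return [[(f[d] - means[d]) / stds[d] for d in range(dim)] for f in features]


# Toy 2-frame, 2-dimensional input; real MFCC input would have 40 dimensions.
normalized = cmvn([[1.0, 2.0], [3.0, 6.0]])
```

Under these assumptions a one-second utterance yields `num_frames(1000)` frames, each of which would carry a 40-dimensional MFCC vector before normalization.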