2021
DOI: 10.48550/arxiv.2112.10200
Preprint
Multi-turn RNN-T for streaming recognition of multi-party speech

Cited by 1 publication (3 citation statements). References 0 publications.
“…Firstly, we observed that the t-SOT TT-18 with only 40 msec algorithmic latency already outperformed the results of all prior streaming multi-talker ASR models. Note that even though t-SOT TT-18 has almost the same number of parameters with SURT [26,32] or MS-RNN-T [27,34], t-SOT is more space and computationally efficient in the inference because SURT and MS-RNN-T run decoding twice, once for each of the two output branches. Secondly, we observed a significant WER reduction by increasing algorithmic latency and the model size.…”
Section: Results
confidence: 99%
“…1 To increase the variability of the training data, we applied the speed perturbation [37] with the ratios of {0.9, 1.0, 1.1}, the volume perturbation with the ratio between 0.125 to 2.0, and the adaptive SpecAugment [38]. Following [21,34], we simulated the training data on the fly to generate infinite variations of the training samples.…”
Section: Experimental Settings
confidence: 99%