ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
DOI: 10.1109/icassp39728.2021.9414979

Multimodal Punctuation Prediction with Contextual Dropout

Abstract: Automatic speech recognition (ASR) is widely used in consumer electronics. ASR greatly improves the utility and accessibility of technology, but usually the output is only word sequences without punctuation. This can result in ambiguity in inferring user intent. We first present a transformer-based approach for punctuation prediction that achieves 8% improvement on the IWSLT 2012 TED Task, beating the previous state of the art [1]. We next describe our multimodal model that learns from both text and audio, whic…

Cited by 5 publications (3 citation statements)
References 18 publications
“…For the previously proposed punctuation model, a proper adaptation of the proposed coordinate bootstrapper and acoustic assistant can empower them to address modality missing samples. Section 5 discusses the pervasiveness of coordinate bootstrapper upon previous models [5,12] via experiment.…”
Section: Coordinate Bootstrapper (mentioning)
confidence: 99%
“…This subsection explores the pervasiveness of UniPunc framework for punctuation restoration on the mixed datasets by grafting onto two previous approaches, namely BiLSTM [5] and content dropout [12]. Specifically, for the BiLSTM model, we introduce an acoustic assistant to extract potential acoustic features and coordinate bootstrapper to learn hybrid representation.…”
Section: Pervasiveness of UniPunc (mentioning)
confidence: 99%