ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp40776.2020.9054664

Training Spoken Language Understanding Systems with Non-Parallel Speech and Text

Cited by 11 publications (7 citation statements) | References 15 publications

“…Cross-modal processing has been recently used in different combinations such as audio-video [15,14,16,17] and speech-text [18]. The common approach in these studies is to map inputs from different modalities into a shared space to achieve cross-modal retrieval.…”
Section: Related Work (mentioning)
confidence: 99%
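
The "shared space" approach summarized in this excerpt can be made concrete with a short sketch. The following is a minimal, hypothetical PyTorch example, not the method of this paper or of any specific cited work: two small projection networks map speech and text features into one embedding space, and a symmetric contrastive loss pulls matched (speech, text) pairs together so that nearest-neighbor search in that space performs cross-modal retrieval. All dimensions and architectures here are illustrative assumptions.

# Minimal sketch of a shared speech-text embedding space (assumed setup).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedSpaceEncoders(nn.Module):
    """Projects speech and text features into one shared embedding space."""
    def __init__(self, speech_dim=80, text_dim=300, shared_dim=256):
        super().__init__()
        self.speech_proj = nn.Sequential(
            nn.Linear(speech_dim, shared_dim), nn.ReLU(),
            nn.Linear(shared_dim, shared_dim),
        )
        self.text_proj = nn.Sequential(
            nn.Linear(text_dim, shared_dim), nn.ReLU(),
            nn.Linear(shared_dim, shared_dim),
        )

    def forward(self, speech_feats, text_feats):
        # L2-normalize so dot products are cosine similarities.
        s = F.normalize(self.speech_proj(speech_feats), dim=-1)
        t = F.normalize(self.text_proj(text_feats), dim=-1)
        return s, t

def contrastive_loss(s, t, temperature=0.07):
    """Symmetric InfoNCE: each matched (speech, text) pair must score
    higher than all mismatched pairs in the batch."""
    logits = s @ t.T / temperature        # (B, B) similarity matrix
    targets = torch.arange(s.size(0))     # diagonal entries are positives
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.T, targets)) / 2

# Toy usage with pooled utterance-level features (random stand-ins).
model = SharedSpaceEncoders()
speech = torch.randn(8, 80)    # e.g., mean-pooled filterbank features
text = torch.randn(8, 300)     # e.g., averaged word embeddings
s, t = model(speech, text)
contrastive_loss(s, t).backward()

Once trained, retrieval reduces to encoding a query from one modality and ranking items from the other modality by cosine similarity in the shared space.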
“…For most of its history, SLU was developed in a pipelined fashion, with ASR feeding text to a natural language understanding system; e.g., to the best of our knowledge, the only published use of SLU with knowledge graphs that fits this description is (Woods, 1975). Recent research in end-to-end multimodal SLU bypasses the need for ASR by leveraging a parallel modality such as image (Harwath et al, 2016; Kamper et al, 2019) or video (Sanabria et al, 2018), or a non-parallel corpus of text (Sarı et al, 2020), to guide learning speech embeddings such that the speech input can be used in a downstream task.…”
Section: Related Work: Multimodal SLU (mentioning)
confidence: 99%
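
The pipelined-versus-end-to-end distinction drawn in this excerpt can also be sketched in code. This is a hypothetical Python/PyTorch illustration; run_asr, nlu_classify, the encoder, and the intent head are placeholder names under assumed shapes, not components of any cited system.

import torch
import torch.nn as nn

# Pipelined SLU: speech -> ASR transcript -> text-based NLU.
def pipelined_slu(waveform, run_asr, nlu_classify):
    transcript = run_asr(waveform)    # hypothetical ASR component
    return nlu_classify(transcript)   # hypothetical text-NLU component

# End-to-end SLU: a speech encoder (e.g., one whose embeddings were guided
# by another modality or by non-parallel text) feeds the task head directly,
# with no intermediate transcript.
class EndToEndSLU(nn.Module):
    def __init__(self, speech_encoder, embed_dim=256, num_intents=10):
        super().__init__()
        self.encoder = speech_encoder
        self.intent_head = nn.Linear(embed_dim, num_intents)

    def forward(self, speech_feats):
        return self.intent_head(self.encoder(speech_feats))

# Toy usage: any encoder mapping (B, feat_dim) -> (B, embed_dim) fits.
encoder = nn.Sequential(nn.Linear(80, 256), nn.ReLU())
logits = EndToEndSLU(encoder)(torch.randn(4, 80))  # (4, num_intents)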