Interspeech 2021
DOI: 10.21437/interspeech.2021-259

Speech Enhancement with Weakly Labelled Data from AudioSet

Cited by 5 publications
(8 citation statements)
“…3) AudioCaps: Our downloaded test set of the AudioCaps dataset [33] includes 957 audio clips, each annotated with five captions. To generate audio mixtures, we initially select an audio clip from the test set to serve as the target source, followed by a random selection of another audio clip as the background source, ensuring that the sound event tag of the background source does not coincide with that of the target source. For the test mixtures, each test audio is mixed with five randomly chosen background sources with an SNR at 0 dB.…”
Section: Datasets and Evaluation Benchmark (mentioning)
confidence: 99%
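
The mixing procedure described in the statement above can be illustrated with a minimal sketch. It is not the citing authors' code: the function names `mean_power`, `mix_at_snr`, and `pick_backgrounds` are hypothetical, clips are assumed to be plain lists of float samples paired with a single tag, the longer signal is assumed to be truncated to the shorter one, and the five backgrounds are assumed to be drawn without replacement. None of these details are stated in the quoted text.

```python
import math
import random


def mean_power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(target, background, snr_db=0.0):
    """Scale `background` so the target-to-background power ratio is
    `snr_db` decibels, then add it to `target`.

    Returns (mixture, scaled_background). Truncating both signals to the
    shorter length is an assumption made for this sketch.
    """
    n = min(len(target), len(background))
    target, background = target[:n], background[:n]
    p_target = mean_power(target)
    p_background = mean_power(background)
    if p_background == 0.0:
        raise ValueError("background clip is silent; cannot set an SNR")
    # SNR_dB = 10 * log10(p_target / (gain^2 * p_background)), solved for gain
    gain = math.sqrt(p_target / (p_background * 10 ** (snr_db / 10)))
    scaled = [gain * b for b in background]
    mixture = [t + b for t, b in zip(target, scaled)]
    return mixture, scaled


def pick_backgrounds(clips, target_idx, k=5, rng=random):
    """Pick k background clip indices whose tag differs from the target's.

    `clips` is a list of (samples, tag) pairs (a hypothetical layout).
    Sampling without replacement is an assumption.
    """
    target_tag = clips[target_idx][1]
    candidates = [i for i, (_, tag) in enumerate(clips) if tag != target_tag]
    return rng.sample(candidates, k)
```

At 0 dB the scaled background carries the same mean power as the target, so neither source dominates the mixture. Passing a seeded `random.Random` instance as `rng` makes the background selection reproducible.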
“…The test set of the Voicebank-Demand dataset includes a total of 824 utterances, which is used to evaluate the zero-shot performance of our model on speech enhancement. To make a fair comparison with previous speech enhancement systems [6], [7], [70], [71], we resample all audio clips at 16 kHz. We use "Speech" as the input text query to perform speech enhancement.…”
Section: ESC-50 (mentioning)
confidence: 99%