2023
DOI: 10.1109/msp.2023.3240008

Neural Target Speech Extraction: An overview

Cited by 33 publications (9 citation statements)
References 66 publications
“…Neural networks for target speech extraction. The goal here is to extract the speech signal of a target speaker, from a mixture of several speakers, given additional clues to identify the target speaker [70]. Prior work has explored three kinds of clues: audio clues from pre-recordings of the target speaker [7,18,23,36,67,71,72], visual clues using a video recording [52] and spatial clues by providing the direction and/or location of the target speaker.…”
Section: Background and Related Work (mentioning)
confidence: 99%
“…Target speech extraction is also related to the more general blind source separation problem [68] where the task is to separate all speakers in a mixture. This is challenging with an unknown number of speakers and with permutations between mapping the model output to the corresponding speakers [70].…”
Section: Background and Related Work (mentioning)
confidence: 99%
“…Despite the progress, notable challenges persist in dynamic scenarios where the target speaker's location is not fixed. Additionally, the paper raises awareness of a gap in research, as investigations into these dynamic cases are relatively rare, emphasizing the need for further exploration in this area to enhance the applicability of TSE methodologies [5]. In a certain research work the proposed approach follows a significant trend in using speech recognition for efficient speech-to-text conversion, offering potential benefits in transcription and enhancing content understanding, particularly in fields like lecture note archiving. This model seamlessly integrates speech recognition technology, providing a comprehensive solution for transcribing spoken language. Nonetheless, a notable challenge lies in its limited focus, primarily summarizing sentences that conclude with a full stop or contain brief pauses marked by commas, overlooking other punctuation marks.…”
mentioning
confidence: 99%