Neural Target Speech Extraction: An overview

Žmolíková, Kateřina; Delcroix, Marc; Ochiai, Tsubasa; Kinoshita, Keisuke; Černocký, Jan; Yu, Dong

doi:10.1109/msp.2023.3240008

Cited by 33 publications

(9 citation statements)

References 66 publications

“…Neural networks for target speech extraction. The goal here is to extract the speech signal of a target speaker, from a mixture of several speakers, given additional clues to identify the target speaker [70]. Prior work has explored three kinds of clues: audio clues from pre-recordings of the target speaker [7,18,23,36,67,71,72], visual clues using a video recording [52] and spatial clues by providing the direction and/or location of the target speaker.…”

Section: Background and Related Work (mentioning)

confidence: 99%

“…Target speech extraction is also related to the more general blind source separation problem [68] where the task is to separate all speakers in a mixture. This is challenging with an unknown number of speakers and with permutations between mapping the model output to the corresponding speakers [70].…”

Section: Background and Related Work (mentioning)

confidence: 99%

“…Our key observation is that for hearable applications of deep learning-based target speech extraction [21,70,72], it is often impractical to obtain a clean speech sample of the target speaker. In this work, we propose a target speech hearing (TSH) system suitable for binaural hearables applications that provides an interface for noisy in-the-wild speech samples, which we refer to as noisy enrollments.…”

Section: Target Speech Hearing With Noisy Examples (mentioning)

confidence: 99%

“…The latter, which we call target speech hearing, is a new capability for general-purpose hearable devices. Existing deep learning approaches for the problem of target speech extraction require prior clean audio examples of the target speaker [70]. These clean examples are utilized by a neural network to learn the characteristics of the target speaker, which are subsequently employed to separate their speech from that of other concurrent speakers.…”

Section: Introduction (mentioning)

confidence: 99%

Look Once to Hear: Target Speech Hearing with Noisy Examples

Veluri, Itani, Chen et al. 2024

Proceedings of the CHI Conference on Human Factors in Computing Systems

Figure 1: "Look once to hear" is an intelligent hearable system where users choose to hear a target speaker by looking at them for a few seconds. (A) Two users are walking near a noisy street, (B) the wearer looks at the target speaker for a few seconds to capture a noisy binaural audio example, which is used to learn the speech traits of the target speaker, and (C) the hearable extracts the target speaker and removes interference, even when the wearer is no longer looking at the target speaker.

“…Despite the progress, notable challenges persist in dynamic scenarios where the target speaker's location is not fixed. Additionally, the paper raises awareness of a gap in research, as investigations into these dynamic cases are relatively rare, emphasizing the need for further exploration in this area to enhance the applicability of TSE methodologies [5]. In a certain research work the proposed approach follows a significant trend in using speech recognition for efficient speech-to-text conversion, offering potential benefits in transcription and enhancing content understanding, particularly in fields like lecture note archiving. This model seamlessly integrates speech recognition technology, providing a comprehensive solution for transcribing spoken language. Nonetheless, a notable challenge lies in its limited focus, primarily summarizing sentences that conclude with a full stop or contain brief pauses marked by commas, overlooking other punctuation marks.…”

mentioning

confidence: 99%

Exploring Advances in Meeting Minutes Generation and Face Attendance Systems: A Comprehensive Literature Survey

Mane 2024

IJSREM

The existing literature survey offers a thorough exploration of automatic text summarization, speech-to-text conversion, and face recognition technologies, all of which are integral to the proposed model named as ConvoLogix. Historically, traditional methods for managing meetings and collaboration involved manual note-taking and attendance tracking. These processes were time-consuming, error-prone, and not conducive to optimization. In today’s fast-paced business environment, the absence of automated solutions hinders efficiency and collaboration. The survey underscores the significance of embracing advanced techniques in Machine Learning with Traditional and Deep Learning Models for Audio and video processing for process automation, emphasizing their pivotal role in streamlining meetings and attendance tracking. A key theme within the survey is the identification of limitations associated with conventional approaches to meeting minute generation and attendance recording. The ConvoLogix model’s core objective is to leverage these technologies to automate meeting minutes generation and attendance tracking, resulting in time savings, improved collaboration, and data-driven insights.

Key Words: Automated Meeting Summarization, Face Attendance Tracking, Natural Language Processing, Machine Learning, Deep Learning, Neural Network Models

Revolutionizing Speech Emotion Recognition: A Novel Hilbert Curve Approach for Two-Dimensional Representation and Convolutional Neural Network Classification

Tyagi, Szénási 2024

Mechanisms and Machine Science
