Applied Acoustics

2021

DOI: 10.1016/j.apacoust.2020.107566

Clustering of spatial cues by semantic segmentation for anechoic binaural source separation

Muhammad Sheryar Fulaly, Muhammad Salman Khan, et al.

Cited by 7 publications

(27 citation statements)

References 19 publications

Supporting: 0, Mentioning: 27, Contrasting: 0

“…Removing the phase wrap problem by using the top down approach of [2] did not work well in case of U-Net, as it reduces the IPD variance of each source. This variance reduction works well with the expectation maximization (EM) algorithm [2] but not for the convolutional neural network U-Net [11]. The inclusion of IPD cues (whether the values observed from the mixture or those after the phase unwrap by the top down approach) in SONET [11], resulted in decline of its output performance.…”

Section: Related Work (mentioning)

confidence: 94%
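The phase wrap problem mentioned in this statement can be illustrated with a minimal sketch. It assumes the IPD of a single source is produced by a pure inter-channel delay and uses the sign convention IPD = 2πf·delay; the function name `wrapped_ipd`, the 0.5 ms delay, and the test frequencies are hypothetical choices for illustration, not values from the cited papers.

```python
import numpy as np

def wrapped_ipd(freqs_hz, delay_s):
    """Observed IPD of a pure delay, wrapped into (-pi, pi]."""
    true_phase = 2.0 * np.pi * freqs_hz * delay_s
    # The angle of a unit phasor is the principal value of the phase,
    # which is all that can be observed from a single TF unit.
    return np.angle(np.exp(1j * true_phase))

delay_s = 0.5e-3  # hypothetical interaural delay (assumption)
freqs_hz = np.array([500.0, 900.0, 1500.0, 2500.0])

ipd = wrapped_ipd(freqs_hz, delay_s)
true_phase = 2.0 * np.pi * freqs_hz * delay_s

# Number of full 2*pi turns lost at each frequency.
wrap_counts = np.round((true_phase - ipd) / (2.0 * np.pi)).astype(int)
```

With this delay the observed IPD is unambiguous only below 1/(2·delay) = 1000 Hz; above that, the same observed value is consistent with more than one delay.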

“…This variance reduction works well with the expectation maximization (EM) algorithm [2] but not for the convolutional neural network U-Net [11]. The inclusion of IPD cues (whether the values observed from the mixture or those after the phase unwrap by the top down approach) in SONET [11], resulted in decline of its output performance. The performance comparison of the two speech separation models, one using the EM algorithm and the other using the SONET-P network for clustering the IPD cues, is given in the 'experiment' section (Section V).…”

Section: Related Work (mentioning)

confidence: 94%

“…SONET [11] is a U-Net (deep learning convolutional neural network) based interaural speech separation model, designed for anechoic conditions. Like MESSL, this model is also based on WDO assumption and uses the spatial cues for source separation.…”

Section: Related Work (mentioning)

confidence: 99%
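The WDO (W-disjoint orthogonality) assumption named in this statement treats each time-frequency unit of the mixture as dominated by a single source, so separation reduces to binary masking. The sketch below is a generic illustration of that idea under this assumption, not the SONET or MESSL implementation; `separate_by_labels` and the random toy data are hypothetical.

```python
import numpy as np

def separate_by_labels(mixture_stft, labels, num_sources):
    """Apply one binary mask per source to a mixture STFT.

    labels[f, t] is the index of the source assumed to dominate
    that time-frequency unit (the WDO assumption).
    """
    estimates = []
    for source in range(num_sources):
        mask = labels == source
        # Keep the mixture value where this source dominates, zero elsewhere.
        estimates.append(np.where(mask, mixture_stft, 0.0))
    return estimates

# Toy data standing in for a real mixture STFT and per-unit class labels.
rng = np.random.default_rng(0)
mixture = rng.standard_normal((4, 6)) + 1j * rng.standard_normal((4, 6))
labels = rng.integers(0, 2, size=(4, 6))

estimates = separate_by_labels(mixture, labels, num_sources=2)
```

Because the masks are disjoint and cover every unit, the source estimates sum back to the mixture.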

“…In SONET, two separate U-Nets (a specialized neural network for semantic segmentation) are trained on the interaural level difference (ILD) and the interaural phase difference (IPD) spectrograms generated by a single source. These U-Nets are named as SONET-L (SONET trained on ILD cues) and SONET-P (SONET trained on IPD cues) in [11]. After training, these U-Nets are used to predict the class of each time frequency (TF) unit of the interaural spectrogram of an audio mixture.…”

Section: Related Work (mentioning)

confidence: 99%
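As a rough illustration of the ILD and IPD spectrograms this statement refers to, the sketch below derives both cues from a pair of left and right STFT arrays. It uses a common textbook definition (ILD as a level ratio in dB, IPD as the phase of the left channel relative to the right); the cited papers may define or scale the cues differently, and `interaural_cues` and `eps` are hypothetical names.

```python
import numpy as np

def interaural_cues(left_stft, right_stft, eps=1e-12):
    """Return (ILD in dB, IPD in radians) for each time-frequency unit."""
    # eps guards against division by zero and log of zero in silent units.
    ild_db = 20.0 * np.log10((np.abs(left_stft) + eps) / (np.abs(right_stft) + eps))
    # Phase of left relative to right, wrapped into (-pi, pi].
    ipd_rad = np.angle(left_stft * np.conj(right_stft))
    return ild_db, ipd_rad

# Toy check: left channel is twice as loud and leads by 0.3 rad everywhere.
right = np.ones((3, 5), dtype=complex)
left = 2.0 * np.exp(1j * 0.3) * right

ild_db, ipd_rad = interaural_cues(left, right)
```

Each output has the same shape as the input STFTs, so it can be treated as a spectrogram-like image whose units are then labeled one by one.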

“…Also, as the training phase of deep neural network is lengthy and requires high-end computational resources (graphical processing unit (GPU)), a 'generalized solution' is suggested in [11] as shown in Figure 4 (b), where the network trained for one source position can be used for other sources placed in its neighboring positions due to similarity of interaural cues generated by these sources.…”

Section: Related Work (mentioning)

confidence: 99%

Integration of deep learning with expectation maximization for spatial cue-based speech separation in reverberant conditions

2021

Applied Acoustics

Self Cite

Enabling an anechoic U-Net based speech separation model for online and offline applications in reverberant conditions

2021

Applied Acoustics

Enhancing the correlation between the quality and intelligibility objective metrics with the subjective scores by shallow feed forward neural network for time–frequency masking speech separation algorithms

2022

Applied Acoustics
