Abstract: We present a computational auditory scene analysis system for separating and recognizing target speech in the presence of competing speech or noise. We estimate, in two stages, the ideal binary time-frequency (T-F) mask which retains the mixture in a local T-F unit if and only if the target is stronger than the interference within the unit. In the first stage, we use harmonicity to segregate the voiced portions of individual sources in each time frame based on multipitch tracking. Additionally, unvoiced portio…
“…Several computational auditory scene analysis (CASA) techniques were proposed in the literature modeling the above two-stage segregation process (Wang and Brown, 2006). The goal of CASA techniques was to segregate only the target signal, rather than all interfering sources, from the sound mixtures, and the means suggested for achieving this goal was the ideal T-F binary mask (Wang, 2005).…”
The application of the ideal binary mask to an auditory mixture has been shown to yield substantial improvements in intelligibility. This mask is commonly applied to the time-frequency (T-F) representation of a mixture signal and eliminates portions of a signal below a signal-to-noise-ratio (SNR) threshold while allowing others to pass through intact. The factors influencing intelligibility of ideal binary-masked speech are not well understood and are examined in the present study. Specifically, the effects of the local SNR threshold, input SNR level, masker type, and errors introduced in estimating the ideal mask are examined. Consistent with previous studies, intelligibility of binary-masked stimuli is quite high even at -10 dB SNR for all maskers tested. Performance was affected the most when masker-dominated T-F units were wrongly labeled as target-dominated T-F units. Performance plateaued near 100% correct for SNR thresholds ranging from -20 dB to 5 dB. We believe the existence of the plateau region suggests that it is the pattern of the ideal binary mask that matters the most, rather than the local SNR of each T-F unit. This pattern directs the listener's attention to where the target is and enables them to segregate speech effectively in multi-talker environments.
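The thresholding described in this abstract can be illustrated with a minimal sketch. It assumes the target and masker energies of each T-F unit are known separately before mixing, which is what makes the mask "ideal". The function names, the `lc_db` parameter, the strict `>` comparison, and the toy numbers are illustrative assumptions, not taken from the cited studies.

```python
import math


def ideal_binary_mask(target_energy, masker_energy, lc_db=0.0):
    """Return a 0/1 mask with 1 where the local SNR in dB exceeds lc_db.

    Both inputs are 2-D lists (rows and columns index T-F units) of
    non-negative energies. lc_db is a hypothetical name for the local
    SNR threshold.
    """
    mask = []
    for t_row, m_row in zip(target_energy, masker_energy):
        row = []
        for t, m in zip(t_row, m_row):
            if t == 0.0:
                keep = False          # no target energy: always discard
            elif m == 0.0:
                keep = True           # target with no masker: always retain
            else:
                keep = 10.0 * math.log10(t / m) > lc_db
            row.append(1 if keep else 0)
        mask.append(row)
    return mask


def apply_mask(mixture, mask):
    """Zero out the mixture in every T-F unit where the mask is 0."""
    return [[x * b for x, b in zip(x_row, b_row)]
            for x_row, b_row in zip(mixture, mask)]


# Toy 2x2 example with made-up energies.
target = [[4.0, 0.1], [1.0, 0.0]]
masker = [[1.0, 1.0], [2.0, 0.5]]
mixture = [[5.0, 1.1], [3.0, 0.5]]

mask_0db = ideal_binary_mask(target, masker, lc_db=0.0)
mask_m20db = ideal_binary_mask(target, masker, lc_db=-20.0)
masked_mixture = apply_mask(mixture, mask_0db)
```

Lowering the threshold from 0 dB to -20 dB retains more units (here, the units with local SNRs of about -10 dB and -3 dB), while the unit with no target energy stays discarded at any threshold.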
“…Computational auditory scene analysis (CASA) is one of the popular speech separation methods that exploits human perceptual processing in computational systems (Wang & Brown, 2006). Human beings have shown great success in speech separation using our inborn capability.…”
“…Masks are applied to spectrograms of mixed sounds. If the value of 1 is applied for a t-f unit in which the target energy is stronger than the total interference energy, and the value of 0 otherwise, the mask is called ideal binary mask (Wang, Brown, 2006; Brungart et al, 2009).…”
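Written as a formula, the definition quoted above might read as follows. The symbols are assumptions introduced here for illustration: $E_T(t,f)$ is the target energy and $E_I(t,f)$ the total interference energy in the T-F unit at time frame $t$ and frequency channel $f$.

```latex
\mathrm{IBM}(t,f) =
\begin{cases}
  1, & \text{if } E_T(t,f) > E_I(t,f), \\
  0, & \text{otherwise.}
\end{cases}
```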
“…They are collectively referred to as Computational Auditory Stream Analysis (CASA, for a review, see Wang and Brown, 2006). …”
Ultrasound is used for breast cancer detection as a technique complementary to mammography, the standard screening method. Current practice is based on reflectivity images obtained with conventional instruments by an operator who positions the ultrasonic transducer by hand over the patient's body. It is a non-ionizing, pain-free, and inexpensive technique that provides a higher contrast than mammography to discriminate between fluid-filled cysts and solid masses, especially for dense breast tissue. However, results are quite dependent on the operator's skills, images are difficult to reproduce, and state-of-the-art instruments have limited resolution and contrast to show micro-calcifications and to discriminate between lesions and the surrounding tissue. In spite of its advantages, these factors have precluded the use of ultrasound for screening. This work approaches the ultrasound-based early detection of breast cancer with a different concept. A ring array with many elements to cover 360° around a hanging breast allows obtaining repeatable and operator-independent coronal slice images. Such an arrangement is well suited for multi-modal imaging that includes reflectivity, compounded, tomography, and phase coherence images for increased specificity in breast cancer detection. Preliminary work carried out with a mechanical emulation of the ring array and a standard breast phantom shows a high resolution and contrast, with an artifact-free capability provided by phase coherence processing.