The L3DAS21 Challenge aims to encourage and foster collaborative research on machine learning for 3D audio signal processing, with a particular focus on 3D speech enhancement (SE) and 3D sound localization and detection (SELD). Alongside the challenge, we release the L3DAS21 dataset, a 65-hour 3D audio corpus, accompanied by a Python API that facilitates data usage and the results submission stage. Machine learning approaches to 3D audio tasks are usually based on single-perspective Ambisonics recordings or on arrays of single-capsule microphones. We propose, instead, a novel multichannel audio configuration based on multiple-source and multiple-perspective Ambisonics recordings, performed with an array of two first-order Ambisonics microphones. To the best of our knowledge, this is the first time that a dual-mic Ambisonics configuration has been used for these tasks. We provide baseline models and results for both tasks, obtained with state-of-the-art architectures: FaSNet for SE and SELDnet for SELD. This report provides all the information needed to participate in the L3DAS21 Challenge, describing the L3DAS21 dataset, the challenge tasks, and the baseline models.
Human–robot interaction requires the system to determine whether the user is paying attention. However, training such systems requires massive amounts of data. In this study, we addressed data scarcity by constructing a large dataset (~120,000 photographs) for the attention detection task. Using this dataset, we then established a powerful baseline system. In addition, we extended the proposed system by adding an auxiliary face detection module and introducing a GAN-based data augmentation technique. Experimental results showed that the proposed system outperforms the baseline models and achieves an accuracy of 88% on the test set. Finally, we created a web application for testing the proposed model in real time.