L3DAS21 Challenge: Machine Learning for 3D Audio Signal Processing

Guizzo, Eric; Gramaccioni, Riccardo F.; Jamili, Saeid; Marinoni, C.; Massaro, Edoardo; Medaglia, Claudia; Nachira, Giuseppe; Nucciarelli, Leonardo; Paglialunga, Ludovica; Pennese, Marco; Pepe, Sveva; Rocchi, Emilio; Uncini, Aurelio; Comminiello, Danilo

doi:10.48550/arxiv.2104.05499

Cited by 2 publications

(4 citation statements)

References 21 publications

(30 reference statements)


“…This dataset has been used in a number of works to validate the effectiveness of a proposed method on "real-life" recordings, e.g., [63], [69], [72], [111], [129], [211]. Very recently, a SELD challenge focused on 3D sound has been announced [244], where a pair of FOA microphones was used to capture a large number of RIRs in an office room, from which the audio data has been generated.…”

Section: B Real Data

mentioning

confidence: 99%

A Survey of Sound Source Localization with Deep Learning Methods

Grumiaux, Kitić, Girin et al. 2021

Preprint


This article is a survey on deep learning methods for single and multiple sound source localization. We are particularly interested in sound source localization in indoor/domestic environment, where reverberation and diffuse noise are present. We provide an exhaustive topography of the neural-based localization literature in this context, organized according to several aspects: the neural network architecture, the type of input features, the output strategy (classification or regression), the types of data used for model training and evaluation, and the model training strategy. This way, an interested reader can easily comprehend the vast panorama of the deep learning-based sound source localization methods. Tables summarizing the literature survey are provided at the end of the paper for a quick search of methods with a given set of target characteristics.


“…More recently, approaches around 3D SE are gaining interest with the signal processing community (GUIZZO; GRAMACCIONI, et al, 2021;MARINONI, et al, 2022). This scenario is more complex because both noises and reverberation must be handled.…”

Section: Review On Speech Enhancement Methods

mentioning

confidence: 99%

“…The Filter and Sum Network (FaSNet) (LUO et al, 2019) is the proposed baseline (GUIZZO; GRAMACCIONI, et al, 2021) for the challenge. The FaSNet is a time-domain neural beamforming with high-performance for low-latency scenarios.…”

Section: IEEE MLSP 2021 Data Challenge

mentioning

confidence: 99%

“…vided by the Learning 3D Audio Sources (L3DAS) project from the Sapienza University of Rome. The data was made publicly available through a data competition (GUIZZO;GRAMACCIONI, et al, 2021) in the IEEE International Workshop on Machine Learning for Signal Processing (MLSP). The L3DAS21 dataset contained multiple-source and multiple-perspective B-format Ambisonics audio recordings, with 16 bit-AmbiX wav files having a sampling rate of 16 kHz and was designed based on clean speech sounds extracted from the Librispeech.…”

mentioning

confidence: 99%


Sobre auto-aprendizado de representações para realce da voz 3D.

Guimarães


Methods based on deep neural networks have gained considerable importance by proving to be viable and powerful alternatives for a variety of tasks, especially speech processing tasks such as speech recognition, keyword detection, and emotion recognition. However, these methods have some intrinsic problems, particularly regarding robustness in the presence of deleterious factors such as noise and reverberation. In this work we address the problem of speech enhancement, which aims to serve as a preprocessing system capable of enhancing the characteristics of the voice and suppressing noise. Algorithms based on statistical models treat this as a likelihood maximization problem. However, there is no guarantee that this improves perceptual characteristics such as intelligibility. We study the use of speech representations extracted from the wav2vec model as a perceptual loss function for the speech enhancement task. Our experiments demonstrate that using contrastive learning models in loss functions, to take perceptual characteristics into account, can improve speech enhancement performance in 3D environments. In addition, we discuss the use of time-domain and time-frequency-domain models. Our best results are obtained with time-frequency models, at the expense of computational cost.
