Spatial Audio Scene Characterization (SASC)

Zieliński, Sławomir

doi:10.1007/978-3-319-98678-4_46

Cryptology and Network Security

2018

DOI: 10.1007/978-3-319-98678-4_46

|View full text |Cite

Spatial Audio Scene Characterization (SASC)

Sławomir Zieliński

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...

Citation Types

Supporting

Mentioning

Contrasting

Year Published

2019

2022

Publication Types

Select...

Article3

Relationship

Self Cite0

Independent3

Authors

Journals

Cited by 3 publications

References 15 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

A Comparison of Human against Machine-Classification of Spatial Audio Scenes in Binaural Recordings of Music

2020

View full text Add to dashboard Cite

The purpose of this paper is to compare the performance of human listeners against the selected machine learning algorithms in the task of the classification of spatial audio scenes in binaural recordings of music under practical conditions. The three scenes were subject to classification: (1) music ensemble (a group of musical sources) located in the front, (2) music ensemble located at the back, and (3) music ensemble distributed around a listener. In the listening test, undertaken remotely over the Internet, human listeners reached the classification accuracy of 42.5%. For the listeners who passed the post-screening test, the accuracy was greater, approaching 60%. The above classification task was also undertaken automatically using four machine learning algorithms: convolutional neural network, support vector machines, extreme gradient boosting framework, and logistic regression. The machine learning algorithms substantially outperformed human listeners, with the classification accuracy reaching 84%, when tested under the binaural-room-impulse-response (BRIR) matched conditions. However, when the algorithms were tested under the BRIR mismatched scenario, the accuracy obtained by the algorithms was comparable to that exhibited by the listeners who passed the post-screening test, implying that the machine learning algorithms capability to perform in unknown electro-acoustic conditions needs to be further improved.

show abstract

A Comparison of Human against Machine-Classification of Spatial Audio Scenes in Binaural Recordings of Music

2020

View full text Add to dashboard Cite

show abstract

Spatial Audio Scene Characterization (SASC): Automatic Localization of Front-, Back-, Up-, and Down-Positioned Music Ensembles in Binaural Recordings

Zieliński

Antoniuk

2022

Applied Sciences

View full text Add to dashboard Cite

The automatic localization of audio sources distributed symmetrically with respect to coronal or transverse planes using binaural signals still poses a challenging task, due to the front–back and up–down confusion effects. This paper demonstrates that the convolutional neural network (CNN) can be used to automatically localize music ensembles panned to the front, back, up, or down positions. The network was developed using the repository of the binaural excerpts obtained by the convolution of multi-track music recordings with the selected sets of head-related transfer functions (HRTFs). They were generated in such a way that a music ensemble (of circular shape in terms of its boundaries) was positioned in one of the following four locations with respect to the listener: front, back, up, and down. According to the obtained results, CNN identified the location of the ensembles with the average accuracy levels of 90.7% and 71.4% when tested under the HRTF-dependent and HRTF-independent conditions, respectively. For HRTF-dependent tests, the accuracy decreased monotonically with the increase in the ensemble size. A modified image occlusion sensitivity technique revealed selected frequency bands as being particularly important in terms of the localization process. These frequency bands are largely in accordance with the psychoacoustical literature.

show abstract

Automatic Spatial Audio Scene Classification in Binaural Recordings of Music

Zieliński

2019

Applied Sciences

View full text Add to dashboard Cite

The aim of the study was to develop a method for automatic classification of the three spatial audio scenes, differing in horizontal distribution of foreground and background audio content around a listener in binaurally rendered recordings of music. For the purpose of the study, audio recordings were synthesized using thirteen sets of binaural-room-impulse-responses (BRIRs), representing room acoustics of both semi-anechoic and reverberant venues. Head movements were not considered in the study. The proposed method was assumption-free with regards to the number and characteristics of the audio sources. A least absolute shrinkage and selection operator was employed as a classifier. According to the results, it is possible to automatically identify the spatial scenes using a combination of binaural and spectro-temporal features. The method exhibits a satisfactory classification accuracy when it is trained and then tested on different stimuli but synthesized using the same BRIRs (accuracy ranging from 74% to 98%), even in highly reverberant conditions. However, the generalizability of the method needs to be further improved. This study demonstrates that in addition to the binaural cues, the Mel-frequency cepstral coefficients constitute an important carrier of spatial information, imperative for the classification of spatial audio scenes.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Spatial Audio Scene Characterization (SASC)

Cited by 3 publications

References 15 publications

A Comparison of Human against Machine-Classification of Spatial Audio Scenes in Binaural Recordings of Music

A Comparison of Human against Machine-Classification of Spatial Audio Scenes in Binaural Recordings of Music

Spatial Audio Scene Characterization (SASC): Automatic Localization of Front-, Back-, Up-, and Down-Positioned Music Ensembles in Binaural Recordings

Automatic Spatial Audio Scene Classification in Binaural Recordings of Music

Contact Info

Product

Resources

About