2019
DOI: 10.3390/acoustics1020023
An Enhanced Temporal Feature Integration Method for Environmental Sound Recognition

Abstract: Temporal feature integration refers to a set of strategies that attempt to capture the information conveyed in the temporal evolution of the signal. It has been extensively applied in the context of semantic audio, showing performance improvements over standard frame-based audio classification methods. This paper investigates the potential of an enhanced temporal feature integration method to classify environmental sounds. The proposed method utilizes newly introduced integration functions that capture the…
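The abstract's notion of temporal feature integration can be illustrated with a minimal sketch: short-term descriptors are computed per frame, then summarized over a longer "texture window" with integration functions such as the mean and standard deviation. This is a generic illustration of the technique, not the paper's proposed integration functions (which are not reproduced here); all function names and parameter values below are assumptions for the example.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

def short_term_features(frames):
    """Per-frame descriptors: RMS energy and zero-crossing rate."""
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    return np.column_stack([rms, zcr])           # shape: (n_frames, 2)

def integrate(features, win, hop):
    """Temporal integration: mean and std of each feature over a texture window."""
    out = []
    for start in range(0, len(features) - win + 1, hop):
        seg = features[start : start + win]
        out.append(np.concatenate([seg.mean(axis=0), seg.std(axis=0)]))
    return np.array(out)                         # shape: (n_windows, 2 * n_features)

# Example: 1 s of noise at 16 kHz, 32 ms frames with 16 ms hop,
# integrated over a single texture window spanning the whole clip.
rng = np.random.default_rng(0)
x = rng.standard_normal(16000)
frames = frame_signal(x, 512, 256)
st = short_term_features(frames)
tf = integrate(st, win=len(st), hop=len(st))
print(tf.shape)                                  # one 4-dimensional integrated vector
```

The integrated vector (rather than the raw frame-level features) is what a classifier would consume, which is what gives integration methods their edge over frame-based classification.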


Cited by 17 publications (18 citation statements). References 16 publications.
“…Context- and location-aware services can be combined with (multichannel) semantic processing to offer spatiotemporal sound mapping and pattern-related visualizations. Such feature summarization techniques are encountered in generic audio detection and classification tasks, including environmental sound recognition [14,49-59]. In this view, crowdsourced audio data can offer soundscape enhancement with multiple augmentation layers in favor of documentation, data-driven storytelling, and management.…”
Section: Related Work
confidence: 99%
“…In this view, crowdsourced audio data can offer soundscape enhancement with multiple augmentation layers in favor of documentation, data-driven storytelling, and management. The massive research progress in the domain has established multiple pattern recognition schemes and hierarchical semantic audio taxonomies to describe the sound fields associated with different social events [13-24,52-59]. Apart from the geographical- and time-related information that a mobile terminal can easily hold, environmental sounds and soundscapes can be classified, filtered, and highlighted based on the associated pattern classification taxonomies, various low-level audio descriptors, other semantic labels concerning the transmitted or perceived emotions, etc.…”
Section: Related Work
confidence: 99%
“…In this context, new audio recognition and semantic analysis techniques are deployed for General Audio Detection and Classification (GADC) tasks, which are very useful in many multidisciplinary domains [4-16]. Typical examples include speech recognition and perceptual enhancement [5-8], speaker indexing and diarization [14-19], voice/music detection and discrimination [1-4,9-13,20-22], information retrieval and genre classification of music [23,24], audio-driven alignment of multiple recordings [25,26], sound emotion recognition [27-29], and others [10,30-32]. Concerning the media production and broadcasting domain, audio and audio-driven segmentation allow for the implementation of prope...…”
Section: Introduction
confidence: 99%