Automatic discrimination of musical signal types such as speech, singing, music, genres, or drumbeats within audio streams is of great importance, e.g., for radio broadcast stream segmentation. Yet the choice of feature sets for this task remains a matter of debate. We therefore suggest a large open feature set approach, starting with the systematic generation of 7k high-level features based on MPEG-7 Low-Level Descriptors and further feature contours. A subsequent fast Gain Ratio reduction followed by wrapper-based Floating Search leads to a strong basis of relevant features. Next, features are added by alteration and combination within a genetic search. For classification we use Support Vector Machines, proven reliable for this task. Test runs are carried out on two task-specific databases and the public Columbia SMD database and show significant improvements for each step of the suggested novel concept.
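A minimal sketch of the two-stage selection cascade described in the abstract: a fast Gain Ratio filter ranking to prune the large feature space, followed by a wrapper-based Sequential Floating Forward Search around an SVM. This assumes scikit-learn and mlxtend; the 7k-feature MPEG-7 extraction front-end, the genetic search stage, and all thresholds (`n_filter`, `n_wrapper`, bin count) are illustrative placeholders, not the paper's exact configuration.

```python
import numpy as np
from sklearn.svm import SVC
from mlxtend.feature_selection import SequentialFeatureSelector as SFS

def gain_ratio(x, y, bins=10):
    """Gain Ratio of one (discretised) feature x w.r.t. labels y."""
    x_binned = np.digitize(x, np.histogram_bin_edges(x, bins=bins)[1:-1])

    def entropy(labels):
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    h_y = entropy(y)
    h_y_given_x, h_x = 0.0, 0.0  # conditional entropy and split information
    for v in np.unique(x_binned):
        mask = x_binned == v
        p_v = mask.mean()
        h_y_given_x += p_v * entropy(y[mask])
        h_x -= p_v * np.log2(p_v)
    info_gain = h_y - h_y_given_x
    return info_gain / h_x if h_x > 0 else 0.0

def select_features(X, y, n_filter=200, n_wrapper=50):
    # Stage 1: fast Gain Ratio ranking prunes the large feature space.
    ranking = np.argsort([gain_ratio(X[:, j], y)
                          for j in range(X.shape[1])])[::-1]
    keep = ranking[:n_filter]
    # Stage 2: wrapper-based Sequential Floating Forward Search with an SVM.
    sfs = SFS(SVC(kernel="linear"), k_features=n_wrapper,
              forward=True, floating=True, scoring="accuracy", cv=3)
    sfs.fit(X[:, keep], y)
    return keep[list(sfs.k_feature_idx_)]
```

The filter stage is cheap (one pass per feature), which is what makes the expensive wrapper stage tractable on the reduced candidate set.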
The multi-modal multi-sensor PROMETHEUS database was created in support of research and development activities [PROMETHEUS (FP7-ICT-214901): http://www.prometheus-FP7.eu] aiming at the creation of a framework for monitoring and interpretation of human behaviors in unrestricted indoor and outdoor environments. The distinctiveness of the PROMETHEUS database comes not only from the unique sensor sets used in the various recording scenarios, but also from the database design, which covers a range of real-world applications related to smart-home automation and indoor/outdoor surveillance of public areas. Numerous single-person and multi-person scenarios, as well as scenarios with interactions between groups of people, motivated by these applications, were implemented with the help of skilled actors and supernumerary personnel. In these scenarios, the actors and personnel were instructed to act out a range of typical and atypical behaviors, as well as simulations of emergency and crisis situations. In summary, the database contains more than 4 h of synchronized recordings from heterogeneous sensors (an infrared motion detection sensor, thermal imaging cameras, overview/surveillance video cameras, close-view video cameras, a 3D camera, a stereoscopic camera, a general-purpose camcorder, microphone arrays, and motion capture equipment) collected in common setups simulating a smart-home environment, an airport, and an ATM security environment. Selected scenes of the database were annotated for the needs of human detection and tracking. The entire audio part of the database was annotated for the needs of sound event detection, sound source enumeration, emotion recognition, etc.
Recognition of emotion in speech usually uses acoustic models that ignore the spoken content. Likewise, one general model per emotion is typically trained, independent of the phonetic structure. Given sufficient data, this approach seemingly works well enough. Yet this paper addresses the question of whether acoustic emotion recognition strongly depends on phonetic content, and whether models tailored to the spoken unit can lead to higher accuracies. We therefore investigate phoneme- and word-level models using a large prosodic, spectral, and voice quality feature space and Support Vector Machines (SVM). Experiments also take the necessity of automatic speech recognition (ASR) into account for selecting appropriate unit models. Test runs on the well-known EMO-DB database under speaker-independent conditions demonstrate the superiority of word-level emotion models over today's common general models, provided sufficient occurrences exist in the training corpus.
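A minimal sketch of the unit-dependent modelling idea: one SVM emotion classifier per word (or phoneme) unit, with a fall-back to a general model when a unit lacks sufficient training occurrences. The acoustic feature extraction and the ASR front-end that supplies the unit identities are assumed to exist and are not shown; `MIN_OCCURRENCES` is an illustrative threshold, not the paper's value.

```python
from collections import defaultdict
from sklearn.svm import SVC

MIN_OCCURRENCES = 20  # hypothetical threshold for "sufficient occurrences"

def train_unit_models(features, labels, units):
    """features[i]: acoustic feature vector, labels[i]: emotion class,
    units[i]: word/phoneme identity from transcription or ASR."""
    by_unit = defaultdict(list)
    for x, y, u in zip(features, labels, units):
        by_unit[u].append((x, y))

    # General model trained on all data (today's common approach).
    general = SVC(kernel="linear").fit(features, labels)

    # One dedicated model per sufficiently frequent unit.
    unit_models = {}
    for u, samples in by_unit.items():
        if len(samples) >= MIN_OCCURRENCES:
            X, y = zip(*samples)
            if len(set(y)) > 1:  # SVC needs at least two classes
                unit_models[u] = SVC(kernel="linear").fit(list(X), list(y))
    return general, unit_models

def classify(x, unit, general, unit_models):
    # Back off to the general model for unseen or rare units.
    model = unit_models.get(unit, general)
    return model.predict([x])[0]
```

The back-off in `classify` mirrors the abstract's finding: unit models help only where the training corpus provides enough occurrences, so the general model remains the safety net.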