Acoustic Geo-Sensing: Recognising cyclists' route, route direction, and route progress from cell-phone audio

Schuller, Björn; Pokorny, Florian B.; Ladstätter, Stefan; Fellner, Maria; Graf, Franz; Paletta, Lucas

doi:10.1109/icassp.2013.6637688

Cited by 9 publications

(5 citation statements)

References 18 publications

(12 reference statements)

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Music recordings are instrumental and classical music from the last.fm website. The city recordings were recorded in Munich, Germany [22]. We use a strict, disjoint training/test split of the noise samples.…”

Section: Noisy Tum Avic Corpusmentioning

confidence: 99%

Single-channel speech separation with memory-enhanced recurrent neural networks

Weninger

Eyben

Schuller

2014

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Self Cite

108

View full text Add to dashboard Cite

In this paper we propose the use of Long Short-Term Memory recurrent neural networks for speech enhancement. Networks are trained to predict clean speech as well as noise features from noisy speech features, and a magnitude domain soft mask is constructed from these features. Extensive tests are run on 73 k noisy and reverberated utterances from the Audio-Visual Interest Corpus of spontaneous, emotionally colored speech, degraded by several hours of real noise recordings comprising stationary and non-stationary sources and convolutive noise from the Aachen Room Impulse Response database. In the result, the proposed method is shown to provide superior noise reduction at low signal-to-noise ratios while creating very little artifacts at higher signal-to-noise ratios, thereby outperforming unsupervised magnitude domain spectral subtraction by a large margin in terms of source-distortion ratio.

show abstract

Section: Noisy Tum Avic Corpusmentioning

confidence: 99%

Single-channel speech separation with memory-enhanced recurrent neural networks

Weninger

Eyben

Schuller

2014

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Self Cite

108

View full text Add to dashboard Cite

show abstract

“…Music recordings are instrumental and classical music from the last.fm website. The city recordings were recorded in Munich, Germany with smartphones while cycling and walking through the city similar to the task described in [22]. The noise samples for the training and test sets are fully disjunctive, i.e., no original sample occurrs in both sets.…”

Section: Noise and Reverberationmentioning

confidence: 99%

Affect recognition in real-life acoustic conditions — a new perspective on feature selection

2013

View full text Add to dashboard Cite

Automatic emotion recognition and computational paralinguistics have matured to some robustness under controlled laboratory settings, however, the accuracies are degraded in real-life conditions such as the presence of noise and reverberation. In this paper we take a look at the relevance of acoustic features for expression of valence, arousal, and interest conveyed by a speaker's voice. Experiments are conducted on the GEMEP and TUM AVIC databases. To simulate realistically degraded conditions the audio is corrupted with real room impulse responses and real-life noise recordings. Features well correlated with the target (emotion) over a wide range of acoustic conditions are analysed and an interpretation is given. Classification results in matched and mismatched settings with multi-condition training are provided to validate the benefit of the feature selection method. Our proposed way of selecting features over a range of noise types considerably boosts the generalisation ability of the classifiers.

show abstract

“…To record additional audio data without particularly demanding hardware conditions, an Android smart phone Samsung Galaxy Nexus was used similar as in [8]: The phone was loosely located in a front-pocket of a shirt worn by the subjects. The standard media recording APIs of Android use an AMR codec with poor quality.…”

Section: Recording Devicesmentioning

confidence: 99%

The acoustics of eye contact

Eyben

Weninger

Paletta

et al. 2013

Proceedings of the 6th Workshop on Eye Gaze in Intelligent Human Machine Interaction: Gaze in Multimodal Interaction

Self Cite

View full text Add to dashboard Cite

An important aspect in short dialogues is attention as is manifested by eye-contact between subjects. In this study we provide a first analysis whether such visual attention is evident in the acoustic properties of a speaker's voice. We thereby introduce the multi-modal GRAS 2 corpus, which was recorded for analysing attention in human-to-human interactions of short daily-life interactions with strangers in public places in Graz, Austria. Recordings of four test subjects equipped with eye tracking glasses, three audio recording devices, and motion sensors are contained in the corpus. We describe how we robustly identify speech segments from the subjects and other people in an unsupervised manner from multi-channel recordings. We then discuss correlations between the acoustics of the voice in these segments and the point of visual attention of the subjects. A significant relation between the acoustic features and the distance between the point of view and the eye region of the dialogue partner is found. Further, we show that automatic classification of binary decision eye-contact vs. no eye-contact from acoustic features alone is feasible with an Unweighted Average Recall of up to 70%.

show abstract

Acoustic Geo-Sensing: Recognising cyclists' route, route direction, and route progress from cell-phone audio

Cited by 9 publications

References 18 publications

Single-channel speech separation with memory-enhanced recurrent neural networks

Single-channel speech separation with memory-enhanced recurrent neural networks

Affect recognition in real-life acoustic conditions — a new perspective on feature selection

The acoustics of eye contact

Contact Info

Product

Resources

About