Acoustic event detection and scene classification are major research tasks in environmental sound analysis, and many methods based on neural networks have been proposed. Conventional methods have addressed these tasks separately; however, acoustic events and scenes are closely related. For example, in the acoustic scene "office", the acoustic events "mouse clicking" and "keyboard typing" are likely to occur. In this paper, we propose multitask learning for the joint analysis of acoustic events and scenes, in which the two networks share the layers that hold information common to acoustic events and scenes. By integrating the two networks, we expect information on acoustic scenes to improve the performance of acoustic event detection. Experimental results obtained using the TUT Sound Events 2016/2017 and TUT Acoustic Scenes 2016 datasets indicate that the proposed method improves acoustic event detection by 10.66 percentage points in terms of the F-score, compared with a conventional method based on a convolutional recurrent neural network.
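The shared-trunk idea behind this multitask setup can be sketched with a toy forward pass: a common feature transform feeds both a frame-wise event-detection head and a clip-wise scene-classification head. This is a minimal numpy illustration, not the paper's CRNN; all dimensions, the random weights, and the mean-pooling choice are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical dimensions: 64 mel bins per frame, 128 shared units,
# 10 event classes (frame-wise), 3 scene classes (clip-wise).
n_mels, n_shared, n_events, n_scenes = 64, 128, 10, 3
W_shared = rng.normal(0, 0.1, (n_shared, n_mels))
W_event = rng.normal(0, 0.1, (n_events, n_shared))
W_scene = rng.normal(0, 0.1, (n_scenes, n_shared))

def forward(spec):
    """spec: (n_frames, n_mels) log-mel spectrogram."""
    h = relu(spec @ W_shared.T)            # shared trunk used by both tasks
    event_probs = sigmoid(h @ W_event.T)   # per-frame multi-label event activity
    scene_logits = (h @ W_scene.T).mean(0) # pool frames to one clip-level vector
    scene_probs = softmax(scene_logits)
    return event_probs, scene_probs

spec = rng.normal(size=(100, n_mels))
ev, sc = forward(spec)
```

Because both heads backpropagate into `W_shared` during training, scene-level cues can shape the representation the event head sees, which is the intuition behind the reported F-score gain.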
Recent renewed interest in computational writer identification has resulted in an increased number of publications. In historical musicology, however, its application has so far been limited. One obstacle seems to be that the scans available for computational analysis are often not clear enough. In this paper, the Hinge feature is proposed to enable effective feature extraction from low-quality scans without segmentation or staff-line removal. The use of an autoencoder in Hinge feature space is suggested as an alternative to staff-line removal by image processing, and their performance is compared. The experiment shows an accuracy of 87% on a dataset containing samples from 84 writers, and the superiority of our segmentation-free and staff-line-removal-free approach. A practical analysis of Bach's autograph manuscript of the Well-Tempered Clavier II (Additional MS. 35021 in the British Library, London) is also presented, demonstrating the broad applicability of our approach.
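As a rough illustration of a hinge-style contour feature, the sketch below builds a joint histogram of the orientations of the two "legs" meeting at each contour point; the leg length and bin count are illustrative assumptions, not the paper's settings, and the toy circular contour stands in for a traced ink contour.

```python
import numpy as np

def hinge_feature(contour, leg=5, n_bins=12):
    """Joint histogram of the two leg orientations of a 'hinge'
    anchored at each contour point (sketch of a hinge-style feature;
    leg length and bin count here are illustrative choices)."""
    pts = np.asarray(contour, dtype=float)
    hist = np.zeros((n_bins, n_bins))
    for i in range(leg, len(pts) - leg):
        v1 = pts[i - leg] - pts[i]  # backward leg of the hinge
        v2 = pts[i + leg] - pts[i]  # forward leg of the hinge
        a1 = np.arctan2(v1[1], v1[0]) % (2 * np.pi)
        a2 = np.arctan2(v2[1], v2[0]) % (2 * np.pi)
        b1 = int(a1 / (2 * np.pi) * n_bins) % n_bins
        b2 = int(a2 / (2 * np.pi) * n_bins) % n_bins
        hist[b1, b2] += 1
    total = hist.sum()
    return (hist / total).ravel() if total else hist.ravel()

# Toy contour: 200 points on a circle stand in for a traced ink contour.
t = np.linspace(0, 2 * np.pi, 200, endpoint=False)
feat = hinge_feature(np.c_[np.cos(t), np.sin(t)])
```

Because the feature is computed directly on ink contours, no segmentation into symbols is required, which is what makes it attractive for low-quality scans with staff lines present.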
Historical musicologists have for many decades sought objective and powerful techniques to collect, analyse, and verify their findings. The aim of this study was to show the importance of such domain-specific problems for achieving actionable knowledge discovery in the real world. Our focus is on finding evidence for the chronological ordering of J.S. Bach's manuscripts, by proposing a musicologist-driven mining method for extracting quantitative information from early music manuscripts. Bach's C-clefs were extracted from a wide range of manuscripts under the direction of domain experts, and classification of these C-clefs was conducted. The proposed methods were evaluated on a dataset containing over 1000 clefs extracted from J.S. Bach's manuscripts. The results show more than 70% accuracy for dating J.S. Bach's manuscripts. Dates for Bach's lost manuscripts were quantitatively hypothesized, providing a rough barometer to be combined with other evidence when evaluating musicologists' hypotheses, and the practicability of this domain-driven approach is demonstrated.
This paper addresses negative emotion recognition from paralinguistic information in speech for spoken dialogue systems. Speech conveys not only linguistic information but also paralinguistic and non-linguistic information such as emotions, attitudes, and intentions. This easily perceivable information plays a key role in a spoken dialogue system. However, most previous speech recognition systems consider only linguistic information and ignore this significant information, hindering the development of more natural spoken dialogue systems. To exploit this information for spoken dialogue systems, this paper focuses on negative emotion recognition from Japanese utterances. 6552-dimensional acoustic features were extracted from 6300 Japanese utterances of 50 speakers in three emotional states: negative, positive, and neutral. Negative emotion includes anger, sadness, and dislike, while positive emotion includes favor, joy, and relief. The features were classified by an SVM and evaluated by 10-fold cross-validation. The experimental results showed recognition rates of 93.4% for the classification of negative versus positive and 95.0% for negative versus neutral.
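The evaluation protocol above (k-fold cross-validation of a classifier on fixed-dimensional acoustic features) can be sketched as follows. For self-containment this toy uses a nearest-centroid classifier on synthetic two-class data rather than the paper's SVM on 6552-dimensional features; the fold logic is the part being illustrated.

```python
import numpy as np

rng = np.random.default_rng(0)

def nearest_centroid_fit(X, y):
    # One mean vector per class.
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def nearest_centroid_predict(model, X):
    classes = list(model)
    cents = np.stack([model[c] for c in classes])
    d = ((X[:, None, :] - cents[None]) ** 2).sum(-1)  # squared distances
    return np.array(classes)[d.argmin(1)]

def k_fold_accuracy(X, y, k=10):
    """Shuffle, split into k folds, train on k-1, test on the held-out fold."""
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, k)
    accs = []
    for f in folds:
        mask = np.ones(len(X), bool)
        mask[f] = False
        model = nearest_centroid_fit(X[mask], y[mask])
        accs.append((nearest_centroid_predict(model, X[f]) == y[f]).mean())
    return float(np.mean(accs))

# Toy data: two Gaussian clusters stand in for negative/positive utterances.
X = np.r_[rng.normal(0, 1, (100, 20)), rng.normal(2, 1, (100, 20))]
y = np.r_[np.zeros(100), np.ones(100)]
acc = k_fold_accuracy(X, y)
```

Averaging the per-fold accuracies, as done here, is how a single headline figure like 93.4% is obtained from ten train/test splits.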