Acoustic event detection and scene classification are major research tasks in environmental sound analysis, and many methods based on neural networks have been proposed. Conventional methods have addressed these tasks separately; however, acoustic events and scenes are closely related. For example, in the acoustic scene "office", the acoustic events "mouse clicking" and "keyboard typing" are likely to occur. In this paper, we propose multitask learning for the joint analysis of acoustic events and scenes, in which the two networks share the layers that hold information common to acoustic events and scenes. By integrating the two networks, we expect information on acoustic scenes to improve the performance of acoustic event detection. Experimental results obtained using the TUT Sound Events 2016/2017 and TUT Acoustic Scenes 2016 datasets indicate that the proposed method improves acoustic event detection by 10.66 percentage points in terms of the F-score, compared with a conventional method based on a convolutional recurrent neural network.
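The shared-trunk idea behind this multitask setup can be sketched with a toy forward pass: a common feature transform feeds both a frame-wise event-detection head and a clip-wise scene-classification head. This is a minimal numpy illustration, not the paper's CRNN; all dimensions, the random weights, and the mean-pooling choice are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical dimensions: 64 mel bins per frame, 128 shared units,
# 10 event classes (frame-wise), 3 scene classes (clip-wise).
n_mels, n_shared, n_events, n_scenes = 64, 128, 10, 3
W_shared = rng.normal(0, 0.1, (n_shared, n_mels))
W_event = rng.normal(0, 0.1, (n_events, n_shared))
W_scene = rng.normal(0, 0.1, (n_scenes, n_shared))

def forward(spec):
    """spec: (n_frames, n_mels) log-mel spectrogram."""
    h = relu(spec @ W_shared.T)            # shared trunk used by both tasks
    event_probs = sigmoid(h @ W_event.T)   # per-frame multi-label event activity
    scene_logits = (h @ W_scene.T).mean(0) # pool frames to one clip-level vector
    scene_probs = softmax(scene_logits)
    return event_probs, scene_probs

spec = rng.normal(size=(100, n_mels))
ev, sc = forward(spec)
```

Because both heads backpropagate into `W_shared` during training, scene-level cues can shape the representation the event head sees, which is the intuition behind the reported F-score gain.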
Recent renewed interest in computational writer identification has resulted in an increased number of publications. In historical musicology, however, its application has so far been limited. One obstacle seems to be that the scans available for computational analysis are often not clear enough. In this paper, the Hinge feature is proposed to enable effective feature extraction from low-quality scans without segmentation or staff-line removal. The use of an autoencoder in Hinge feature space is suggested as an alternative to staff-line removal by image processing, and their performance is compared. The experiment shows an accuracy of 87% on a dataset containing samples from 84 writers, and the superiority of our segmentation-free and staff-line-removal-free approach. A practical analysis of Bach's autograph manuscript of the Well-Tempered Clavier II (Additional MS. 35021 in the British Library, London) is also presented, demonstrating the broad applicability of our approach.
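As a rough illustration of a hinge-style contour feature, the sketch below builds a joint histogram of the orientations of the two "legs" meeting at each contour point; the leg length and bin count are illustrative assumptions, not the paper's settings, and the toy circular contour stands in for a traced ink contour.

```python
import numpy as np

def hinge_feature(contour, leg=5, n_bins=12):
    """Joint histogram of the two leg orientations of a 'hinge'
    anchored at each contour point (sketch of a hinge-style feature;
    leg length and bin count here are illustrative choices)."""
    pts = np.asarray(contour, dtype=float)
    hist = np.zeros((n_bins, n_bins))
    for i in range(leg, len(pts) - leg):
        v1 = pts[i - leg] - pts[i]  # backward leg of the hinge
        v2 = pts[i + leg] - pts[i]  # forward leg of the hinge
        a1 = np.arctan2(v1[1], v1[0]) % (2 * np.pi)
        a2 = np.arctan2(v2[1], v2[0]) % (2 * np.pi)
        b1 = int(a1 / (2 * np.pi) * n_bins) % n_bins
        b2 = int(a2 / (2 * np.pi) * n_bins) % n_bins
        hist[b1, b2] += 1
    total = hist.sum()
    return (hist / total).ravel() if total else hist.ravel()

# Toy contour: 200 points on a circle stand in for a traced ink contour.
t = np.linspace(0, 2 * np.pi, 200, endpoint=False)
feat = hinge_feature(np.c_[np.cos(t), np.sin(t)])
```

Because the feature is computed directly on ink contours, no segmentation into symbols is required, which is what makes it attractive for low-quality scans with staff lines present.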
Historical musicologists have for many decades sought objective and powerful techniques to collect, analyse, and verify their findings. The aim of this study was to show the importance of such domain-specific problems for achieving actionable knowledge discovery in the real world. Our focus is on finding evidence for the chronological ordering of J.S. Bach's manuscripts, by proposing a musicologist-driven mining method for extracting quantitative information from early music manuscripts. Bach's C-clefs were extracted from a wide range of manuscripts under the direction of domain experts, and classification of these C-clefs was conducted. The proposed methods were evaluated on a dataset containing over 1000 clefs extracted from J.S. Bach's manuscripts. The results show more than 70% accuracy for dating J.S. Bach's manuscripts. Dates for Bach's lost manuscripts were quantitatively hypothesized, providing a rough barometer to be combined with other evidence when evaluating musicologists' hypotheses, and the practicability of this domain-driven approach is demonstrated.
This paper addresses negative emotion recognition from paralinguistic information in speech for spoken dialogue systems. Speech conveys not only linguistic information but also paralinguistic and non-linguistic information such as emotions, attitudes, and intentions. This easily perceivable information plays a key role in a spoken dialogue system. However, most previous speech recognition systems consider only linguistic information and ignore this significant information, hindering the development of more natural spoken dialogue systems. To exploit this information for spoken dialogue systems, this paper focuses on negative emotion recognition from Japanese utterances. 6552-dimensional acoustic features were extracted from 6300 Japanese utterances of 50 speakers in three emotional states: negative, positive, and neutral. Negative emotion includes anger, sadness, and dislike, while positive emotion includes favor, joy, and relief. The features were classified by an SVM and evaluated by 10-fold cross-validation. The experimental results showed recognition rates of 93.4% for the classification of negative versus positive and 95.0% for negative versus neutral.
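The evaluation protocol above (k-fold cross-validation of a classifier on fixed-dimensional acoustic features) can be sketched as follows. For self-containment this toy uses a nearest-centroid classifier on synthetic two-class data rather than the paper's SVM on 6552-dimensional features; the fold logic is the part being illustrated.

```python
import numpy as np

rng = np.random.default_rng(0)

def nearest_centroid_fit(X, y):
    # One mean vector per class.
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def nearest_centroid_predict(model, X):
    classes = list(model)
    cents = np.stack([model[c] for c in classes])
    d = ((X[:, None, :] - cents[None]) ** 2).sum(-1)  # squared distances
    return np.array(classes)[d.argmin(1)]

def k_fold_accuracy(X, y, k=10):
    """Shuffle, split into k folds, train on k-1, test on the held-out fold."""
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, k)
    accs = []
    for f in folds:
        mask = np.ones(len(X), bool)
        mask[f] = False
        model = nearest_centroid_fit(X[mask], y[mask])
        accs.append((nearest_centroid_predict(model, X[f]) == y[f]).mean())
    return float(np.mean(accs))

# Toy data: two Gaussian clusters stand in for negative/positive utterances.
X = np.r_[rng.normal(0, 1, (100, 20)), rng.normal(2, 1, (100, 20))]
y = np.r_[np.zeros(100), np.ones(100)]
acc = k_fold_accuracy(X, y)
```

Averaging the per-fold accuracies, as done here, is how a single headline figure like 93.4% is obtained from ten train/test splits.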