In this research, a study of cross-linguistic speech emotion recognition is performed. For this purpose, emotional data in several languages (English, Lithuanian, German, Spanish, Serbian, and Polish) are collected, resulting in a cross-linguistic speech emotion dataset comprising more than 10,000 emotional utterances. Although the gathered databases are bi-modal, our focus is on the acoustic representation only, under the assumption that the speech audio signal carries sufficient emotional information for detection and retrieval. Several two-dimensional acoustic feature spaces, such as cochleagrams, spectrograms, mel-cepstrograms, and a fractal dimension-based space, are employed as representations of speech emotional features. A convolutional neural network (CNN) is used as the classifier. The results show the superiority of cochleagrams over the other feature spaces. In the CNN-based speaker-independent cross-linguistic speech emotion recognition (SER) experiment, an accuracy of over 90% is achieved, which is close to the monolingual SER case.
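The two-dimensional feature spaces mentioned above are all time–frequency maps of the audio signal. As a minimal sketch (not the authors' actual pipeline, and with illustrative frame and hop sizes), a magnitude spectrogram can be computed by applying a windowed discrete Fourier transform to successive frames:

```python
import cmath
import math

def spectrogram(signal, frame_len=256, hop=128):
    """Minimal magnitude spectrogram: a Hann-windowed DFT per frame.
    Returns a list of frames, each holding |X[k]| for k < frame_len // 2."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (frame_len - 1))
              for i in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + i] * window[i] for i in range(frame_len)]
        # direct DFT over the positive-frequency bins (O(N^2), fine for a sketch)
        spectrum = []
        for k in range(frame_len // 2):
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames
```

The resulting 2-D array (time on one axis, frequency on the other) is exactly the kind of image-like input a CNN classifier consumes; cochleagrams and mel-cepstrograms differ mainly in how the frequency axis is warped and weighted.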
Introduction: Recurrent laryngeal nerve injury is one of the major complications related to thyroid surgery. Intraoperative recurrent laryngeal nerve functional status monitoring is becoming a standard part of thyroid surgery. However, the current methods for intraoperative nerve functional status assessment are associated with a demand for specialized devices and increased costs.
Aim: To assess the validity of a new method – intraoperative laryngeal ultrasonography – for prediction of recurrent laryngeal nerve injury.
Material and methods: This prospective diagnostic test accuracy study included 112 patients undergoing thyroid surgery in Vilnius University Hospital Santaros Clinics. Neurostimulation combined with laryngeal ultrasonography and laryngeal palpation was performed intraoperatively to evaluate recurrent laryngeal nerve functional status. Recurrent laryngeal nerve injury was confirmed by laryngoscopy, which was performed on the first postoperative day and considered to be the gold standard method.
Results: Data on 112 consecutive patients and 200 nerves at risk were collected. The temporary vocal cord palsy rate was 5.4% per patient and 3% per nerve at risk. No permanent palsy or bilateral injury cases were registered in the study cohort. Laryngeal ultrasound sensitivity counted per nerve at risk was 83.3%, specificity 97.2%, accuracy 96.4%, positive predictive value 62.5% and negative predictive value 99%.
Conclusions: Laryngeal ultrasonography is a feasible new technique for accurate intraoperative recurrent laryngeal nerve injury evaluation.
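The diagnostic test accuracy measures reported above all derive from a 2x2 confusion matrix of test results against the laryngoscopy gold standard. A minimal sketch of those formulas follows; the counts used in the test case below are hypothetical, since the abstract reports only the derived percentages, not the raw cell counts:

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic test accuracy measures from a 2x2 confusion matrix:
    tp/fn = injured nerves correctly/incorrectly classified,
    tn/fp = intact nerves correctly/incorrectly classified."""
    total = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn),   # share of injured nerves flagged
        "specificity": tn / (tn + fp),   # share of intact nerves cleared
        "accuracy": (tp + tn) / total,   # overall agreement with gold standard
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }
```

Note how a low positive predictive value (62.5% here) can coexist with high specificity when the condition is rare: with only 3% of nerves at risk injured, even a small false-positive rate produces false positives comparable in number to the true positives.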
The automated identification system for vessel movements receives a huge amount of multivariate, heterogeneous sensor data, which must be analyzed to make proper and timely decisions on vessel movements. The large number of vessels makes it difficult and time-consuming to detect abnormalities, so rapid response algorithms should be developed for a decision support system to identify abnormal vessel movements in areas of heavy traffic. This paper extends a previous study on applying a self-organizing map to process sensor stream data received by the maritime automated identification system. The more data about a vessel's movement that is registered and submitted to the algorithm, the higher the accuracy of the algorithm should be. However, this cannot be guaranteed without an effective retraining strategy with respect to precision and data processing time. In addition, retraining ensures the integration of the latest vessel movement data, which reflects the actual conditions and context. With a view to maintaining the quality of the algorithm's results, data batching strategies for neural network retraining to detect anomalies in streaming maritime traffic data were investigated. The effectiveness of the strategies, in terms of modeling precision and data processing time, was estimated on real sensor data. The obtained results show that the neural network retraining time can be shortened by half while sensitivity and precision change only slightly.
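The batch-retraining idea described above can be sketched as follows. This is an illustrative toy implementation, not the authors' system: incoming movement vectors are buffered, the self-organizing map is retrained whenever a batch fills up, and a sample's quantization error (distance to its best-matching unit) serves as the anomaly score. Map size, batch size, and learning-rate schedule are all assumptions:

```python
import math
import random

class StreamingSOM:
    """Toy self-organizing map with batch-wise retraining on streamed data."""

    def __init__(self, rows, cols, dim, seed=0):
        rng = random.Random(seed)
        self.rows, self.cols, self.dim = rows, cols, dim
        self.weights = [[[rng.random() for _ in range(dim)]
                         for _ in range(cols)] for _ in range(rows)]
        self.buffer = []

    def _bmu(self, x):
        # best-matching unit: the node whose weight vector is closest to x
        best, best_d = (0, 0), float("inf")
        for r in range(self.rows):
            for c in range(self.cols):
                w = self.weights[r][c]
                d = sum((w[i] - x[i]) ** 2 for i in range(self.dim))
                if d < best_d:
                    best, best_d = (r, c), d
        return best, math.sqrt(best_d)

    def observe(self, x, batch_size=32):
        """Buffer a sample; retrain on the whole batch once the buffer fills."""
        self.buffer.append(x)
        if len(self.buffer) >= batch_size:
            self.retrain(self.buffer)
            self.buffer = []

    def retrain(self, batch, epochs=5, lr0=0.5, radius0=1.5):
        # classic SOM update with decaying learning rate and neighborhood radius
        for e in range(epochs):
            lr = lr0 * (1 - e / epochs)
            radius = max(radius0 * (1 - e / epochs), 0.5)
            for x in batch:
                (br, bc), _ = self._bmu(x)
                for r in range(self.rows):
                    for c in range(self.cols):
                        h = math.exp(-((r - br) ** 2 + (c - bc) ** 2)
                                     / (2 * radius ** 2))
                        w = self.weights[r][c]
                        for i in range(self.dim):
                            w[i] += lr * h * (x[i] - w[i])

    def anomaly_score(self, x):
        """Quantization error of x; large values flag abnormal movements."""
        _, d = self._bmu(x)
        return d
```

The trade-off studied in the paper then becomes concrete: larger batches mean fewer, cheaper retraining passes but a map that lags behind the latest traffic context, while smaller batches keep the map current at a higher processing cost.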
During the last 10–20 years, many new ideas have been proposed to improve the accuracy of speech emotion recognition: e.g., effective feature sets, complex classification schemes, and multi-modal data acquisition. Nevertheless, speech emotion recognition remains a task with limited success. Considering the nonlinear and fluctuating nature of emotional speech, in this paper we present fractal dimension-based features for speech emotion classification. We employed Katz, Castiglioni, Higuchi, and Hurst exponent-based features and their statistical functionals to establish a 224-dimensional full feature set. The dimensionality was then reduced by applying the Sequential Forward Selection technique. The results of the experimental study show a clear superiority of fractal dimension-based feature sets over the acoustic ones. An average accuracy of 96.5% was obtained using the reduced feature sets. The feature selection enabled us to obtain 4-dimensional and 8-dimensional sets for Lithuanian and German emotions, respectively.
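To make the feature family concrete, here is a minimal sketch of one of the measures named above, the Katz fractal dimension. It treats the waveform as a planar curve sampled at unit intervals; a smooth signal yields a value near 1, while an irregular, fluctuating signal yields a larger value. This is a generic textbook formulation, not the authors' exact implementation:

```python
import math

def katz_fd(x):
    """Katz fractal dimension of a 1-D signal.
    FD = log10(n) / (log10(n) + log10(d / L)), where n is the number of
    steps, L the total curve length, and d the planar extent of the curve."""
    n = len(x) - 1
    # total curve length: sum of euclidean distances between successive samples
    L = sum(math.hypot(1.0, x[i + 1] - x[i]) for i in range(n))
    # planar extent: max distance from the first sample to any other sample
    d = max(math.hypot(i, x[i] - x[0]) for i in range(1, n + 1))
    return math.log10(n) / (math.log10(n) + math.log10(d / L))
```

In a feature set like the one described, such a dimension would be computed per frame and then summarized by statistical functionals (mean, deviation, and so on) over the utterance.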