Building on the success of the ADReSS Challenge at Interspeech 2020, which attracted the participation of 34 teams from across the world, the ADReSSo Challenge targets three difficult automatic prediction problems of societal and medical relevance, namely: detection of Alzheimer's Dementia, inference of cognitive testing scores, and prediction of cognitive decline. This paper presents these prediction tasks in detail, describes the datasets used, and reports the results of the baseline classification and regression models we developed for each task. A combination of acoustic and linguistic features extracted directly from audio recordings, without human intervention, yielded a baseline accuracy of 78.87% for the AD classification task, a root mean squared error (RMSE) of 5.28 for MMSE prediction, and 68.75% accuracy for the cognitive decline prediction task.
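The two kinds of baseline figures quoted above (classification accuracy and regression RMSE) can be illustrated with a minimal sketch. The function names and the toy labels and scores are hypothetical and are not taken from the challenge data or its baseline code.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the reference labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error between reference and predicted scores."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy data (hypothetical): four AD/control labels and two MMSE scores.
print(accuracy(["AD", "CN", "AD", "AD"], ["AD", "CN", "CN", "AD"]))
print(rmse([30, 20], [27, 24]))
```

Accuracy applies to the two classification tasks, while RMSE is expressed in the same units as the MMSE score itself, so an RMSE of 5.28 corresponds to a typical error of roughly five MMSE points.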
The results of a comparison between three different speech types (On-Talk, speaking to a computer; Off-Talk Self, speaking to oneself; and Off-Talk Other, speaking to another person) uttered by subjects in a collaborative interlingual task mediated by an automatic speech-to-speech translation system are reported here. The three speech types differ significantly in speech rate (F(2, 2719) = 101.7; p < 2e-16), and for this reason a detection method was implemented to see whether they could also be detected with good accuracy based on their acoustic and biological characteristics. Acoustic and biological measures provide good results in distinguishing between On-Talk and Off-Talk, but have difficulty distinguishing the sub-categories of Off-Talk: Self and Other.
Background: Advances in machine learning (ML) technology have opened new avenues for detection and monitoring of cognitive decline. In this study, a multimodal approach to Alzheimer's dementia detection based on the patient's spontaneous speech is presented. This approach was tested on a standard, publicly available Alzheimer's speech dataset for comparability. The data comprise voice samples from 156 participants (1:1 ratio of Alzheimer's to control), matched by age and gender. Materials and Methods: A recently developed Active Data Representation (ADR) technique for voice processing was employed as a framework for fusion of acoustic and textual features at sentence and word level. Temporal aspects of textual features were investigated in conjunction with acoustic features in order to shed light on the temporal interplay between paralinguistic (acoustic) and linguistic (textual) aspects of Alzheimer's speech. Combinations between several configurations of ADR features and more traditional bag-of-n-grams approaches were used in an ensemble of classifiers built and evaluated on a standardised dataset containing recorded speech of scene descriptions and textual transcripts. Results: Employing only semantic bag-of-n-grams features, an accuracy of 89.58% was achieved in distinguishing between Alzheimer's patients and healthy controls. Adding temporal and structural information by combining bag-of-n-grams features with ADR audio/textual features, the accuracy could be improved to 91.67% on the test set. An accuracy of 93.75% was achieved through late fusion of the three best feature configurations, which corresponds to a 4.7% improvement over the best result reported in the literature for this dataset. Conclusion: The proposed combination of ADR audio and textual features is capable of successfully modelling temporal aspects of the data. The machine learning approach toward dementia detection achieves best performance when ADR features are combined with strong semantic bag-of-n-grams features. This combination leads to state-of-the-art performance on the AD classification task.
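Late fusion, as mentioned in the results above, combines the decisions of separately trained classifiers rather than their input features. The abstract does not state which fusion rule was used, so the sketch below assumes simple majority voting. The function name and the example predictions are hypothetical.

```python
from collections import Counter


def late_fusion_majority(predictions_per_model):
    """Fuse per-model label predictions by majority vote (assumed rule).

    predictions_per_model: one list of predicted labels per model, all
    covering the same samples in the same order.
    """
    if not predictions_per_model:
        raise ValueError("at least one model is required")
    n_samples = len(predictions_per_model[0])
    if any(len(p) != n_samples for p in predictions_per_model):
        raise ValueError("all models must predict the same number of samples")
    fused = []
    for votes in zip(*predictions_per_model):
        # most_common(1) returns the label with the highest vote count;
        # with three models and two classes a tie cannot occur.
        fused.append(Counter(votes).most_common(1)[0][0])
    return fused


# Hypothetical outputs of three feature configurations on three samples.
model_outputs = [
    ["AD", "CN", "AD"],
    ["AD", "AD", "CN"],
    ["CN", "CN", "AD"],
]
print(late_fusion_majority(model_outputs))
```

With an odd number of models and two classes, majority voting always yields a unique decision, which is one reason it is a common default for fusing three binary classifiers.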
Access to performance data during matches and training sessions is important for coaches and players. Although there are many video tagging systems available which can provide such access, these systems require manual effort. Data from Inertial Measurement Units (IMUs) could be used for automatically tagging video recordings in terms of players’ actions. However, the data gathered during volleyball sessions are generally very imbalanced, since for an individual player most time intervals can be classified as “non-actions” rather than “actions”. This makes automatic annotation of video recordings of volleyball matches a challenging machine-learning problem. To address this problem, we evaluated balanced and imbalanced learning methods with our newly proposed ‘super-bagging’ method for volleyball action modelling. All methods are evaluated using six classifiers and four sensors (i.e., accelerometer, magnetometer, gyroscope and barometer). We demonstrate that imbalanced learning provides better unweighted average recall (UAR = 83.99%) than balanced learning for the non-dominant hand using a naive Bayes classifier, while balanced learning provides better performance (UAR = 84.18%) than imbalanced learning for the dominant hand using a tree bagger classifier. Our super-bagging method provides the best UAR (84.19%). The super-bagging method also provides better averaged UAR than the balanced and imbalanced methods in 8 out of 10 cases, demonstrating its potential for IMU sensor data. One potential application of these novel models is fatigue and stamina estimation, e.g., by keeping track of how many actions a player is performing and when these are being performed.
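Unweighted average recall (UAR), the metric reported above, is the mean of the per-class recalls, so each class counts equally regardless of how many samples it has. The following minimal sketch shows why this matters for imbalanced action/non-action data. The function name and toy labels are hypothetical and unrelated to the volleyball dataset.

```python
def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls, computed over the classes in y_true."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    recalls = []
    for cls in sorted(set(y_true)):
        # Indices of all samples whose reference label is this class.
        indices = [i for i, t in enumerate(y_true) if t == cls]
        hits = sum(1 for i in indices if y_pred[i] == cls)
        recalls.append(hits / len(indices))
    return sum(recalls) / len(recalls)


# Toy imbalanced data: 8 "non-action" intervals and 2 "action" intervals.
y_true = ["non-action"] * 8 + ["action"] * 2
always_majority = ["non-action"] * 10

plain_accuracy = sum(t == p for t, p in zip(y_true, always_majority)) / len(y_true)
print(plain_accuracy)                                      # 0.8
print(unweighted_average_recall(y_true, always_majority))  # 0.5
```

A classifier that always predicts the majority class scores 80% accuracy here but only 50% UAR, which is chance level for two classes.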