Keyword spotting (KWS) is an important technique for speech applications, which enables users to activate devices by speaking a keyword phrase. Although a phoneme classifier can be used for KWS, exploiting a large amount of transcribed data for automatic speech recognition (ASR), there is a mismatch between the training criterion (phoneme recognition) and the target task (KWS). Recently, multi-task learning has been applied to KWS to exploit both ASR and KWS training data. In this approach, the output of an acoustic model is split into two branches for the two tasks, one for phoneme transcription trained with the ASR data and one for keyword classification trained with the KWS data. In this paper, we introduce a cross attention decoder in the multi-task learning framework. Unlike the conventional multi-task learning approach with a simple split of the output layer, the cross attention decoder summarizes information from a phonetic encoder by performing cross attention between the encoder outputs and a trainable query sequence to predict a confidence score for the KWS task. Experimental results on KWS tasks show that the proposed approach outperformed conventional multi-task learning with split branches and a bi-directional long short-term memory decoder by 12% on average.
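The cross attention step described in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' decoder: the single attention head, the projection matrices `w_k`, `w_v` and `w_out`, the flatten-and-sigmoid readout, and all dimensions are assumptions made for illustration, and random values stand in for trained parameters.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_score(encoder_out, queries, w_k, w_v, w_out):
    """Summarize encoder outputs with a fixed-length query sequence.

    encoder_out: (T, d) phonetic encoder outputs for T frames.
    queries:     (Q, d) query sequence (trainable in a real model).
    Returns a confidence in (0, 1) and the (Q, T) attention weights.
    """
    d = queries.shape[-1]
    keys = encoder_out @ w_k                      # (T, d)
    values = encoder_out @ w_v                    # (T, d)
    attn = softmax(queries @ keys.T / np.sqrt(d)) # (Q, T), rows sum to 1
    summary = attn @ values                       # (Q, d) fixed-size summary
    logit = summary.reshape(-1) @ w_out           # assumed linear readout
    return 1.0 / (1.0 + np.exp(-logit)), attn

# Hypothetical sizes: 50 frames, 4 queries, 16-dimensional features.
rng = np.random.default_rng(0)
T, Q, d = 50, 4, 16
encoder_out = rng.normal(size=(T, d))
queries = rng.normal(size=(Q, d))
w_k = rng.normal(size=(d, d)) / np.sqrt(d)
w_v = rng.normal(size=(d, d)) / np.sqrt(d)
w_out = rng.normal(size=(Q * d,)) / np.sqrt(Q * d)

score, attn = cross_attention_score(encoder_out, queries, w_k, w_v, w_out)
```

Because the number of queries is fixed, the summary has the same size regardless of how many frames the encoder produces, which is what lets a single confidence score be read off an utterance of any length.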
Reinforcement learning is a learning paradigm in which learning is based upon feedback from the environment. Prior research has proposed cognitive (e.g., Instance-based Learning or IBL) and statistical (e.g., Q-learning) reinforcement learning algorithms. However, these algorithms have not been evaluated together in a single dynamic environment. In this paper, a comparison between the statistical Q-learning algorithm and the cognitive IBL algorithm is presented. A well-known environment, "Frozen Lake," is used to train, generalize, and scale Q-learning and IBL algorithms. For generalizing, the Q-learning and IBL agents were trained on one version of the Frozen Lake and tested on a permuted version of the same environment. For scaling, the two algorithms were tested on a larger version of the Frozen Lake environment. Results revealed that the IBL algorithm used less training time and generalized better to different environment variants. The IBL algorithm also scaled well, retaining its superior performance in the larger environment. These results indicate that the IBL algorithm could be proposed as an alternative to standard reinforcement learning algorithms based on dynamic programming, such as Q-learning. The inclusion of human factors (such as memory) in the IBL algorithm makes it suitable for robust learning in complex and dynamic environments.
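For readers unfamiliar with the Q-learning baseline named in this abstract, here is a minimal tabular sketch on a 4x4 Frozen Lake style grid. It covers only Q-learning, not IBL. The grid layout, the deterministic (non-slippery) transitions, the reward of 1 at the goal, and the hyperparameters are all assumptions for illustration; the abstract does not state the settings the authors used.

```python
import random

GRID = ["SFFF", "FHFH", "FFFH", "HFFG"]  # S start, F frozen, H hole, G goal
N = 4
MOVES = [(0, -1), (1, 0), (0, 1), (-1, 0)]  # left, down, right, up

def step(state, action):
    """Deterministic transition (assumption: no slipping)."""
    row, col = divmod(state, N)
    d_row, d_col = MOVES[action]
    row = min(max(row + d_row, 0), N - 1)
    col = min(max(col + d_col, 0), N - 1)
    tile = GRID[row][col]
    # Reward 1 only at the goal; holes and the goal end the episode.
    return row * N + col, float(tile == "G"), tile in "HG"

def greedy_action(q_row, rng):
    # Break ties at random so an all-zero row does not always pick action 0.
    best = max(q_row)
    return rng.choice([a for a, v in enumerate(q_row) if v == best])

def train(episodes=5000, alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * 4 for _ in range(N * N)]
    for _episode in range(episodes):
        state = 0
        for _t in range(100):  # cap episode length
            if rng.random() < epsilon:
                action = rng.randrange(4)               # explore
            else:
                action = greedy_action(q[state], rng)   # exploit
            nxt, reward, done = step(state, action)
            # Q-learning target: bootstrap from the best next-state value.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q

def greedy_path(q, max_steps=20):
    """Follow the learned greedy policy from the start state."""
    rng = random.Random(0)
    state, path = 0, [0]
    for _t in range(max_steps):
        state, _reward, done = step(state, greedy_action(q[state], rng))
        path.append(state)
        if done:
            break
    return path

q_table = train()
path = greedy_path(q_table)
```

The generalization test described in the abstract would correspond to training on one `GRID` and evaluating on a permuted one; a table indexed by state number, as here, carries no information about tile types, which is one reason such transfer is hard for plain tabular Q-learning.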
Cognitive workload is a crucial factor in tasks involving dynamic decision-making and other real-time and high-risk situations. Neuroimaging techniques have long been used for estimating cognitive workload. Given the portability, cost-effectiveness and high time-resolution of EEG as compared to fMRI and other neuroimaging modalities, an efficient method of estimating an individual’s workload using EEG is of paramount importance. Multiple cognitive, psychiatric and behavioral phenotypes are already known to be linked with “functional connectivity”, i.e., correlations between different brain regions. In this work, we explored the possibility of using different model-free functional connectivity metrics along with deep learning to efficiently classify the cognitive workload of the participants. To this end, 64-channel EEG data of 19 participants were collected while they performed the traditional n-back task. These data (after pre-processing) were used to extract the functional connectivity features, namely Phase Transfer Entropy (PTE), Mutual Information (MI) and Phase Locking Value (PLV). These three were chosen to allow a comprehensive comparison of directed and non-directed model-free functional connectivity metrics, which allow faster computation. Using these features, three deep learning classifiers, namely CNN, LSTM and Conv-LSTM, were used to classify the cognitive workload as low (1-back), medium (2-back) or high (3-back). Given the high inter-subject variability in EEG and cognitive workload, and recent research highlighting that EEG-based functional connectivity metrics are subject-specific, subject-specific classifiers were used. Results show state-of-the-art multi-class classification accuracy with the combination of MI with CNN at 80.87%, followed by the combination of PLV with CNN (at 75.88%) and MI with LSTM (at 71.87%). The highest subject-specific performance was achieved by the combinations of PLV with Conv-LSTM and PLV with CNN, with an accuracy of 97.92%, followed by the combination of MI with CNN (at 95.83%) and MI with Conv-LSTM (at 93.75%). The results highlight the efficacy of combining EEG-based model-free functional connectivity metrics with deep learning to classify cognitive workload. The work can further be extended to explore the possibility of classifying cognitive workload in real-time, dynamic and complex real-world scenarios.
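Of the three connectivity features named in this abstract, the Phase Locking Value is the simplest to write down: for two channels with instantaneous phases φ_i(t) and φ_j(t), PLV_ij = |mean over t of exp(i(φ_i(t) − φ_j(t)))|. The sketch below computes a PLV matrix with NumPy, taking phases from an FFT-based analytic signal. It is only an illustration under assumptions: the abstract does not describe the authors' filtering, windowing or pre-processing, and the three synthetic channels, the 256 Hz sampling rate and the 10 Hz test tone are made up for the example.

```python
import numpy as np

def analytic_signal(x):
    """FFT-based analytic signal along the last axis (Hilbert construction)."""
    n = x.shape[-1]
    spectrum = np.fft.fft(x, axis=-1)
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep the DC component
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep the Nyquist component
        weights[1:n // 2] = 2.0           # double positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights, axis=-1)

def plv_matrix(eeg):
    """eeg: (channels, samples) -> (channels, channels) PLV matrix in [0, 1]."""
    phase = np.angle(analytic_signal(eeg))
    unit = np.exp(1j * phase)             # unit phasor per channel and sample
    # Entry (i, j) averages exp(i * (phase_i - phase_j)) over time.
    return np.abs(unit @ unit.conj().T) / eeg.shape[-1]

# Synthetic example: two phase-locked 10 Hz channels and one noise channel.
fs, seconds = 256, 2
t = np.arange(fs * seconds) / fs
rng = np.random.default_rng(0)
eeg = np.stack([
    np.sin(2 * np.pi * 10 * t),
    np.sin(2 * np.pi * 10 * t + np.pi / 4),  # constant phase lag
    rng.normal(size=t.size),                 # unrelated channel
])
plv = plv_matrix(eeg)
```

PLV is symmetric and therefore non-directed, in contrast to Phase Transfer Entropy. In a pipeline like the one described, a matrix of this kind computed per trial (64 by 64 for the 64-channel recordings) would be the input to the classifier.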