Automatic prediction of team performance and workload plays a crucial role in team selection, training, evaluation, and re-training processes. This study investigated the potential of using voice analysis of team-based communication for predicting team workload (TW) and team performance (TP). Both the TW and TP categories were labeled objectively. Ten teams of three participants were tasked with completing a computer-based command-and-control simulation that required communication of task-specific information to each team member. Recordings of each participant's voice communications were used to train Convolutional Neural Network (CNN) models for each team separately. It was hypothesized that integrating TW and TP information into the prediction process would support the prediction of both TW and TP categories. Two experiments were conducted. In the first experiment, the TP prediction networks were fine-tuned to predict TW, and conversely, the TW prediction networks were fine-tuned to predict TP. In the second experiment, prediction of TP or TW based on an ensemble of interconnected TP and TW classifiers was tested. Both experiments confirmed the hypothesis. Task-related prerequisite knowledge embedded into the neural network reduced model training time and improved performance without the need to increase the training data size. Predictions based on combined TW and TP classification outcomes, using either separate or interconnected TW and TP classifiers, outperformed the baseline method of a single CNN model trained to predict either TW or TP alone. The classification accuracy was consistent with previously reported cognitive-load predictions based on objective measures.
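The fine-tuning scheme of the first experiment can be sketched as follows. This is a hypothetical minimal illustration, not the authors' actual CNN or data: a toy one-hidden-layer network stands in for the per-team CNN, and synthetic features with made-up TP and TW labels stand in for the voice recordings. The point it shows is the transfer step — weights learned for one label set (TP) initialize the network retrained for the other (TW), with only the output layer replaced.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

class TinyNet:
    """Toy one-hidden-layer classifier standing in for the per-team CNN."""
    def __init__(self, n_in, n_hidden, n_out):
        self.W1 = rng.normal(0, 0.1, (n_in, n_hidden))
        self.W2 = rng.normal(0, 0.1, (n_hidden, n_out))

    def forward(self, X):
        self.H = np.tanh(X @ self.W1)          # hidden representation
        return softmax(self.H @ self.W2)

    def train(self, X, y, epochs=200, lr=0.5, freeze_hidden=False):
        Y = np.eye(self.W2.shape[1])[y]        # one-hot labels
        for _ in range(epochs):
            P = self.forward(X)
            dZ2 = (P - Y) / len(X)             # cross-entropy gradient
            if not freeze_hidden:
                dH = dZ2 @ self.W2.T * (1 - self.H ** 2)
                self.W1 -= lr * X.T @ dH
            self.W2 -= lr * self.H.T @ dZ2

    def accuracy(self, X, y):
        return (self.forward(X).argmax(axis=1) == y).mean()

# Synthetic stand-ins for voice features with two related label sets.
X = rng.normal(size=(200, 10))
y_tp = (X[:, 0] + X[:, 1] > 0).astype(int)     # "team performance" label
y_tw = (X[:, 0] - X[:, 2] > 0).astype(int)     # "team workload" label

# 1. Train the base network on the TP task.
net = TinyNet(10, 16, 2)
net.train(X, y_tp)

# 2. Fine-tune for TW: keep the learned hidden layer as prerequisite
#    knowledge, replace only the output head, and retrain it.
net.W2 = rng.normal(0, 0.1, net.W2.shape)
net.train(X, y_tw, epochs=100, freeze_hidden=True)
print("TW accuracy after fine-tuning:", round(net.accuracy(X, y_tw), 2))
```

Because the retrained head reuses features learned on the related task, far fewer parameters are updated in step 2, which is the mechanism behind the reported reduction in training time without additional training data.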
This paper explores the automatic prediction of public trust in politicians using speech, text, and visual modalities. It evaluates the effectiveness of each modality individually and investigates fusion approaches for integrating information from all modalities in a multimodal setting. A database was created consisting of speech recordings, Twitter messages, and images representing fifteen American politicians, labeled according to a publicly available ranking system. The data were distributed into three trust categories: low, mid, and high. First, unimodal prediction was performed with each of the three modalities individually; the outputs of the unimodal predictions were then used for multimodal prediction. Unimodal prediction was performed by training three independent logistic regression (LR) classifiers, one each for speech, text, and images. The prediction vectors from the individual modalities were then concatenated and used to train a multimodal decision-making LR classifier. The best performing single modality was speech, which achieved a classification accuracy of 92.81%, followed by images at 77.96% and text at 72.26%. With the multimodal approach, the highest classification accuracy of 97.53% was obtained when all three modalities were used for trust prediction. In the bimodal setup, the best performing combination was speech and images, achieving an accuracy of 95.07%, followed by speech and text at 94.40%, while the text and images combination resulted in an accuracy of 83.20%.
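The two-stage pipeline described above — unimodal LR classifiers whose prediction vectors are concatenated and fed to a decision-level LR classifier — can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's actual system: the modality features, their dimensions, and the class separation are all synthetic, and accuracy is measured on the training set only.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

class LogReg:
    """Multinomial logistic regression trained by batch gradient descent."""
    def __init__(self, n_in, n_classes):
        self.W = np.zeros((n_in, n_classes))

    def fit(self, X, y, epochs=300, lr=0.5):
        Y = np.eye(self.W.shape[1])[y]         # one-hot labels
        for _ in range(epochs):
            P = softmax(X @ self.W)
            self.W -= lr * X.T @ (P - Y) / len(X)
        return self

    def predict_proba(self, X):
        return softmax(X @ self.W)

# Synthetic stand-ins for the three modalities (speech, text, image
# features), all sharing one 3-class trust label: low / mid / high.
n = 300
label = rng.integers(0, 3, n)
modalities = [rng.normal(size=(n, d)) + label[:, None] * s
              for d, s in [(8, 1.0), (6, 0.5), (6, 0.7)]]

# Stage 1 (unimodal): one LR classifier per modality.
unimodal = [LogReg(X.shape[1], 3).fit(X, label) for X in modalities]

# Stage 2 (late fusion): concatenate the three class-probability vectors
# and train a decision-level LR classifier on the combined vector.
Z = np.hstack([clf.predict_proba(X) for clf, X in zip(unimodal, modalities)])
fused = LogReg(Z.shape[1], 3).fit(Z, label)
acc = (fused.predict_proba(Z).argmax(axis=1) == label).mean()
```

The fusion classifier sees a 9-dimensional input here (three probabilities per modality), letting it learn which modality to trust per class — the same decision-level fusion idea the abstract credits for the gain over any single modality.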