Research in multimodal interfaces aims to provide immersive solutions and to increase overall human performance. A promising direction is to combine auditory, visual and haptic interaction between the user and the simulated environment. However, no extensive comparison exists to show how combining audiovisuohaptic interfaces affects human perception and, by extension, task performance. Our paper explores this idea and presents a thorough, full-factorial comparison of how all combinations of audio, visual and haptic interfaces affect performance during manipulation. We evaluated each combination in a study (N = 25) consisting of manipulation tasks of varying difficulty. Overall performance was assessed using both subjective measures (cognitive workload and system usability) and objective measures (time-based and spatial accuracy-based metrics). The results showed that, regardless of task complexity, stereoscopic vision with the virtual reality headset increased performance across all measurements by 40% compared to monocular vision from a generic display monitor. In addition, haptic feedback improved outcomes by 10%, and auditory feedback accounted for an improvement of approximately 5%.
This article presents a whisper speech detector in the far-field domain. The proposed system consists of a long short-term memory (LSTM) neural network trained on log-filterbank energy (LFBE) acoustic features. This model is trained and evaluated on recordings of human interactions with voice-controlled, far-field devices in whisper and normal phonation modes. We compare multiple inference approaches for utterance-level classification by examining trajectories of the LSTM posteriors. In addition, we engineer a set of features based on the signal characteristics inherent to whisper speech, and evaluate their effectiveness in further separating whisper from normal speech. A benchmarking of these features using multilayer perceptrons (MLP) and LSTMs suggests that the proposed features, in combination with LFBE features, can help us further improve our classifiers. We prove that, with enough data, the LSTM model is as capable of learning whisper characteristics from LFBE features alone as a simpler MLP model that uses both LFBE and features engineered for separating whisper and normal speech. In addition, we prove that the LSTM classifier's accuracy can be further improved with the incorporation of the proposed engineered features.
Index Terms: whisper phonation, long short-term memory neural networks, whisper
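As a rough illustration of what utterance-level inference from frame-level posteriors can look like, the sketch below collapses a trajectory of per-frame whisper posteriors into a single score and thresholds it. The three strategies (`mean`, `last`, `tail_mean`), the function names, the threshold and the sample values are all assumptions made for illustration; the abstract does not say which inference approaches were compared.

```python
from statistics import mean


def utterance_score(posteriors, strategy="mean", tail=3):
    """Collapse per-frame whisper posteriors into one utterance-level score.

    The strategies here are hypothetical examples, not the paper's methods.
    """
    if not posteriors:
        raise ValueError("need at least one frame posterior")
    if strategy == "mean":
        # Average over the whole trajectory.
        return mean(posteriors)
    if strategy == "last":
        # Trust the final frame, where a recurrent model has seen the most context.
        return posteriors[-1]
    if strategy == "tail_mean":
        # Average only the last `tail` frames.
        return mean(posteriors[-tail:])
    raise ValueError(f"unknown strategy: {strategy}")


def is_whisper(posteriors, strategy="mean", threshold=0.5):
    """Label the utterance as whisper when its score reaches the threshold."""
    return utterance_score(posteriors, strategy) >= threshold


# Made-up trajectory: the posterior rises as more of the utterance is observed.
frames = [0.2, 0.4, 0.7, 0.9, 0.9]
scores = {s: utterance_score(frames, s) for s in ("mean", "last", "tail_mean")}
```

Strategies that weight later frames more heavily tend to score higher on a rising trajectory like this one, which is one reason the choice of inference approach can matter.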
Load disaggregation for the identification of specific load types in the total demands (e.g., demand-manageable loads, such as heating or cooling loads) is becoming increasingly important for the operation of existing and future power supply systems. This paper introduces an approach in which periodic changes in the total demands (e.g., daily, weekly, and seasonal variations) are disaggregated into corresponding frequency components and correlated with the same frequency components in the meteorological variables (e.g., temperature and solar irradiance), so that the combinations of frequency components with the strongest correlations can be selected as additional explanatory variables. The paper first presents a novel Fourier series regression method for obtaining target frequency components, which is illustrated on two household-level datasets and one substation-level dataset. These results show that correlations between selected disaggregated frequency components are stronger than the correlations between the original non-disaggregated data. Afterwards, convolutional neural network (CNN) and bidirectional long short-term memory (BiLSTM) methods are used to represent dependencies among multiple dimensions and to output the estimated disaggregated time series of specific types of loads, where Bayesian optimisation is applied to select the hyperparameters of the CNN-BiLSTM model. The CNN-BiLSTM and other deep learning models are reported to have excellent performance in many regression problems, but they are often applied as “black box” models without further exploration or analysis of the modelled processes. Therefore, the paper compares a CNN-BiLSTM model in which correlated frequency components are used as additional explanatory variables with a naïve CNN-BiLSTM model (without frequency components).
The presented case studies, related to the identification of electrical heating load and lighting load from the total demands, show that the accuracy of disaggregation improves after specific frequency components of the total demand are correlated with the corresponding frequency components of temperature and solar irradiance, i.e., that the frequency component-based CNN-BiLSTM model provides a more accurate load disaggregation. The obtained results are also benchmarked against two other commonly used models, confirming the benefits of the presented load disaggregation methodology.
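The idea of correlating matching frequency components can be sketched as follows. This is a minimal illustration, not the paper's Fourier series regression method: it assumes uniformly sampled data covering a whole number of periods, in which case the least-squares fit of a single sine and cosine pair reduces to a simple projection. The function names, the 24-sample "daily" period and the synthetic series are all hypothetical.

```python
import math


def harmonic_component(series, period):
    """Least-squares fit of a*cos + b*sin at one period.

    Assumes uniform sampling over a whole number of periods (period >= 3),
    so the sine and cosine regressors are orthogonal.
    """
    n = len(series)
    if period < 3 or n % period != 0:
        raise ValueError("need period >= 3 and a whole number of periods")
    mean_value = sum(series) / n
    w = 2.0 * math.pi / period
    a = 2.0 / n * sum((y - mean_value) * math.cos(w * t) for t, y in enumerate(series))
    b = 2.0 / n * sum((y - mean_value) * math.sin(w * t) for t, y in enumerate(series))
    return [a * math.cos(w * t) + b * math.sin(w * t) for t in range(n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    vx = sum((xi - mx) ** 2 for xi in x)
    vy = sum((yi - my) ** 2 for yi in y)
    return cov / math.sqrt(vx * vy)


# Synthetic hourly data over two days (illustrative values only).
hours = range(48)
demand = [5 + 2 * math.cos(2 * math.pi * t / 24) + 0.5 * math.cos(2 * math.pi * t / 12)
          for t in hours]
temperature = [10 - 3 * math.cos(2 * math.pi * t / 24) + math.sin(2 * math.pi * t / 8)
               for t in hours]

daily_demand = harmonic_component(demand, 24)
daily_temperature = harmonic_component(temperature, 24)

raw_corr = pearson(demand, temperature)
daily_corr = pearson(daily_demand, daily_temperature)
```

In this synthetic case the daily components are perfectly anti-correlated, while the raw series are less strongly correlated because each carries an unrelated extra harmonic. This mirrors, in a toy setting, the abstract's observation that disaggregated components can correlate more strongly than the original data.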