The current study examines how adequate coordination among different cognitive processes, including visual recognition, attention switching, and action preparation and generation, can be developed in robots through learning, by introducing a novel model: the Visuo-Motor Deep Dynamic Neural Network (VMDNN). The proposed model couples a dynamic vision network, a motor generation network, and a higher-level network situated on top of these two. Simulation experiments with the iCub simulator were conducted on cognitive tasks including visual object manipulation in response to human gestures. The results showed that "synergetic" coordination can be developed through iterative learning across the whole network when a spatio-temporal hierarchy self-organizes in the visual pathway and a temporal hierarchy in the motor pathway, such that the higher level can manipulate both with abstraction.
Datasets are an essential component for training effective machine learning models. In particular, surgical robotic datasets have been key to many advances in semi-autonomous surgery, skill assessment, and training. Simulated surgical environments can enhance data collection by making it faster, simpler, and cheaper than on real systems. In addition, combining data from multiple robotic domains can provide rich and diverse training data for transfer learning algorithms. In this paper, we present the DESK (Dexterous Surgical Skill) dataset. It comprises a set of surgical robotic skills collected during a surgical training task on three robotic platforms: the Taurus II robot, the simulated Taurus II robot, and the YuMi robot. This dataset was used to test the idea of transferring knowledge across domains (e.g., from the Taurus to the YuMi robot) for a surgical gesture classification task with seven gestures. We explored three scenarios: 1) no transfer, 2) transfer from the simulated Taurus to the real Taurus, and 3) transfer from the simulated Taurus to the YuMi robot. We conducted extensive experiments with three supervised learning models and provide baselines for each scenario. Results show that using simulation data during training enhances performance on the real robot when limited real data is available. In particular, we obtained 55% accuracy on the real Taurus data using a model trained only on simulator data, and a further accuracy improvement of 34% when 3% of the real data was added to the training process.
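The sim-to-real transfer idea above can be illustrated with a minimal sketch: train a classifier on plentiful simulator data, evaluate it on real-robot data affected by a domain gap, then add a small fraction of real samples to the training set. All data here is synthetic and two-class; the features, models, and numbers from the DESK experiments are not reproduced.

```python
# Hypothetical sketch of the sim-to-real transfer setup; the toy 2-D
# features and nearest-centroid classifier are stand-ins, not the
# models or features used with the DESK dataset.
import random

random.seed(0)

def make_data(n, shift):
    """Generate toy 2-D features for two gestures; `shift` models the
    domain gap between the simulated and the real robot."""
    data = []
    for _ in range(n):
        label = random.randint(0, 1)
        x = random.gauss(label * 2.0 + shift, 0.5)
        y = random.gauss(label * -2.0, 0.5)
        data.append(((x, y), label))
    return data

def centroids(data):
    """Per-class mean feature vector (a nearest-centroid classifier)."""
    sums, counts = {}, {}
    for (x, y), label in data:
        sx, sy = sums.get(label, (0.0, 0.0))
        sums[label] = (sx + x, sy + y)
        counts[label] = counts.get(label, 0) + 1
    return {l: (sx / counts[l], sy / counts[l]) for l, (sx, sy) in sums.items()}

def accuracy(model, data):
    """Fraction of samples whose nearest centroid matches the label."""
    hits = 0
    for (x, y), label in data:
        pred = min(model, key=lambda l: (x - model[l][0])**2 + (y - model[l][1])**2)
        hits += pred == label
    return hits / len(data)

sim = make_data(500, shift=0.0)    # plentiful simulator data
real = make_data(200, shift=0.4)   # scarce real-robot data with a domain gap

sim_only = centroids(sim)               # scenario: transfer with no real data
augmented = centroids(sim + real[:6])   # scenario: ~3% of the real data added

acc_sim = accuracy(sim_only, real)
acc_aug = accuracy(augmented, real)
```

The same three scenarios from the abstract map onto `sim_only` (trained purely in simulation) and `augmented` (simulation plus a small real sample); a real pipeline would of course use the recorded kinematic features and a stronger model.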
Objective
Gestural interfaces allow access to and manipulation of Electronic Medical Records (EMR) in hospitals while maintaining a completely sterile environment. In particular, in the Operating Room (OR), these interfaces enable surgeons to browse a Picture Archiving and Communication System (PACS) without needing to delegate functions to the surgical staff. Existing gesture-based medical interfaces rely on a suboptimal, arbitrarily small set of gestures mapped to a few commands available in PACS software. The objective of this work is to present a method for determining the most suitable set of gestures based on surgeons' acceptability. To achieve this goal, the paper introduces two key innovations: (a) a novel methodology for incorporating gestures' semantic properties into the agreement analysis, and (b) a new agreement metric for determining the most suitable gesture set for a PACS.

Materials and methods
Three neurosurgical diagnostic tasks were conducted by nine neurosurgeons. The set of commands and gesture lexicons was determined using a Wizard-of-Oz paradigm. The gestures were decomposed into a set of 55 semantic properties based on the motion trajectory, orientation, and pose of the surgeons' hands, and their ground-truth values were manually annotated. Finally, a new agreement metric was developed, using the well-known Jaccard similarity to measure consensus among users over a gesture set.

Results
A set of 34 PACS commands was found to be sufficient for PACS manipulation. In addition, a level of agreement of 0.29 was found among the surgeons over the elicited gestures. Two statistical tests, a paired t-test and the Mann-Whitney-Wilcoxon test, were conducted comparing the proposed metric with the traditional agreement metric.
Both tests found that the agreement values computed with the proposed metric are significantly higher (p < 0.001).

Conclusions
This study shows that the level of agreement among surgeons over the best gestures for PACS operation is higher than previously reported (0.29 vs. 0.13). This is because the proposed metric focuses on the main features of the gestures rather than on the gestures themselves. The level of agreement is not very high, yet it indicates a majority preference and is better than selecting gestures by authoritarian or arbitrary approaches. The methods described in this paper provide a guiding framework for the design of future gesture-based PACS systems for the OR.
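The idea of measuring agreement over semantic properties rather than whole gestures can be sketched with plain Jaccard similarity. This is an illustration of the general approach, not the paper's exact formula; the property names and proposals below are hypothetical.

```python
# Illustrative sketch: agreement for one PACS command computed as the
# mean pairwise Jaccard similarity between the semantic-property sets
# of the gestures each surgeon proposed. The property names below are
# made up for the example; the paper's exact metric is not reproduced.
from itertools import combinations

def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|; defined as 1.0 when both sets are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def property_agreement(proposals):
    """Mean Jaccard similarity over all unordered pairs of users'
    property sets for one command."""
    pairs = list(combinations(proposals, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Hypothetical property sets for a "zoom in" command from three surgeons.
zoom_in = [
    {"palm_open", "move_apart", "two_hands"},
    {"palm_open", "move_apart", "one_hand"},
    {"pinch", "move_apart", "one_hand"},
]
score = property_agreement(zoom_in)  # 0.4 for these toy sets
```

Because two proposals that share most properties (e.g., `move_apart`) contribute a nonzero similarity even when the gestures differ, this style of metric yields higher agreement values than exact-match agreement over whole gestures, which matches the 0.29 vs. 0.13 observation above.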
The choice of which gestures should be part of a gesture language is a critical step in the design of gesture-based interfaces. This step is especially important when time and accuracy are key factors of the user experience, as with gestural interfaces for vehicle control and for sterile control of a picture archiving and communication system (PACS) in the operating room (OR). Agreement studies are commonly used to find the gesture preferences of end users. These studies hypothesize that the best available gesture lexicon is the one preferred by a majority. However, such agreement approaches do not offer a metric for assessing the qualitative aspects of gestures. In this work, we propose an experimental framework to quantify, compare, and evaluate gestures. This framework is grounded in the expert knowledge of speech and language professionals (SLPs). Its development consisted of three studies: 1) creation, 2) evaluation, and 3) validation. In the creation study, we followed an adapted version of the Delphi interview/discussion procedure with SLPs to obtain the Vocabulary Acceptability Criteria (VAC) for evaluating gestures. Next, in the evaluation study, a modified method of pairwise comparisons was used to rank and quantify the gestures on each criterion (VAC). Lastly, in the validation study, we formulated an odd-one-out procedure to show that the VAC values of a gesture are representative and sufficiently distinctive to select that particular gesture from a pool of gestures. We applied this framework to the gestures obtained from a gesture elicitation study conducted with nine neurosurgeons to control an imaging software package. In addition, 29 SLPs, comprising 17 experts and 12 graduate students, participated in the VAC study. The best lexicons from the available pool were obtained through both the agreement and VAC metrics. We used binomial tests to show that the results of the validation procedure are significantly better than the baseline.
These results support our hypothesis that the VAC are representative of the gestures and that subjects can select the right gesture given its VAC values.
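The pairwise-comparison step of the evaluation study can be sketched as follows. The paper's modified procedure is not reproduced here; this minimal version scores each gesture on one criterion by its fraction of comparisons won, and the gesture names and judgments are invented for the example.

```python
# Hypothetical sketch of ranking gestures on one VAC criterion from
# pairwise-comparison judgments, using simple win fractions as scores.
from collections import defaultdict

def rank_from_pairs(judgments):
    """judgments: list of (winner, loser) pairs from expert comparisons.
    Returns (gestures sorted by win fraction, score per gesture)."""
    wins, total = defaultdict(int), defaultdict(int)
    for winner, loser in judgments:
        wins[winner] += 1
        total[winner] += 1
        total[loser] += 1
    score = {g: wins[g] / total[g] for g in total}
    return sorted(score, key=score.get, reverse=True), score

# Toy judgments for a single criterion across three candidate gestures.
judgments = [("swipe", "circle"), ("swipe", "pinch"), ("pinch", "circle"),
             ("swipe", "circle"), ("circle", "pinch")]
order, scores = rank_from_pairs(judgments)  # "swipe" wins every comparison
```

Repeating this per criterion yields a VAC profile (one score per criterion) for each gesture, which is the kind of quantitative representation the validation study then tests with the odd-one-out procedure.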