Research on medical imaging is an important part of enabling medical robots to operate on human organs. Medical robotics lies at the intersection of multiple disciplines, and medical imaging is one of its key directions, in which substantial results have been achieved. In this paper, a soft tissue surface feature tracking method based on a depth matching network is proposed. The method builds on the triangle matching algorithm. First, a self-made sample set for training the depth matching network is constructed from the speckle matching data that the triangle matching algorithm produces for the first N frames. The depth matching network is pre-trained on the ORL face data set and then trained on the self-made training set. After training, speckle matching is carried out on the subsequent frames to obtain the speckle matching matrix between each subsequent frame and the first frame, from which the inter-frame feature matching results are derived. In this way, inter-frame speckle tracking is completed. The results of this method are then compared with matching results based on a convolutional neural network. The experimental results show that the proposed method achieves higher matching accuracy; in particular, its accuracy on the MNIST handwritten digit data set exceeds 90%.
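The step of deriving inter-frame matches from the per-frame matching matrices can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' implementation: it assumes each matching matrix holds similarity scores between the speckles of a later frame (rows) and the speckles of the first frame (columns), that each speckle is assigned to its highest-scoring first-frame speckle, and that the helper names and the `threshold` parameter are hypothetical. Two later frames are then matched by linking speckles that share the same first-frame anchor.

```python
def assign_to_first_frame(match_matrix, threshold=0.5):
    """Map each speckle of a later frame to its best first-frame speckle.

    Hypothetical helper: match_matrix[i][j] is assumed to be a similarity
    score between speckle i of the current frame and speckle j of the first
    frame. Rows whose best score is below `threshold` stay unmatched (None).
    """
    assignment = []
    for row in match_matrix:
        best_j = max(range(len(row)), key=lambda j: row[j])
        assignment.append(best_j if row[best_j] >= threshold else None)
    return assignment


def inter_frame_matches(matrix_a, matrix_b, threshold=0.5):
    """Match speckles of frames a and b through their shared first-frame anchors.

    Hypothetical helper: returns (index in frame a, index in frame b) pairs.
    """
    assign_a = assign_to_first_frame(matrix_a, threshold)
    assign_b = assign_to_first_frame(matrix_b, threshold)

    # Invert frame b's assignment: first-frame index -> speckle index in frame b.
    anchor_to_b = {}
    for k, j in enumerate(assign_b):
        if j is not None and j not in anchor_to_b:
            anchor_to_b[j] = k

    # Two speckles match if they point to the same first-frame speckle.
    pairs = []
    for i, j in enumerate(assign_a):
        if j is not None and j in anchor_to_b:
            pairs.append((i, anchor_to_b[j]))
    return pairs


if __name__ == "__main__":
    frame_a = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1]]
    frame_b = [[0.1, 0.7, 0.2], [0.95, 0.0, 0.05], [0.1, 0.2, 0.3]]
    print(inter_frame_matches(frame_a, frame_b))
```

One consequence of routing every match through the first frame in this sketch is that matching errors are not chained from one frame pair to the next; the trade-off is that a speckle whose appearance drifts far from the first frame may fall below the threshold and go unmatched.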
A bilateral adaptive control method based on the PEB control structure is designed for a class of time-delay force feedback teleoperation systems without external disturbances or internal friction, in order to handle uncertainty in the dynamic parameters and the time delay. The stability and tracking performance of the closed-loop teleoperation system with constant time delay are analyzed using Lyapunov stability theory. The designed controller is then applied to a teleoperation system in which both the master robot and the slave robot are two-degree-of-freedom rotating manipulators. Simulations are carried out both without and with operator and environment forces, and the control performance of the adaptive bilateral method is compared with that of a traditional time-delay teleoperation system. The results verify that the method achieves good control performance.
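To make the PEB structure concrete, here is a minimal 1-DOF simulation sketch. It assumes PEB denotes the usual position-error-based scheme, in which each side is driven by the error between its own position and the delayed position received from the other side, and it uses a fixed PD-type coupling with hypothetical gains `K` and `D` and unit masses. It omits the paper's adaptive parameter law and the operator and environment forces, so it corresponds only to the force-free case with known parameters.

```python
def simulate_peb(T=0.2, K=1.0, D=2.0, m=1.0, dt=1e-3, t_end=30.0, qm0=1.0, qs0=0.0):
    """Simulate a 1-DOF master/slave pair under position-error-based coupling.

    Illustrative sketch only (not the paper's adaptive controller):
      master: m * qm'' = -K * (qm(t) - qs(t - T)) - D * qm'
      slave:  m * qs'' =  K * (qm(t - T) - qs(t)) - D * qs'
    T is a constant one-way communication delay. Before t = T the delayed
    signals are taken to be the initial positions (both robots at rest).
    """
    d = int(round(T / dt))        # delay expressed in integration steps
    n = int(round(t_end / dt))    # number of integration steps
    qm, qs = [qm0], [qs0]         # position histories (needed for the delay)
    vm, vs = 0.0, 0.0             # velocities, starting at rest

    for k in range(n):
        qs_delayed = qs[max(k - d, 0)]   # slave position seen by the master
        qm_delayed = qm[max(k - d, 0)]   # master position seen by the slave

        tau_m = -K * (qm[k] - qs_delayed) - D * vm
        tau_s = K * (qm_delayed - qs[k]) - D * vs

        # Semi-implicit Euler: update velocity first, then position.
        vm += dt * tau_m / m
        vs += dt * tau_s / m
        qm.append(qm[k] + dt * vm)
        qs.append(qs[k] + dt * vs)

    return qm, qs


if __name__ == "__main__":
    qm, qs = simulate_peb()
    print(f"final master position: {qm[-1]:.4f}, final slave position: {qs[-1]:.4f}")
```

With the default values, the master starts at 1 and the slave at 0; both settle near 0.5 despite the 0.2 s delay, because the damping terms dissipate energy and, in this symmetric force-free setup, the sum of the two positions stays constant.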
A Visual Question Answering (VQA) system answers a question about an image by finding the information in the image that is relevant to the question. It can be widely used in visual assistance, automated security surveillance, and intelligent interaction between robots and humans. However, the accuracy of VQA is still not ideal. The main difficulties are that image features cannot adequately represent scene and object information, and that text information cannot be fully represented. To improve accuracy, this paper applies multi-scale feature extraction and fusion to both the image feature representation and the text representation parts of the VQA system. First, for image feature representation, the image features output by different layers of a pre-trained deep neural network are extracted and fused, and the best fusion scheme is determined experimentally. Second, for sentence representation, a multi-scale feature method is introduced to represent and fuse the word-level, phrase-level, and sentence-level features of sentences. Finally, the VQA model is improved using these multi-scale feature extraction and fusion methods. The results show that adding multi-scale feature extraction and fusion improves the accuracy of the VQA model.
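The word-, phrase-, and sentence-level text representation can be sketched in plain Python. This is an illustrative sketch, not the paper's model: phrase-level features are assumed to be the element-wise maximum over unigram, bigram, and trigram averages at each position, a simple element-wise maximum over the words stands in for a learned sentence encoder, and the three levels are fused by concatenation. All function names are hypothetical.

```python
def mean_vec(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def max_vec(vectors):
    """Element-wise maximum of a non-empty list of equal-length vectors."""
    return [max(col) for col in zip(*vectors)]


def phrase_features(words, ngram_sizes=(1, 2, 3)):
    """Hypothetical phrase level: at each position, average the n-gram that
    starts there for each n that fits, then take the element-wise max."""
    feats = []
    for t in range(len(words)):
        candidates = [
            mean_vec(words[t:t + n]) for n in ngram_sizes if t + n <= len(words)
        ]
        feats.append(max_vec(candidates))  # n = 1 always fits, so never empty
    return feats


def multiscale_question_vector(words):
    """Fuse word-, phrase-, and sentence-level features by concatenation."""
    word_level = mean_vec(words)                      # pooled word features
    phrase_level = mean_vec(phrase_features(words))   # pooled phrase features
    sentence_level = max_vec(words)                   # stand-in for an RNN encoder
    return word_level + phrase_level + sentence_level


if __name__ == "__main__":
    # Three toy 2-D "word embeddings" for a three-word question.
    question = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(multiscale_question_vector(question))
```

In a real model the word vectors would come from a trained embedding, and the fixed n-gram averaging could be replaced by learned 1-D convolutions; the fixed operations here only show how the three scales are computed and combined into one question vector.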
Visual Question Answering (VQA) is a significant cross-disciplinary problem in computer vision and natural language processing: a computer must output a natural language answer given an image and a question about that image. This requires multimodal fusion of text features and visual features, and the attention mechanism is key to the success of this task. Attention mechanisms help integrate text features and image features into a compact multimodal representation. It is therefore necessary to clarify the development status of attention mechanisms, understand the most advanced attention methods, and anticipate their future development. In this article, we first conduct a bibliometric analysis of the related literature using CiteSpace, from which we reasonably infer that attention mechanisms have great development potential in cross-modal retrieval. Second, we discuss the classification and application of existing attention mechanisms in VQA tasks, analyze their shortcomings, and summarize current improvement methods. Finally, we argue that, with continued exploration of attention mechanisms, VQA will evolve in a smarter and more human-like direction.
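A common form of question-guided attention over image regions can be sketched as follows. It is an assumed, generic formulation rather than any specific method covered by the survey: each region feature is scored by its dot product with the question vector, the scores are normalized with a softmax, the regions are averaged with these weights, and the attended image vector is fused with the question by element-wise multiplication. The function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def question_guided_attention(regions, question):
    """Attend over image region features using the question vector.

    regions:  list of region feature vectors (same length as question)
    question: question feature vector
    Returns (attention weights, fused multimodal vector).
    """
    # Relevance of each region to the question (dot-product scoring).
    scores = [sum(r_i * q_i for r_i, q_i in zip(r, question)) for r in regions]
    weights = softmax(scores)

    # Weighted average of region features: the attended image vector.
    dim = len(question)
    attended = [sum(w * r[d] for w, r in zip(weights, regions)) for d in range(dim)]

    # Fuse image and text features by element-wise multiplication.
    fused = [a * q for a, q in zip(attended, question)]
    return weights, fused


if __name__ == "__main__":
    weights, fused = question_guided_attention([[1.0, 0.0], [0.0, 1.0]], [10.0, 0.0])
    print(weights, fused)
```

For example, with region features `[[1, 0], [0, 1]]` and question `[10, 0]`, almost all of the weight goes to the first region, whose feature aligns with the question.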