Object detection is one of the most important problems in computer vision. Since AlexNet was proposed, methods based on Convolutional Neural Networks (CNNs) have become mainstream in the field, and much research on neural networks and variations of their architectures has appeared. Achieving detection that is both fast and accurate requires moving beyond the existing CNN framework, which poses great challenges. The Transformer's relatively mature theoretical foundation and technological development in Natural Language Processing have brought it to researchers' attention, and Transformer-based methods have been shown to work for computer vision tasks and to outperform existing CNN methods on some of them. To help more researchers understand the development of object detection methods, the existing methods, the different frameworks, the open challenges, and the development trends, this paper introduces the classic CNN-based object detection methods and discusses the highlights, advantages, and disadvantages of these algorithms. Drawing on a large body of literature, the paper compares different CNN-based and Transformer-based detection methods. Thirteen detection methods that are among the most mainstream, promising, and broadly influential in the field are selected and compared under fair conditions. The comparative data give us confidence in the development of the Transformer and in the convergence between different methods. The paper also presents recent innovative approaches to using the Transformer in computer vision tasks. Finally, the challenges, opportunities, and future prospects of the field are summarized.
To further improve the accuracy of multilingual off-line handwritten signature verification, this paper studies off-line handwritten signature verification for both monolingual and mixed multilingual data and proposes an improved verification network (IDN), which performs writer-independent (WI) verification to decide whether a signature is genuine or forged. The IDN model contains four neural network streams with shared weights: two are discriminative streams that receive the original signature images, and the other two are inverse streams that receive the gray-inverted images. Enhanced spatial attention modules connect the discriminative and inverse streams so that information can propagate between them. The IDN model uses a channel attention mechanism (SE) and the enhanced spatial attention module (ESA) to extract the feature information that is effective for signature verification. Since no suitable multilingual signature data set exists, this paper collects a data set in two languages (Chinese and Uyghur) comprising 100,000 signatures from 200 people. The method is tested on this self-built data set and on the public Bengali (BHsig-B) and Hindi (BHsig-H) data sets. On the mixture of the two languages, the proposed method achieves its best result, with an FRR of 10.5%, an FAR of 2.06%, and an ACC of 96.33%.
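For readers unfamiliar with the three reported metrics, the following is a minimal sketch of how FRR, FAR, and ACC are commonly computed from binary verification decisions. It assumes the standard definitions (FRR is the fraction of genuine signatures rejected, FAR is the fraction of forged signatures accepted, ACC is the fraction of all decisions that are correct); the abstract does not state the paper's exact evaluation protocol, and the function name `verification_metrics` and the label convention (1 = genuine, 0 = forged) are hypothetical choices made for illustration.

```python
def verification_metrics(labels, predictions):
    """Return (FRR, FAR, ACC) for binary signature verification.

    Assumed convention (not taken from the paper): 1 = genuine, 0 = forged.
    """
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")

    pairs = list(zip(labels, predictions))
    genuine = sum(1 for y, _ in pairs if y == 1)
    forged = sum(1 for y, _ in pairs if y == 0)

    # Genuine signatures that the verifier rejected.
    false_rejects = sum(1 for y, p in pairs if y == 1 and p == 0)
    # Forged signatures that the verifier accepted.
    false_accepts = sum(1 for y, p in pairs if y == 0 and p == 1)
    correct = sum(1 for y, p in pairs if y == p)

    # Guard against empty classes to avoid division by zero.
    frr = false_rejects / genuine if genuine else 0.0
    far = false_accepts / forged if forged else 0.0
    acc = correct / len(pairs) if pairs else 0.0
    return frr, far, acc


if __name__ == "__main__":
    # Toy data: four genuine and four forged signatures.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 1, 1]
    frr, far, acc = verification_metrics(y_true, y_pred)
    print(f"FRR={frr:.2%} FAR={far:.2%} ACC={acc:.2%}")
```

Because FRR and FAR are normalized by different class sizes, ACC cannot in general be derived from them without knowing how many genuine and forged samples were tested.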