The relevance of detecting and recognizing objects in images and image sequences has only grown over the years. Over the past few decades, a large number of approaches and methods have been proposed for detecting two kinds of targets. The first is anomalies, that is, image regions whose characteristics differ from the predicted ones. The second is objects of interest whose properties are known a priori, in the extreme case as a library of reference templates. This work attempts a systematic analysis of trends in the development of detection approaches and methods, the reasons behind these trends, and the metrics designed to assess the quality and reliability of object detection. Detection techniques based on mathematical models of images are considered, with special attention paid to approaches based on random field models and likelihood ratios. The development of convolutional neural networks for recognition problems is analyzed, including a number of pre-trained architectures that solve this problem with high efficiency. Rather than relying on mathematical models, such architectures are trained on libraries of real images. Among the measures of detection quality, the paper considers the probabilities of type I and type II errors, detection precision and recall, intersection over union, and interpolated average precision. The paper also presents typical benchmark tests used to compare different neural network algorithms.
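To make the listed quality measures concrete, here is a minimal sketch of intersection over union, precision and recall, and interpolated average precision. It assumes boxes given as `(x1, y1, x2, y2)` and precision/recall points already sorted by increasing recall. The function names are illustrative, not taken from the paper. The AP shown is the all-point interpolated variant; the PASCAL VOC 2007 protocol used an 11-point variant instead.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP), recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def interpolated_ap(recall, precision):
    """All-point interpolated AP: area under p_interp(r) = max_{r' >= r} p(r').
    `recall` must be sorted in increasing order, `precision` aligned with it."""
    r = np.concatenate(([0.0], np.asarray(recall, float), [1.0]))
    p = np.concatenate(([0.0], np.asarray(precision, float), [0.0]))
    # Build the monotonically non-increasing precision envelope (right to left).
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum rectangles wherever recall actually changes.
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))
```

For example, `precision_recall(9, 0, 1)` gives `(1.0, 0.9)`. A detection is usually counted as a true positive when its IoU with a ground-truth box exceeds a chosen threshold (0.5 is a common choice).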
Modern neural network architectures are highly capable at object detection and recognition. Even so, they output only the local (pixel) coordinates of the object bounding boxes in the image and the predicted classes. Several practical tasks, however, require more complete information about the object. In robotic apple picking, for example, the system must know exactly in which direction and how far to move the gripper. To determine the real position of an apple relative to the imaging device, the authors propose using an Intel RealSense depth camera and aggregating information from its depth and brightness channels. Apples are detected with the YOLOv3 architecture. Then, from the distance to the object and its location in the image, the distances relative to the camera are calculated along all coordinate axes. To obtain apple coordinates, image coordinates are converted to a symmetric coordinate system by simple linear transformations. Estimating the position in a symmetric coordinate system yields not only the magnitude of the offset but also its direction relative to the camera. The proposed approach provides high-accuracy position estimates: the root mean square error is approximately 7–12 mm, depending on the range and the axis. Detection precision is 100%, and recall is 90%.
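The abstract describes the conversion only as "simple linear transformations". The sketch below shows one plausible form under stated assumptions:

- a pinhole camera model;
- the principal point at the image centre;
- known focal lengths `fx`, `fy` in pixels;
- a depth value measured along the optical axis at the box centre.

The names `Intrinsics`, `to_symmetric` and `bbox_to_camera_xyz` are hypothetical. In practice, calibrated intrinsics, including the true principal point, would be read from the camera rather than assumed.

```python
from dataclasses import dataclass

@dataclass
class Intrinsics:
    width: int    # image width, pixels
    height: int   # image height, pixels
    fx: float     # ASSUMED focal length along x, pixels
    fy: float     # ASSUMED focal length along y, pixels

def to_symmetric(u, v, intr):
    """Move the pixel origin from the top-left corner to the image centre.
    x grows to the right and y grows upward, so the sign tells on which
    side of the optical axis the object lies."""
    return u - intr.width / 2.0, intr.height / 2.0 - v

def bbox_to_camera_xyz(bbox, depth_mm, intr):
    """Estimate an object's (X, Y, Z) position in mm relative to the camera.
    bbox: (x1, y1, x2, y2) in pixels, e.g. a YOLOv3 detection.
    depth_mm: depth along the optical axis at the box centre (depth channel)."""
    u = (bbox[0] + bbox[2]) / 2.0
    v = (bbox[1] + bbox[3]) / 2.0
    x_c, y_c = to_symmetric(u, v, intr)
    # Pinhole model: the lateral offset scales linearly with depth.
    return x_c * depth_mm / intr.fx, y_c * depth_mm / intr.fy, depth_mm
```

With a 640×480 image and `fx = fy = 600`, a box centred at pixel (420, 220) and 600 mm deep maps to roughly 100 mm right of and 20 mm above the optical axis.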
A personalized medical approach can make diabetic retinopathy treatment more effective. Selecting an effective treatment requires in-depth analysis of diagnostic data on the patient’s fundus. For this purpose, flat (two-dimensional) optical coherence tomography images are used to reconstruct the three-dimensional structure of the fundus, and heat propagation through this structure is simulated with numerical methods. The article proposes algorithms for smooth segmentation of the retina for 3D model reconstruction, and for mathematical modeling of laser exposure that accounts for various parameters. Convergence of the integro-interpolation method (balance method) was verified experimentally. The number of intervals was repeatedly doubled, and the root mean square deviation between the temperature values simulated at corresponding coordinates was computed. Doubling the number of intervals along a given spatial or temporal coordinate reduces this deviation by a factor of 1.7–5.9. The modeling makes it possible to estimate the basic parameters required in actual diabetic retinopathy treatment while balancing efficacy and safety. It is used to estimate the retinal heating caused by heat spreading from the vascular layer, where the temperature rose to 45 °C within 0.2 ms. It was found that two coagulates can be formed when they are at least 180 μm apart. This distance can be reduced to 160 μm with a 15 ms delay between exposures.
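The interval-doubling convergence check can be illustrated on a much simpler problem than the retinal model. The sketch below uses the 1D heat equation u_t = (k u_x)_x on [0, 1] with zero boundary temperatures. The scheme is an implicit balance (integro-interpolation) scheme in which fluxes use the conductivity at cell faces. Only the spatial intervals are doubled here; the time step is held fixed. The geometry, coefficients and function names are assumptions made for illustration, not the paper's model.

```python
import numpy as np

def solve_heat_1d(n_intervals, length=1.0, t_end=0.1, dt=1e-3,
                  k=lambda x: 1.0 + 0.0 * x):
    """Implicit balance-method scheme for u_t = (k u_x)_x on [0, length],
    u = 0 at both ends, initial profile sin(pi x / length)."""
    h = length / n_intervals
    x = np.linspace(0.0, length, n_intervals + 1)
    u = np.sin(np.pi * x / length)
    k_half = k(0.5 * (x[:-1] + x[1:]))        # conductivity at cell faces
    m = n_intervals - 1                        # number of interior nodes
    A = np.zeros((m, m))
    for j in range(m):
        kw, ke = k_half[j], k_half[j + 1]      # west/east faces of node j + 1
        A[j, j] = 1.0 + dt * (kw + ke) / h**2
        if j > 0:
            A[j, j - 1] = -dt * kw / h**2
        if j < m - 1:
            A[j, j + 1] = -dt * ke / h**2
    for _ in range(int(round(t_end / dt))):
        u[1:-1] = np.linalg.solve(A, u[1:-1])  # boundary values stay 0
    return u

def rms_coarse_vs_fine(n):
    """RMS deviation between the n- and 2n-interval solutions at shared nodes."""
    coarse = solve_heat_1d(n)
    fine = solve_heat_1d(2 * n)
    return float(np.sqrt(np.mean((fine[::2] - coarse) ** 2)))

for n in (10, 20, 40):
    print(n, rms_coarse_vs_fine(n))
```

Every second node of the refined grid coincides with a node of the coarser grid, which is what allows the pointwise comparison. For this second-order scheme, each doubling should shrink the deviation by roughly a factor of four.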
The article presents a study of the efficiency of various neural networks under limited source data and with a number of simple augmentations. The dependences were obtained for a sequential neural network trained with error backpropagation. The simplest transformations were used for data augmentation: tilting the letters (italics), changing the letter color (from black to red), and distorting the reference images with white Gaussian noise at a signal-to-noise ratio q from 1 to 10. It is shown that the best recognition of Russian alphabet letters is achieved by the network trained with all of the augmentation methods discussed in this work. The dependence of recognition accuracy on the signal-to-noise ratio was also studied for all trained neural networks.
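The three augmentations can be sketched with NumPy alone, assuming grayscale letter images in [0, 1] with black strokes on a white background. The abstract does not define q. Here it is ASSUMED to be the image contrast (max minus min) divided by the noise standard deviation. The shear factor and the function names are also hypothetical.

```python
import numpy as np

def italicize(img, shear=0.25, background=1.0):
    """Slant a letter to the right: upper rows shift further right than lower ones."""
    h, w = img.shape
    out = np.full_like(img, background)
    for r in range(h):
        shift = min(w, int(round(shear * (h - 1 - r))))
        out[r, shift:] = img[r, :w - shift]
    return out

def to_red(img):
    """Black-on-white grayscale -> red-on-white RGB: 0 -> (1, 0, 0), 1 -> (1, 1, 1)."""
    return np.stack([np.ones_like(img), img, img], axis=-1)

def add_white_noise(img, q, rng):
    """Add white Gaussian noise at signal-to-noise ratio q.
    ASSUMPTION: q = image contrast / noise standard deviation."""
    sigma = float(img.max() - img.min()) / q
    return img + rng.normal(0.0, sigma, size=img.shape)

rng = np.random.default_rng(0)
letter = np.ones((32, 32))
letter[4:28, 14:18] = 0.0                     # a crude vertical stroke
augmented = [italicize(letter), to_red(letter)]
augmented += [add_white_noise(letter, q, rng) for q in range(1, 11)]
```

Under this definition, q = 1 means the noise standard deviation equals the full stroke contrast, and q = 10 leaves the letter almost untouched.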
The article studies convolutional neural network inference in image processing under visual attacks. Attacks of four types were considered: simple attacks, the addition of white Gaussian noise, an impulse applied to a single pixel of the image, and changes of brightness values within a rectangular area. The MNIST and Kaggle Dogs vs. Cats datasets were used. Recognition accuracy was measured as a function of the number of attacked images and of the attack types used in training. The study was based on well-known convolutional neural network architectures for pattern recognition, such as VGG-16 and Inception_v3. The dependences of recognition accuracy on the parameters of visual attacks were obtained. Original methods for countering visual attacks were proposed. These methods separate out inputs of an “incomprehensible” class for the recognizer and then correct them using neural network inference on reduced-size images. Applying these methods improved accuracy by a factor of 1.3 after the iteration that discards incomprehensible images. The iteration that integrates the results of analyzing images at reduced sizes reduced uncertainty by 4–5%.
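The sketch below illustrates three of the attack types and a defense in the spirit the abstract describes, assuming 2D grayscale images in [0, 1]. The "incomprehensible" test is ASSUMED to be a maximum class probability below a threshold. Such inputs are re-classified on down-scaled (block-averaged, then re-expanded) copies, and the probabilities are averaged. The classifier `classify`, the threshold and the scale factors are hypothetical stand-ins; this is not the authors' exact procedure.

```python
import numpy as np

def gaussian_noise_attack(img, sigma, rng):
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)

def one_pixel_attack(img, row, col, value=1.0):
    out = img.copy()
    out[row, col] = value                    # impulse on a single pixel
    return out

def rectangle_attack(img, top, left, height, width, delta):
    out = img.copy()
    region = out[top:top + height, left:left + width]
    out[top:top + height, left:left + width] = np.clip(region + delta, 0.0, 1.0)
    return out

def down_up(img, factor):
    """Reduce resolution by block averaging, then expand back (nearest neighbour)."""
    h, w = img.shape
    h2, w2 = h // factor * factor, w // factor * factor
    small = img[:h2, :w2].reshape(h2 // factor, factor, w2 // factor, factor).mean(axis=(1, 3))
    out = img.copy()
    out[:h2, :w2] = np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)
    return out

def robust_predict(img, classify, threshold=0.6, factors=(2, 4)):
    """Return (label, flagged). classify(img) -> class-probability vector (hypothetical)."""
    probs = classify(img)
    if probs.max() >= threshold:
        return int(np.argmax(probs)), False
    # "Incomprehensible" input: integrate predictions on reduced-size copies.
    stacked = [probs] + [classify(down_up(img, f)) for f in factors]
    return int(np.argmax(np.mean(stacked, axis=0))), True
```

Averaging over reduced resolutions tends to suppress isolated pixel-level perturbations such as the one-pixel impulse, while leaving coarse image structure intact.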
In modern practice, laser coagulation is used to treat diabetic retinopathy. During laser surgery, the doctor selects the laser exposure parameters manually, which requires sufficient experience and knowledge to achieve a therapeutic effect. Mathematical modeling of the laser coagulation process makes it possible to estimate the crucial parameters without performing an operation. However, the retina has a rather complex structure, and even computationally inexpensive numerical methods take a long time to produce a result. Developing time-efficient algorithms for three-dimensional modeling is therefore an urgent task, since such algorithms enable a comprehensive study within a limited time. In this paper, we study the execution time of algorithms that implement different variants of the splitting method and the finite difference method, adapted to the heat conduction problem at hand. The study identifies the most efficient algorithm, which is then vectorized and implemented with CUDA technology. Experiments were run on an Intel Core i7-10875H and an NVIDIA RTX 2080 Max-Q. They showed that a vectorized analog of the algorithm designed for the multidimensional heat conduction problem gives a speedup of no more than 1.5 times over the sequential version. The developed vector algorithm applies the sweep method along all directions of the three-dimensional problem. It significantly reduces the time spent copying data to video card memory and gives a 40-fold speedup over the sequential three-dimensional modeling algorithm. The same approach was used to develop a parallel modeling algorithm, which gave a 20-fold speedup at full processor load.
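The idea of "the sweep method in all directions" can be sketched in NumPy under simplifying assumptions:

- a constant-coefficient heat equation u_t = α∇²u on a uniform cube;
- zero boundary temperatures;
- a locally one-dimensional splitting step.

Each sub-step solves the implicit 1D tridiagonal systems along one axis with the sweep (Thomas) algorithm, for all grid lines simultaneously. The function names and parameters are illustrative. This is a CPU sketch of the vectorization idea, not the paper's CUDA code, and it expects floating-point arrays.

```python
import numpy as np

def thomas_batched(a, b, c, d):
    """Sweep method for a*x[i-1] + b*x[i] + c*x[i+1] = d[i] along the last axis
    of d, for every line at once (scalar coefficients, x = 0 outside the line)."""
    n = d.shape[-1]
    cp = np.empty(n)                  # forward-sweep coefficients shared by all lines
    dp = np.empty_like(d)
    cp[0] = c / b
    dp[..., 0] = d[..., 0] / b
    for i in range(1, n):
        denom = b - a * cp[i - 1]
        cp[i] = c / denom
        dp[..., i] = (d[..., i] - a * dp[..., i - 1]) / denom
    x = np.empty_like(d)
    x[..., -1] = dp[..., -1]
    for i in range(n - 2, -1, -1):    # back substitution
        x[..., i] = dp[..., i] - cp[i] * x[..., i + 1]
    return x

def lod_step(u, dt, h, alpha=1.0):
    """One locally one-dimensional splitting step on the interior nodes of a
    3D grid: an implicit 1D solve along x, then y, then z."""
    r = alpha * dt / h**2
    for axis in range(3):
        moved = np.moveaxis(u, axis, -1)          # make the sweep axis last
        solved = thomas_batched(-r, 1.0 + 2.0 * r, -r, moved)
        u = np.moveaxis(solved, -1, axis)
    return u
```

Each sweep is sequential only along its own line, and all lines in a direction are independent. On a GPU, that batch dimension would map naturally onto threads, which is one plausible reason why such a scheme avoids repeated host-device copying.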