Abstract: Optical remote sensing images are widely used in feature recognition, scene semantic segmentation, and other fields. However, their quality is degraded by various kinds of noise, which seriously affects their practical use. Because remote sensing images have more complex texture features than ordinary images, previous denoising algorithms fail to achieve the desired results. Therefore, we propose a novel remote sensing image d…
“…The entire training dataset is optimized using a stochastic gradient descent algorithm, which allows the LSTM to learn more appropriate implicit states. The output of the second layer LSTM z is specified by finding the most probable target word y in the vocabulary Y as shown in the following equation, where W_y indicates the weight of the output: processing signals is referred to as visual AM [24]. The area of the target on which human vision can gain focus by quickly capturing the image, in order to obtain more detailed information about the target to be focused on and to eliminate other useless information is referred to as the focus of attention [25].…”
The automatic description (AD) of sports videos is a fundamental task for archiving the content of broadcasters, as well as understanding video scenes, and economic management effectiveness visualization techniques are key to the classification of sports videos. In this paper, a freestyle gymnastics video is used as an example to study automatic video description: the set of movements of an athlete in the video is observed in order to generate the terminology for the movements performed by that athlete. The technique used in this paper to visualize the effectiveness of economic management is the long short-term memory (LSTM) network model, which is used to learn the mapping relationship between word sequences and video frame sequences. Attention mechanisms (AM) are also introduced to highlight the importance of keyframes that determine freestyle gymnastics movements. The study is carried out by building a dataset of free gymnastics (FG) breakdown movements from professional events and applying a planned sampling method. Experimental results show that the method can improve the accuracy of automatic free gymnastics video (FGV) description. The proposed method has a wide range of applications in sports analysis and instruction.
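The word-selection step mentioned in the quoted passage (choosing the most probable target word y from the vocabulary Y given the second-layer LSTM output z and the output weight W_y) can be illustrated with a minimal sketch. The original equation did not survive in the quotation, so the softmax form used here is an assumption, and the vocabulary, weights, and the names `softmax` and `most_probable_word` are hypothetical illustrations rather than the authors' implementation.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def most_probable_word(z, W_y, vocabulary):
    """Return the vocabulary word with the highest probability for output z.

    Assumption: p(y | z) = softmax(W_y z); the source only says that the most
    probable word is selected and that W_y is the output weight.
    """
    # One logit per vocabulary word: dot product of that word's weight row with z.
    logits = [sum(w * h for w, h in zip(row, z)) for row in W_y]
    probs = softmax(logits)
    best = max(range(len(vocabulary)), key=lambda i: probs[i])
    return vocabulary[best], probs[best]


# Hypothetical toy values, for illustration only.
vocabulary = ["jump", "turn", "roll"]
W_y = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
z = [0.2, 1.5]
word, prob = most_probable_word(z, W_y, vocabulary)
print(word, round(prob, 3))
```

In this toy setup the second weight row aligns best with z, so the second word is selected; in a real captioning model z would come from the LSTM at each decoding step.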
“…DDPM has shown its superiority in synthesizing and recovering high-quality images [59]. In remote sensing image analysis, diffusion models have proven effective, especially in enhancing image representation and detail supplementation [60]-[62]. Furthermore, the DM also demonstrates its utility in cloud removal [63]-[65] and image segmentation [66] tasks.…”
In recent years, the application of deep learning to change detection (CD) in remote sensing images has progressed significantly. CD tasks have mostly used architectures such as CNN and Transformer to locate image changes. However, these architectures have shortcomings in representing boundary details and are prone to false alarms and missed detections under complex lighting and weather conditions. To address this, we propose a new network, Siamese Meets Diffusion Network (SMDNet), a CD model that combines discriminative and generative architectures. By leveraging the Siam-U2Net Feature Differential Encoder (SU-FDE) and the Denoising Diffusion Implicit Model (DDIM), it not only improves the accuracy of object edge detection but also enhances detail detection accuracy through iterative denoising and thinning reconstruction, and improves the model's robustness under environmental changes. First, we propose an SU-FDE module that uses shared weight features to capture differences between time series images, refine edge detection, and combine it with the attention mechanism to identify vital coarse features, thereby improving model sensitivity and accuracy. Finally, the progressive sampling of DDIM is used to further integrate these key features, and the adaptability of the model in different environments is enhanced with the help of the denoising ability of the diffusion model and its accurate capture of the probability distribution of image data. The performance evaluation of SMDNet on the LEVIR-CD, DSIFN-CD, and CDD datasets yields validated F1 scores of 89.17%, 88.48%, and 88.23%, respectively. This substantiates the advanced capabilities of our model in accurately identifying variations and intricate details.
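For readers unfamiliar with the DDIM sampling mentioned in the abstract above, the following is a minimal sketch of one generic deterministic DDIM update (the eta = 0 case). It is not SMDNet's code: the function name `ddim_step`, the flat-list representation of an image, and the toy noise-schedule values are assumptions made for illustration, and in practice the noise estimate would come from a trained network.

```python
import math


def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM update from timestep t to an earlier timestep.

    x_t            -- current noisy sample (flat list of floats)
    eps_pred       -- predicted noise for x_t (same length)
    alpha_bar_t    -- cumulative noise-schedule product at t
    alpha_bar_prev -- cumulative noise-schedule product at the earlier step
    """
    x_prev = []
    for x, e in zip(x_t, eps_pred):
        # Estimate the clean sample implied by the current noise prediction.
        x0_pred = (x - math.sqrt(1.0 - alpha_bar_t) * e) / math.sqrt(alpha_bar_t)
        # Re-noise the estimate to the (lower) noise level of the earlier step.
        x_prev.append(
            math.sqrt(alpha_bar_prev) * x0_pred
            + math.sqrt(1.0 - alpha_bar_prev) * e
        )
    return x_prev


# Toy check with hypothetical values: if the noise prediction is exact,
# the step lands on the same clean sample at the lower noise level.
x0 = [1.0, -0.5]
eps = [0.3, 0.2]
a_t, a_prev = 0.5, 0.9
x_t = [math.sqrt(a_t) * c + math.sqrt(1.0 - a_t) * n for c, n in zip(x0, eps)]
x_prev = ddim_step(x_t, eps, a_t, a_prev)
expected = [math.sqrt(a_prev) * c + math.sqrt(1.0 - a_prev) * n for c, n in zip(x0, eps)]
```

Repeating this step over a decreasing sequence of timesteps gives the progressive sampling behaviour the abstract refers to, with fewer steps than the full diffusion chain.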
“…The Coordinate Attention module can improve accuracy without increasing the number of parameters. Han et al. [33] have constructed a remote sensing image denoising network based on a deep learning approach, which enhances the ECA-Net by using multiple local jump connections to improve the denoising ability of the model. Kim et al. [34] have reduced the computational effort required to detect small targets and improved the detection rate by using the channel attention pyramid method.…”
Convolutional neural networks have recently experienced successful development in the field of computer vision. In precision agriculture, apple picking robots use computer vision methods to detect apples in orchards. However, existing object detection algorithms often face problems such as leaf shading, complex illumination environments, and small, dense recognition targets, resulting in low apple detection rates and inaccurate localization. In view of these problems, we designed an apple detection model based on lightweight YOLOv4—called Improved YOLOv4—from the perspective of industrial application. First, to improve the detection accuracy while reducing the amount of computation, the GhostNet feature extraction network with a Coordinate Attention module is implemented in YOLOv4, and depth-wise separable convolution is introduced to reconstruct the neck and YOLO head structures. Then, a Coordinate Attention module is added to the feature pyramid network (FPN) structure in order to enhance the feature extraction ability for medium and small targets. In the last 15% of epochs in training, the mosaic data augmentation strategy is turned off in order to further improve the detection performance. Finally, a long-range target screening strategy is proposed for standardized dense planting apple orchards with dwarf rootstock, removing apples in non-target rows and improving detection performance and recognition speed. On the constructed apple data set, compared with YOLOv4, the mAP of Improved YOLOv4 was increased by 3.45% (to 95.72%). The weight size of Improved YOLOv4 is only 37.9 MB, 15.53% of that of YOLOv4, and the detection speed is improved by 5.7 FPS. Two detection methods of similar size—YOLOX-s and EfficientNetB0-YOLOv3—were compared with Improved YOLOv4. Improved YOLOv4 outperformed these two algorithms by 1.82% and 2.33% mAP, respectively, on the total test set and performed optimally under all illumination conditions. 
The presented results indicate that Improved YOLOv4 has excellent detection accuracy and good robustness, and the proposed long-range target screening strategy has an important reference value for solving the problem of accurate and rapid identification of various fruits in standard orchards.
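The training schedule described in the abstract above, where mosaic data augmentation is turned off for the last 15% of epochs, can be sketched as a simple epoch check. The function name `mosaic_enabled`, the zero-based epoch index, and the 100-epoch run are assumptions for illustration; the source does not say how the authors implemented the switch.

```python
def mosaic_enabled(epoch, total_epochs, off_fraction=0.15):
    """Return True while mosaic augmentation should still be applied.

    epoch        -- zero-based index of the current epoch (assumption)
    total_epochs -- total number of training epochs
    off_fraction -- share of final epochs trained without mosaic (0.15 in the abstract)
    """
    epochs_without_mosaic = int(round(total_epochs * off_fraction))
    cutoff = total_epochs - epochs_without_mosaic
    return epoch < cutoff


# Hypothetical 100-epoch run: mosaic is applied in epochs 0..84 and disabled in 85..99.
schedule = [mosaic_enabled(e, 100) for e in range(100)]
print(schedule.count(False))
```

A training loop would call this check once per epoch and build the data loader with or without mosaic accordingly.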