Bridge crack detection is essential to ensuring bridge safety. The introduction of deep learning technology has made it possible to detect bridge cracks automatically and accurately. In this study, the Inception-ResNet-v2 algorithm was systematically improved and applied to the real-time detection of bridge cracks. We propose an end-to-end bridge crack detection model based on a convolutional neural network. The model combines the advantages of Inception convolutions and residual networks, broadening the network width and alleviating the training problems of deep networks; calculation speed is improved while accuracy is maintained. Multi-scale feature fusion enables the network to extract contextual information at different scales, which improves the accuracy of crack recognition. GKA (a K-means clustering method based on a genetic algorithm) realizes accurate segmentation of the target area, greatly enhances the clustering effect, and effectively improves detection speed. The model is trained and tested on large crack datasets without pre-training. Experimental results show that the method improves performance in all aspects: accuracy, 99.24%; recall, 99.03%; F-measure, 98.79%; and FPS (frames per second), 196. INDEX TERMS: bridge crack detection, Inception-ResNet-v2, multiscale feature fusion, GKA
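The abstract does not detail the GKA procedure, so the following is only a minimal NumPy sketch of the general idea behind genetically optimized K-means: a population of candidate centroid sets is evolved by fitness selection and mutation, with a K-means refinement step applied to offspring. The population size, mutation scale, and fitness function here are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def kmeans_sse(X, centroids):
    # Assign each point to its nearest centroid; return total within-cluster SSE.
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    return ((X - centroids[labels]) ** 2).sum(), labels

def kmeans_step(X, centroids):
    # One Lloyd iteration: reassign points, then recompute centroid means.
    _, labels = kmeans_sse(X, centroids)
    return np.array([X[labels == k].mean(axis=0) if (labels == k).any()
                     else centroids[k] for k in range(len(centroids))])

def gka(X, k=2, pop_size=10, generations=20, seed=0):
    # Genetic search over candidate centroid sets (illustrative parameters).
    rng = np.random.default_rng(seed)
    pop = [X[rng.choice(len(X), k, replace=False)] for _ in range(pop_size)]
    for _ in range(generations):
        fitness = np.array([kmeans_sse(X, c)[0] for c in pop])  # lower is better
        survivors = [pop[i] for i in fitness.argsort()[:pop_size // 2]]
        children = []
        for parent in survivors:
            child = parent + rng.normal(0, 0.1, parent.shape)  # Gaussian mutation
            children.append(kmeans_step(X, child))             # local refinement
        pop = survivors + children
    best = min(pop, key=lambda c: kmeans_sse(X, c)[0])
    return best, kmeans_sse(X, best)[1]
```

On two well-separated point clouds this hybrid reliably recovers the correct partition, which is the clustering-quality benefit the abstract attributes to GKA.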
A bearing fault develops and deepens gradually. If it can be detected in time at an early stage, and reasonable prevention and elimination measures are taken, serious losses and safety accidents can be avoided. Therefore, feature extraction and analysis of early weak faults has important practical significance. In this paper, an improved multiscale permutation entropy (IMPE) method is proposed to overcome shortcomings in the coarse-graining process. To avoid the loss of feature information caused by considering only a single coarse-grained sequence at a given scale, the time series is computed over equally sized overlapping segments; that is, all coarse-grained sequences at the same scale are considered so that the feature information of fault signals is reflected more comprehensively. To address the insufficiently refined feature extraction of the first-order (mean) calculation used in traditional MPE, a calculation method based on the skewness of the third-order moment is proposed. This method is more sensitive to the complexity and fluctuation of signals, better describes feature details, and extracts fault features effectively. IMPE is applied to feature extraction of early weak faults of rolling bearings, and the features are input into support vector machines (SVMs) for fault classification. For SVM parameter optimization, an improved chaotic firefly optimization algorithm is proposed. Experimental results show that the new IMPE-SVM method for early weak fault identification is effective in detecting rolling bearing faults of different severities.
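The two IMPE modifications described above can be sketched as follows, assuming the usual refined-coarse-graining interpretation: every shifted coarse-grained sequence at a scale is evaluated, and each segment is summarized by its skewness rather than its mean. Segment handling and normalization details are illustrative assumptions, not the paper's exact formulation.

```python
import math
import numpy as np

def skewness(seg):
    # Third-order standardized moment; defined as 0 for a constant segment.
    m, s = seg.mean(), seg.std()
    return 0.0 if s == 0 else ((seg - m) ** 3).mean() / s ** 3

def coarse_grain(x, scale, offset):
    # One coarse-grained sequence: length-`scale` segments starting at
    # `offset`, each summarized by skewness instead of the first-order mean.
    n = (len(x) - offset) // scale
    return np.array([skewness(x[offset + i * scale: offset + (i + 1) * scale])
                     for i in range(n)])

def permutation_entropy(y, m=3):
    # Normalized Shannon entropy of ordinal patterns of embedding order m.
    patterns = [tuple(np.argsort(y[i:i + m])) for i in range(len(y) - m + 1)]
    counts = np.array([patterns.count(p) for p in set(patterns)], dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum() / np.log(math.factorial(m)))

def impe(x, scale, m=3):
    # Average over all `scale` shifted coarse-grained sequences, so every
    # sample contributes at every scale (no single-sequence information loss).
    return float(np.mean([permutation_entropy(coarse_grain(x, scale, o), m)
                          for o in range(scale)]))
```

A constant signal yields zero entropy, while a noisy signal yields a value near the normalized maximum of 1, which is the complexity contrast the entropy feature exploits.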
Underwater images are of low quality, and underwater targets vary in size, so mainstream target detection networks cannot achieve good results when detecting objects in underwater images. In this study, a lightweight underwater multiscale target detection model with an attention mechanism is designed to solve these problems. In this model, MobileNetv3 is used as the backbone network for preliminary feature extraction. The lightweight feature extraction module (LFEM) attends to the feature map at the channel and spatial levels: features with large weights are promoted, while features with small weights are suppressed. Meanwhile, cross-group information exchange enriches the semantic and location information of the objects. The context aggregation module (CIAM) pools the extracted feature maps to obtain feature pyramids and uses the upsampling-feature refinement-cascade addition (URC) method to effectively fuse global context information and enhance the feature representation. Scale normalization for feature pyramids (SNFP) performs adaptive multiscale perception and multianchor detection on the feature maps to cover objects of different sizes and realize multiscale object detection in underwater images. The proposed network realizes lightweight feature extraction, effectively handles the global relationship between the underwater scene and the object while expanding the receptive field, traverses objects of different scales, and achieves adaptive multianchor detection of multiscale objects in underwater images. Experimental results indicate that our method achieves an average accuracy of 81.94% and a detection speed of 44.3 FPS on a composite dataset. Moreover, our method outperforms mainstream object detection networks in detection accuracy, lightweight design, and real-time performance.
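The channel-level reweighting described for the LFEM (promote large-weight features, suppress small-weight ones) is commonly realized with a squeeze-and-excitation-style gate. The sketch below is a generic NumPy illustration of that mechanism, not the paper's LFEM; the weight shapes `w1`/`w2` and the reduction bottleneck are assumptions.

```python
import numpy as np

def channel_attention(fmap, w1, w2):
    """SE-style channel attention on a (C, H, W) feature map.

    w1: (C//r, C) squeeze weights; w2: (C, C//r) excitation weights.
    Channels with large learned gate values are amplified; small ones
    are suppressed, which is the promote/suppress behavior described.
    """
    squeeze = fmap.mean(axis=(1, 2))             # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeeze, 0)         # ReLU bottleneck
    scale = 1 / (1 + np.exp(-(w2 @ hidden)))     # sigmoid gate in (0, 1)
    return fmap * scale[:, None, None]           # per-channel reweighting
```

Spatial attention works analogously, gating each (H, W) location instead of each channel.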
Owing to the development of computer vision technology, object detection based on convolutional neural networks is widely used in bridge crack detection. However, these networks have limited utility in this task because of low precision and poor real-time performance. In this study, an improved single-shot multi-box detector (SSD) called ISSD is proposed, which seamlessly combines a depthwise separable deformable convolution module (DSDCM), an inception module (IM), and a feature recalibration module (FRM) in a tightly coupled manner to tackle the challenges of bridge crack detection. Specifically, the DSDCM extracts the characteristic information of irregularly shaped bridge cracks. The IM expands the width of the network, reduces network calculations, and improves computing speed. The FRM learns the importance of each feature channel, enhances useful features according to their importance, and suppresses features that are insignificant for bridge crack detection. Experimental results demonstrate that ISSD is effective in bridge crack detection tasks and offers competitive performance compared with state-of-the-art networks.
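The depthwise separable factorization underlying the DSDCM is standard: a per-channel spatial filter followed by a 1x1 pointwise mix, costing k²·C_in + C_in·C_out weights instead of k²·C_in·C_out for a full convolution. A minimal NumPy sketch (valid padding, stride 1, deformable offsets omitted for brevity):

```python
import numpy as np

def depthwise_separable_conv(x, dw, pw):
    """x: (C_in, H, W); dw: (C_in, k, k), one spatial filter per channel;
    pw: (C_out, C_in), the 1x1 pointwise channel-mixing weights."""
    c_in, h, w = x.shape
    k = dw.shape[1]
    oh, ow = h - k + 1, w - k + 1
    # Depthwise stage: each channel is convolved with its own k x k filter.
    depth = np.zeros((c_in, oh, ow))
    for c in range(c_in):
        for i in range(oh):
            for j in range(ow):
                depth[c, i, j] = (x[c, i:i + k, j:j + k] * dw[c]).sum()
    # Pointwise stage: 1x1 convolution mixes information across channels.
    return np.tensordot(pw, depth, axes=([1], [0]))
```

The deformable variant named in the abstract would additionally learn per-position sampling offsets for the depthwise stage, letting the kernel follow irregular crack shapes.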
Remote sensing images of large scenes have complex backgrounds, and the types, sizes, and postures of targets differ, making object detection in remote sensing images difficult. To solve this problem, an end-to-end multi-size object detection method based on a dual attention mechanism is proposed in this paper. First, the MobileNets backbone network extracts multi-layer features of remote sensing images as input to MFCA, a multi-size feature concentration attention module. MFCA employs an attention mechanism to suppress noise, enhance effective feature reuse, and improve the network's adaptability to multi-size target features through multi-layer convolution operations. Then, TSDFF (a two-stage deep feature fusion module) deeply fuses the feature maps output by MFCA to maximize the correlation between the feature sets and, in particular, improve the feature expression of small targets. Next, GLCNet (a global-local context network) and SSA (a significant simple attention module) distinguish the fused features and screen out useful channel information, making the detected features more representative. Finally, the loss function is improved to truly reflect the difference between the candidate boxes and the ground-truth boxes, enhancing the network's ability to predict complex samples. The performance of the proposed method is compared with other advanced algorithms on the NWPU VHR-10, DOTA, and RSOD open datasets. Experimental results show that the proposed method achieves the best AP (average precision) and mAP (mean average precision), indicating that it can accurately detect multi-type, multi-size, and multi-posture targets with high adaptability.
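The abstract does not specify how TSDFF combines feature maps; a common baseline for fusing a deep, low-resolution map into a shallow, high-resolution one is FPN-style upsample-and-add, sketched here as a generic illustration (nearest-neighbor upsampling and element-wise addition are assumptions, not the paper's design):

```python
import numpy as np

def upsample2x(fmap):
    # Nearest-neighbor 2x upsampling of a (C, H, W) feature map.
    return fmap.repeat(2, axis=1).repeat(2, axis=2)

def fuse(shallow, deep):
    """Fuse a shallow (C, 2H, 2W) map with a deep (C, H, W) map by
    upsampling the deep map and adding element-wise. The deep map
    contributes semantics; the shallow map contributes the spatial
    detail that matters for small targets."""
    return shallow + upsample2x(deep)
```

A two-stage scheme would repeat this across adjacent pyramid levels before the attention modules screen the fused channels.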
Most contemporary pedestrian detection algorithms are based on visible-light image detection. However, in environments with dim light, small targets, and easily occluded, cluttered backgrounds, single-modality visible-light images relying on color, texture, and other features cannot adequately represent target feature information; as a result, many targets are lost and algorithm performance suffers. To address this problem, we propose a dual-modal multi-scale feature fusion network (DMFFNet). First, we use the MobileNet v3 backbone network to extract features of the dual-modal images as input to the multi-scale fusion attention (MFA) module, combining the ideas of multi-scale feature fusion and attention mechanisms. Second, we deeply fuse the multi-scale features output by the MFA with the double deep feature fusion (DDFF) module to enhance the semantic and geometric information of the target. Finally, we optimize the loss function to reflect the distance between the predicted box and the real box more realistically and to enhance the network's ability to predict difficult samples. We performed multi-directional evaluations through comparative and ablation experiments on the KAIST dual-light pedestrian dataset and the visible-thermal infrared (VTI) pedestrian dataset from our laboratory. The overall MR-2 on the KAIST dual-light pedestrian dataset is 9.26%, and the MR-2 in dim light, partial occlusion, and severe occlusion is 5.17%, 23.35%, and 47.31%, respectively. The overall MR-2 on the VTI dual-light pedestrian dataset is 9.26%, and the MR-2 in dim light, partial occlusion, and severe occlusion is 5.17%, 23.35%, and 47.31%, respectively. The results show that the algorithm performs well on pedestrian detection, especially in dim light and when the target is occluded.
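The abstract does not name the optimized loss; a well-known formulation that "reflects the distance between the predicted box and the real box more realistically" is the DIoU loss, which adds a normalized center-distance penalty to the IoU term so that non-overlapping boxes still receive a meaningful gradient. The sketch below illustrates that family of losses, not necessarily the paper's exact variant.

```python
def diou_loss(pred, target):
    """DIoU-style loss for axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Intersection-over-union term.
    ix1, iy1 = max(pred[0], target[0]), max(pred[1], target[1])
    ix2, iy2 = min(pred[2], target[2]), min(pred[3], target[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_t = (target[2] - target[0]) * (target[3] - target[1])
    iou = inter / (area_p + area_t - inter)
    # Squared distance between the two box centers.
    cp = ((pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2)
    ct = ((target[0] + target[2]) / 2, (target[1] + target[3]) / 2)
    center_d2 = (cp[0] - ct[0]) ** 2 + (cp[1] - ct[1]) ** 2
    # Squared diagonal of the smallest box enclosing both boxes
    # normalizes the distance penalty to be scale-invariant.
    ex1, ey1 = min(pred[0], target[0]), min(pred[1], target[1])
    ex2, ey2 = max(pred[2], target[2]), max(pred[3], target[3])
    diag2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2
    return 1 - iou + center_d2 / diag2
```

Identical boxes give zero loss, and of two non-overlapping predictions the more distant one is penalized more, which plain IoU loss cannot distinguish.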
Object detection in remote sensing imagery is a challenging task in computer vision with high research value. To improve the classification and positioning accuracy of object detection, we propose a new multi-scale oriented object detector suited to small objects. First, a feature fusion network based on information balance (IBFF) is proposed; it reduces the reuse of features from different backbone layers and the interference of redundant information while ensuring that the output features retain sufficient information, including shallow detail. Second, to efficiently utilize deep and shallow features, enhance important features, and reduce background noise interference, different attention-based context feature fusion modules (DACFF) are designed according to the characteristics of the different feature fusion stages. Finally, an improved oriented bounding-box regression strategy is proposed that obtains the oriented bounding box in a simpler and more effective way. The proposed method was evaluated on two public remote sensing datasets, DOTA and HRSC2016, achieving mAP values of 80.96% and 95.01%, respectively, which verifies the effectiveness of the proposed algorithm.
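The abstract does not describe its oriented-box regression strategy. As background, oriented detectors commonly parameterize a box as (cx, cy, w, h, θ) and decode it to corner points; the sketch below shows only that standard decoding geometry, not the paper's improved strategy.

```python
import math

def obb_corners(cx, cy, w, h, theta):
    """Corner points of an oriented box (cx, cy, w, h, theta in radians),
    obtained by rotating the axis-aligned half-extents about the center."""
    c, s = math.cos(theta), math.sin(theta)
    corners = []
    for dx, dy in ((w / 2, h / 2), (-w / 2, h / 2),
                   (-w / 2, -h / 2), (w / 2, -h / 2)):
        # Standard 2-D rotation of the offset vector, then translation.
        corners.append((cx + dx * c - dy * s, cy + dx * s + dy * c))
    return corners
```

Regressing (cx, cy, w, h, θ) directly is simpler than regressing four free corners, at the cost of angle-periodicity issues that improved strategies typically target.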
With the exploration and development of marine resources, deep learning is increasingly used in underwater image processing. However, the quality of raw underwater images is so low that traditional semantic segmentation methods produce poor results, such as blurred target edges, insufficient segmentation accuracy, and poor segmentation of region boundaries. To solve these problems, this paper proposes a semantic segmentation method for underwater images. First, image enhancement based on multi-spatial transformation is performed to improve the quality of the original images, a step uncommon in other advanced semantic segmentation methods. Then, densely connected hybrid atrous convolution effectively expands the receptive field and slows the loss of resolution. Next, a cascaded atrous-convolution spatial pyramid pooling module integrates boundary features of different scales to enrich target details. Finally, a context information aggregation decoder fuses features of the shallow and deep networks to extract rich contextual information, which greatly reduces information loss. The proposed method was evaluated on RUIE, HabCam UID, and UIEBD. Compared with state-of-the-art semantic segmentation algorithms, it has advantages in segmentation integrity, location accuracy, boundary clarity, and subjective detail. Objectively, it achieves the highest MIoU of 68.3 and OA of 79.4 while keeping resource consumption low. An ablation experiment further verifies the effectiveness of the method.
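The receptive-field expansion from atrous (dilated) convolution follows a standard formula: a k x k kernel with dilation d covers k + (k-1)(d-1) positions per axis, and each stride-1 layer in a stack adds its effective kernel size minus one. The helper below computes this for a hybrid-dilation stack; the example dilation pattern (1, 2, 5) is the classic HDC choice, used here only for illustration.

```python
def effective_kernel(k, d):
    # A k x k kernel with dilation d spans k + (k - 1)(d - 1) positions.
    return k + (k - 1) * (d - 1)

def stacked_receptive_field(kernels, dilations):
    # For a stride-1 stack, each layer grows the receptive field by
    # (effective kernel size - 1) on top of the previous layers.
    rf = 1
    for k, d in zip(kernels, dilations):
        rf += effective_kernel(k, d) - 1
    return rf
```

Three 3x3 layers with dilations (1, 2, 5) already see a 17-pixel window without any downsampling, which is how such stacks enlarge the receptive field while slowing the loss of resolution.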