Deep learning algorithms have been increasingly used in ship image detection and classification. To improve ship detection and classification in photoelectric images, an improved recurrent attention convolutional neural network is proposed. The proposed network has a multi-scale architecture consisting of three cascaded sub-networks, each with a VGG19 network for image feature extraction and an attention proposal network for locating feature areas. A scale-dependent pooling algorithm is designed to select an appropriate convolution layer in the VGG19 network for classification, and a multi-feature mechanism is introduced into the attention proposal network to describe the feature regions. The VGG19 and attention proposal networks are cross-trained to accelerate convergence and improve detection accuracy. The proposed method is trained and validated on a self-built ship database and improves detection accuracy to 86.7%, outperforming the baseline VGG19 and recurrent attention convolutional neural network methods.
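The abstract does not spell out the layer-selection rule, but scale-dependent pooling can be read as routing each candidate region to the VGG19 convolution stage whose resolution best matches the region's size, so small ships are pooled from an earlier, higher-resolution feature map. A minimal sketch under that reading (the pixel thresholds and stage names are assumptions, not values from the paper):

```python
# Hypothetical scale-dependent pooling rule: choose which VGG19 conv
# stage to pool classification features from, based on region height.
def select_vgg19_stage(roi_height_px):
    """Return the VGG19 conv stage to pool from (thresholds assumed)."""
    if roi_height_px < 64:       # small ship -> shallow, fine-grained map
        return "conv3_4"
    elif roi_height_px < 160:    # medium ship -> intermediate map
        return "conv4_4"
    else:                        # large ship -> deep, semantic map
        return "conv5_4"
```

For example, a 40-pixel-tall ship region would be pooled from `conv3_4`, avoiding the over-pooling that would erase it in the deepest feature map.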
With the rapid development of the marine industry, intelligent ship detection plays a very important role in marine traffic safety and port management. Current detection methods mainly focus on synthetic aperture radar (SAR) images, which are of great significance to the field of ship detection; however, these methods sometimes cannot meet real-time requirements. Meanwhile, surveillance systems are now widely used in indoor and outdoor environments, and their combination with deep learning has greatly promoted the development of intelligent object detection and recognition. To address these problems, a novel ship detection network based on SSD (Single Shot Detector), named NSD-SSD, is proposed in this paper. The NSD-SSD uses visual images captured by surveillance cameras to achieve real-time detection and further improves detection performance. First, dilated convolution and multi-scale feature fusion are combined to improve detection accuracy, particularly for small objects. Second, an improved prediction module is introduced to enhance the model's deep feature extraction ability, significantly improving the mean Average Precision (mAP) and recall. Finally, the prior boxes are reconstructed using the K-means clustering algorithm, yielding a higher Intersection-over-Union (IoU) and a better visual effect. Experimental results on ship images show that the mAP and recall reach 89.3% and 93.6%, respectively, outperforming representative models (Faster R-CNN, SSD, and YOLOv3). Moreover, the model runs at 45 FPS, which meets real-time detection requirements well. Hence, the proposed method has better overall performance and achieves higher detection efficiency and better robustness.
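Reconstructing prior boxes with K-means typically means clustering the (width, height) pairs of ground-truth ship boxes with the distance d = 1 − IoU, so the resulting centroids become the detector's anchor shapes. A self-contained sketch of that idea (the helper names and the toy data are illustrative, not from the paper):

```python
import random

def iou_wh(box, centroid):
    """IoU of two (w, h) boxes assumed to share the same top-left corner."""
    inter = min(box[0], centroid[0]) * min(box[1], centroid[1])
    union = box[0] * box[1] + centroid[0] * centroid[1] - inter
    return inter / union

def kmeans_priors(boxes, k, iters=100, seed=0):
    """Cluster (w, h) pairs with distance 1 - IoU to get k prior boxes."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for b in boxes:
            # assign each box to the centroid it overlaps most
            idx = max(range(k), key=lambda i: iou_wh(b, centroids[i]))
            clusters[idx].append(b)
        new = [
            (sum(b[0] for b in c) / len(c), sum(b[1] for b in c) / len(c))
            if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new == centroids:
            break
        centroids = new
    return sorted(centroids)

# toy usage: two small boxes and two large boxes yield two priors
priors = kmeans_priors([(10, 20), (12, 22), (50, 60), (55, 65)], k=2)
```

Using 1 − IoU instead of Euclidean distance keeps the clustering scale-invariant, so small ship boxes are not drowned out by large ones.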
As an essential biological feature of human beings, the voiceprint is increasingly used in medical research and diagnosis, especially in identifying Parkinson's Disease (PD). This paper proposes a Spectrogram Deep Convolutional Generative Adversarial Network (S-DCGAN) for sample augmentation to overcome the limited size of existing patient voiceprint datasets. S-DCGAN generates high-resolution spectrograms by increasing the number of network layers, adding the Spectral Normalization (SN) method, and incorporating a feature matching strategy. High-similarity, low-distortion spectrograms are selected according to Structural Similarity Index (SSIM) values and Peak Signal-to-Noise Ratio (PSNR) to augment the samples. Fréchet Inception Distance (FID) and GAN-train results demonstrate the generalization ability of the generated data. We construct a ResNet50 model with a Global Average Pooling (GAP) layer to extract voiceprint features and classify them effectively, improving recognition accuracy. The GAP layer suppresses over-fitting and speeds up optimization. Finally, comparative experiments were conducted on the Sakar dataset across different models and classification methods. Results show that the S-DCGAN-ResNet50 hybrid model achieves the highest voiceprint recognition accuracy of 91.25% and specificity of 92.5%, distinguishing between PD patients and healthy people more precisely than DCGAN-ResNet50. This broadens the application scope of voiceprint recognition in the medical field and makes it more generalizable across different datasets.
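The PSNR half of the selection step is straightforward to make concrete: a generated spectrogram is kept only if its peak signal-to-noise ratio against a reference clears a quality threshold. A hedged sketch for 8-bit grayscale spectrograms (the threshold value and function names are assumptions; the paper additionally uses SSIM, which is omitted here for brevity):

```python
import math

def psnr(img_a, img_b, max_val=255.0):
    """Peak signal-to-noise ratio between two equal-sized grayscale images,
    given as nested lists of pixel values."""
    flat_a = [p for row in img_a for p in row]
    flat_b = [p for row in img_b for p in row]
    mse = sum((a - b) ** 2 for a, b in zip(flat_a, flat_b)) / len(flat_a)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)

def keep_spectrogram(generated, reference, psnr_min=25.0):
    """Accept a generated spectrogram only if PSNR clears the (assumed)
    threshold; a real pipeline would also check SSIM."""
    return psnr(generated, reference) >= psnr_min
```

For instance, a generated image whose pixels all differ from the reference by 10 gray levels has MSE = 100 and PSNR ≈ 28.1 dB, so it passes a 25 dB cut but fails a 30 dB one.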
In recent years, the maritime industry has been developing rapidly, which poses great challenges for intelligent ship navigation systems seeking accurate ship classification. To cope with this problem, a Recurrent Attention Convolutional Neural Network (RA‐CNN) fused with multiple feature regions is proposed for ship classification. The proposed model has three scale layers, each of which contains a classification network, VGG‐19, and a localisation head, the Attention Proposal Network (APN). First, the Scale Dependent Pooling algorithm is integrated with VGG‐19 to reduce the impact of over‐pooling and improve the classification performance on small ships. Second, the APN incorporates the Joint Clustering algorithm to generate multiple independent feature regions, so the whole model can make full use of global information in ship recognition. In the meantime, the Feature Regions Optimisation method is designed to mitigate overfitting and reduce the overlap rate of the multiple feature regions. Finally, a novel loss function is defined to cross‐train VGG‐19 and the APN, which accelerates convergence. The experimental results show that the classification accuracy of the authors' proposed method reaches 90.2%, a 6% improvement over the baseline RA‐CNN. Both classification accuracy and robustness are improved by a large margin compared to the other evaluated models.
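The Feature Regions Optimisation step is described only as reducing the overlap rate of the multiple feature regions. One plausible reading, sketched here as an illustration, is a greedy selection in the style of non-maximum suppression: keep the highest-scoring attention regions whose mutual IoU stays below a cap (the 0.3 cap and function names are assumptions, not from the paper):

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def select_regions(scored_regions, max_overlap=0.3):
    """Greedily keep attention regions, best score first, discarding any
    region that overlaps an already-kept one beyond max_overlap."""
    kept = []
    for score, box in sorted(scored_regions, reverse=True):
        if all(iou(box, k) <= max_overlap for k in kept):
            kept.append(box)
    return kept
```

Two nearly coincident regions then collapse to one, leaving the scale layers with complementary rather than redundant feature regions.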
Humans express their emotions in a variety of ways, which inspires research on multimodal fusion-based emotion recognition that uses different modalities to achieve information complementation. However, extracting deep emotional features from different modalities and fusing them remains a challenging task. It is essential to exploit the advantages of different extraction and fusion approaches to capture the emotional information contained within and across modalities. In this paper, we present a novel multimodal emotion recognition framework called multimodal emotion recognition based on cascaded multichannel and hierarchical fusion (CMC-HF), where visual, speech, and text signals are simultaneously used as multimodal inputs. First, three cascaded channels based on deep learning technology perform feature extraction for the three modalities separately, enhancing deep information extraction within each modality and improving recognition performance. Second, an improved hierarchical fusion module is introduced to promote inter-modality interactions among the three modalities and further improve recognition and classification accuracy. Finally, to validate the effectiveness of the designed CMC-HF model, experiments are conducted on two benchmark datasets, IEMOCAP and CMU-MOSI. The results show an increase of approximately 2%–3.2% in four-class accuracy on the IEMOCAP dataset and an improvement of 0.9%–2.5% in average class accuracy on the CMU-MOSI dataset compared to existing state-of-the-art methods. The ablation results indicate that both the cascaded feature extraction method and the hierarchical fusion method contribute significantly to multimodal emotion recognition, suggesting that the three modalities carry deep inter-modality and intra-modality information interactions. Hence, the proposed model has better overall performance and achieves higher recognition efficiency and better robustness.
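The abstract does not detail the fusion hierarchy, but "hierarchical fusion" of three modalities is commonly realized as pairwise cross-modal fusion followed by a global merge. The sketch below illustrates only that data flow; the element-wise mean stands in for the learned fusion layers a real model would use, and all names are hypothetical:

```python
def fuse_pair(u, v):
    """Toy pairwise fusion of two feature vectors (element-wise mean;
    a real model would use a learned cross-modal layer)."""
    return [(a + b) / 2 for a, b in zip(u, v)]

def hierarchical_fuse(visual, speech, text):
    """Two-stage fusion: build bimodal vectors first, then globally
    merge unimodal and bimodal representations by concatenation."""
    vs = fuse_pair(visual, speech)
    vt = fuse_pair(visual, text)
    st = fuse_pair(speech, text)
    return visual + speech + text + vs + vt + st
```

The concatenated output preserves both the per-modality features and the pairwise interactions, which is the intuition behind letting a downstream classifier exploit intra- and inter-modality information at once.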