Cross-modality person re-identification is a challenging task due to the large visual appearance difference between RGB and infrared images. Existing studies mainly focus on learning local features and ignore the correlation between local features. In this paper, the Integration Graph Attention Network is proposed to learn the completed correlation between local features via the graph structure. To this end, the authors learn coarse-fine attention weights to aggregate the local features by considering local detail and global information. Furthermore, the Multi-Centre Constrained Loss is proposed to optimise feature similarity by constraining the centres of modality and identity. It simultaneously utilises three kinds of centre constraints, that is, the intra-identity centre constraint, the modality centre constraint, and the inter-identity centre constraint, in order to explicitly reduce the influence of modality information. The proposed method is evaluated on two standard benchmark datasets, that is, SYSU-MM01 and RegDB, and the results demonstrate that the authors' method achieves better performance than the state-of-the-art methods, for example, surpassing NFS by 4.8% and 6.0% mAP on the single-shot setting in the All-search and Indoor-search modes, respectively.

INTRODUCTION

Person re-identification (Re-ID) is an important task, which aims to match the same pedestrian across different cameras [1][2][3]. It can be applied in many practical fields such as video surveillance and intelligent traffic supervision. With the development of deep learning, person Re-ID methods have made great progress in recent years, and most of them are designed for processing RGB images captured by visible cameras [4][5][6][7]. However, it is difficult for visible cameras to capture discriminative appearance information under poor illumination conditions.
Hence, single-modality person Re-ID methods cannot work well in night scenarios. Compared with RGB images, infrared (IR) images can provide more appearance information under poor illumination conditions, and therefore cross-modality person Re-ID has naturally been proposed to exploit RGB and IR images simultaneously.

This is an open access article under the terms of the Creative Commons Attribution-NonCommercial License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.
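The three centre constraints above can be illustrated with a minimal NumPy sketch. The function names, the margin value, and the exact combination of the three terms are illustrative assumptions for exposition, not the authors' implementation:

```python
import numpy as np

def l2(a, b):
    # squared Euclidean distance between two centres
    return float(np.sum((a - b) ** 2))

def multi_centre_loss(feats, ids, modality, margin=0.3):
    """Conceptual multi-centre constrained loss (illustrative only).

    feats: (N, D) features; ids: (N,) identity labels;
    modality: (N,) with 0 = RGB and 1 = IR.
    """
    loss_intra = 0.0  # intra-identity: RGB vs. IR centre of the same identity
    loss_inter = 0.0  # inter-identity: push different identity centres apart
    centres = {}
    for pid in np.unique(ids):
        sel = ids == pid
        centres[pid] = feats[sel].mean(axis=0)
        c_rgb = feats[sel & (modality == 0)].mean(axis=0)
        c_ir = feats[sel & (modality == 1)].mean(axis=0)
        loss_intra += l2(c_rgb, c_ir)
    # modality centre constraint: global RGB centre vs. global IR centre
    loss_mod = l2(feats[modality == 0].mean(axis=0),
                  feats[modality == 1].mean(axis=0))
    pids = list(centres)
    for i in range(len(pids)):
        for j in range(i + 1, len(pids)):
            # hinge: identity centres closer than the margin are penalised
            loss_inter += max(0.0, margin - l2(centres[pids[i]], centres[pids[j]]))
    return loss_intra + loss_mod + loss_inter
```

If all features collapse to a single point, the intra-identity and modality terms vanish while the inter-identity hinge is maximal, which is the trade-off the three constraints are meant to balance.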
Recently, part information of pedestrian images has been demonstrated to be effective for person re-identification (ReID), but part interaction is ignored when using a Transformer to learn long-range dependencies. In this paper, we propose a novel transformer network named Completed Part Transformer (CPT) for person ReID, where we design the part transformer layer to learn the completed part interaction. The part transformer layer includes the intra-part layer and the part-global layer, which consider long-range dependencies from the aspects of intra-part interaction and part-global interaction, simultaneously. Furthermore, in order to overcome the limitation of a fixed number of patch tokens in the transformer layer, we propose the Adaptive Refined Tokens (ART) module to focus on learning the interaction between the informative patch tokens in the pedestrian image, which improves the discrimination of the pedestrian representation. Extensive experimental results on four person ReID datasets, i.e., MSMT17, Market1501, DukeMTMC-reID and CUHK03, demonstrate that the proposed method achieves a new state-of-the-art performance, e.g., 68.0% mAP and 84.6% Rank-1 accuracy on MSMT17.
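The two kinds of part interaction described above can be sketched with plain NumPy scaled dot-product attention. The toy token split and all names here are assumptions for illustration, not the CPT implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # single-head scaled dot-product attention
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

# toy example: 6 patch tokens of dimension 4, split into 2 parts of 3 tokens
tokens = np.arange(24, dtype=float).reshape(6, 4)
parts = [tokens[:3], tokens[3:]]

# intra-part interaction: each part attends only within itself
intra = np.vstack([attention(p, p, p) for p in parts])

# part-global interaction: every patch token attends to all patch tokens
part_global = attention(tokens, tokens, tokens)
```

The intra-part pass restricts the attention scope to one part's tokens, while the part-global pass lets each part read from the whole image, mirroring the two layers described in the abstract.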
Cloud image segmentation plays an important role in ground-based cloud observation. Recently, most existing methods for ground-based cloud image segmentation learn feature representations using the convolutional neural network (CNN), which results in the loss of global information because of the limited receptive field size of the filters in the CNN. In this article, we propose a novel deep model named TransCloudSeg, which makes full use of the advantages of the CNN and the transformer to extract detailed information and global contextual information for ground-based cloud image segmentation. Specifically, TransCloudSeg hybridizes the CNN and the transformer as the encoders to obtain different features. To recover and fuse the feature maps from the encoders, we design the CNN decoder and the transformer decoder for TransCloudSeg. After obtaining two sets of feature maps from the two different decoders, we propose the heterogeneous fusion module to effectively fuse the heterogeneous feature maps by applying the self-attention mechanism. We conduct a series of experiments on the Tianjin Normal University large-scale cloud detection database and the Tianjin Normal University cloud detection database, and the results show that our method achieves better performance than other state-of-the-art methods, demonstrating the effectiveness of the proposed TransCloudSeg.
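As a rough illustration of fusing heterogeneous feature maps with self-attention, the following NumPy sketch concatenates the two decoder outputs and lets every position attend across both sets. The function name and the final merge step are illustrative assumptions, not the TransCloudSeg module:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def hetero_fuse(f_cnn, f_trans):
    """Fuse two (N, D) feature maps via self-attention over their concatenation."""
    x = np.concatenate([f_cnn, f_trans], axis=0)          # (2N, D) joint tokens
    attn = softmax(x @ x.T / np.sqrt(x.shape[1]), axis=-1)
    fused = attn @ x                                      # (2N, D) attended tokens
    n = f_cnn.shape[0]
    return fused[:n] + fused[n:]                          # (N, D) merged map
```

Because the attention matrix spans the concatenation, each CNN position can draw on transformer context and vice versa before the two halves are merged.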
Computational Law has begun taking the role in society which has been predicted for some time. Automated decision-making and systems which assist users are now used in various jurisdictions, but with this maturity come certain caveats. Computational Law exists on the platforms which enable it, in this case digital systems, which means that it inherits the same flaws. Cybersecurity is one framework which addresses these potential weaknesses, and in this paper we go through known issues and discuss them at the various levels, from design to the physical realm. We also look at machine-learning-specific adversarial problems, which entail further weaknesses. Additionally, we make certain considerations regarding Computational Law and existing and future legislation. Finally, we present three recommendations which are necessary for Computational Law to function globally, and which follow ideas in safety and security engineering. As indicated, we find that Computational Law must seriously consider that not only does it face the same risks as other types of software and computer systems, but that failures within it may cause financial or physical damage, as well as injustice. The consequences of Computational Legal systems failing are in this sense greater than if they were merely software and hardware. If a system employs machine learning, it must take note of the very specific dangers which this brings, of which data poisoning is the classic example. Computational Law must also be explicitly legislated for, which we show is not currently the case in the EU, and this is also true for the cybersecurity aspects that will be relevant to it. There is, however, great hope in the EU's proposed AI Act, which makes an important attempt at bringing the specific problems of Computational Law into the legal sphere.
Lastly, our recommendations for Computational Law and cybersecurity are: accommodation of threats, adequate use, and keeping humans at the centre of deployment. The last of these rests primarily on the abilities humans possess which allow them to handle emergencies.
Recently, the convolutional neural network (CNN) dominates the ground-based cloud image segmentation task, but it disregards the learning of long-range dependencies due to the limited size of its filters. Although Transformer-based methods could overcome this limitation, they only learn long-range dependencies at a single scale, hence failing to capture the multi-scale information of cloud images. Multi-scale information is beneficial to ground-based cloud image segmentation, because features from small scales tend to extract detailed information, while features from large scales have the ability to learn global information. In this paper, we propose a novel deep network named Integration Transformer (InTransformer), which builds long-range dependencies from different scales. To this end, we propose the Hybrid Multi-head Transformer Block (HMTB) to learn multi-scale long-range dependencies, and hybridize the CNN and HMTB as the encoder at different scales, so that the encoder learns both local information and long-range dependencies at different scales. Meanwhile, in order to fuse the patch tokens with different scales, we propose the Mutual Cross-Attention Module (MCAM) for the decoder of InTransformer, which allows multi-scale patch tokens to interact adequately in a bidirectional way. We have conducted a series of experiments on the large-scale ground-based cloud detection database TLCDD and on SWIMSEG. The experimental results show that our method outperforms other methods, proving the effectiveness of the proposed InTransformer.
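The bidirectional interaction in MCAM can be sketched as two cross-attention passes in NumPy, one in each direction between token sets of different scales. The function names and the residual connections are assumptions for illustration, not the published module:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values):
    # queries read from the other scale's tokens
    scores = queries @ keys_values.T / np.sqrt(queries.shape[-1])
    return softmax(scores, axis=-1) @ keys_values

def mutual_cross_attention(small_scale, large_scale):
    """Bidirectional fusion of two token sets with different token counts."""
    s2l = cross_attention(small_scale, large_scale)  # small scale queries large
    l2s = cross_attention(large_scale, small_scale)  # large scale queries small
    # residual connections keep each scale's original information
    return small_scale + s2l, large_scale + l2s
```

Because cross-attention only requires the two token sets to share the feature dimension, the two scales may contribute different numbers of tokens, which is what makes the bidirectional multi-scale exchange possible.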