We present a High-Resolution Transformer (HRFormer) that learns high-resolution representations for dense prediction tasks, in contrast to the original Vision Transformer that produces low-resolution representations and has high memory and computational cost. We take advantage of the multi-resolution parallel design introduced in high-resolution convolutional networks (HRNet [45]), along with local-window self-attention that performs self-attention over small non-overlapping image windows [21], to improve memory and computational efficiency. In addition, we introduce a convolution into the FFN to exchange information across the disconnected image windows. We demonstrate the effectiveness of the High-Resolution Transformer on both human pose estimation and semantic segmentation tasks, e.g., HRFormer outperforms Swin Transformer [27] by 1.3 AP on COCO pose estimation with 50% fewer parameters and 30% fewer FLOPs. Code is available at: https://github.com/HRNet/HRFormer.
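The local-window idea described above can be illustrated with a minimal sketch. It is not taken from the HRFormer code base: the function names, the 4 x 4 toy grid, and the window size of 2 are assumptions chosen for illustration. The sketch only shows how a feature map is split into non-overlapping windows and how that reduces the number of token pairs that attention has to score.

```python
from typing import List

Grid = List[List[int]]


def partition_windows(feature_map: Grid, window: int) -> List[Grid]:
    """Split an H x W grid into non-overlapping window x window tiles (row-major order)."""
    height = len(feature_map)
    width = len(feature_map[0])
    if height % window or width % window:
        raise ValueError("height and width must be divisible by the window size")
    tiles = []
    for top in range(0, height, window):
        for left in range(0, width, window):
            rows = feature_map[top:top + window]
            tiles.append([row[left:left + window] for row in rows])
    return tiles


def attention_pairs(height: int, width: int, window: int) -> int:
    """Count query-key pairs when attention is restricted to each window."""
    tokens_per_window = window * window
    num_windows = (height // window) * (width // window)
    return num_windows * tokens_per_window ** 2


# Hypothetical 4 x 4 map whose entries are the flat token index 0..15.
feature_map = [[r * 4 + c for c in range(4)] for r in range(4)]
windows = partition_windows(feature_map, window=2)

# A window covering the whole map is equivalent to global attention.
global_pairs = attention_pairs(4, 4, window=4)
local_pairs = attention_pairs(4, 4, window=2)
```

In this toy case, global attention scores 256 token pairs and windowed attention scores 64. Because tokens in different windows never interact, some cross-window exchange is needed, which is the role the abstract assigns to the convolution in the FFN.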
Rapid detection of seismic landslides in affected areas is crucial for emergency rescue and damage assessment after an earthquake. This study aims to quickly determine the extent and size of post-earthquake seismic landslides from a small amount of post-earthquake landslide imagery, providing a foundation for emergency rescue, disaster estimation, and related actions. Unmanned Aerial Vehicle (UAV) remote sensing images acquired after the 2008 earthquake in Wenchuan County, Sichuan Province, China, are used as the data source. ResNet-50, ResNet-101, and Swin Transformer are used as the backbone networks of Mask R-CNN to train on and identify seismic landslides in the post-quake UAV images. The training samples are expanded with data augmentation, and transfer learning is used to reduce training time and improve the generalization of the model. Finally, transfer learning is used to apply the model to post-earthquake seismic landslide imagery from Haiti that was not calibrated. With Precision and F1 scores of 0.9328 and 0.9025, respectively, the results demonstrate that Mask R-CNN with a Swin Transformer backbone performs better than the original Mask R-CNN, YOLOv5, and Faster R-CNN. On Haiti’s post-earthquake images, the improved model performs significantly better than the original model in terms of accuracy and recognition. The model developed in this paper for identifying post-earthquake seismic landslides has good generalizability and transferability, as well as good application potential in emergency responses to earthquake disasters, where it can offer strong support for post-earthquake emergency rescue and disaster assessment.
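The abstract reports Precision and F1 but not Recall. Assuming F1 here is the usual harmonic mean of precision and recall (the abstract does not state its definition), the implied recall can be recovered by rearranging F1 = 2PR / (P + R) into R = F1 * P / (2P - F1). The sketch below does that arithmetic; the function names are illustrative, and the resulting value is derived, not reported by the authors.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def recall_from_precision_f1(precision: float, f1: float) -> float:
    """Invert F1 = 2PR / (P + R) for R, given P and F1."""
    denominator = 2 * precision - f1
    if denominator <= 0:
        raise ValueError("no valid recall exists for this precision/F1 pair")
    return f1 * precision / denominator


# Values quoted in the abstract for the Swin Transformer backbone.
precision = 0.9328
f1 = 0.9025

implied_recall = recall_from_precision_f1(precision, f1)
# Round-trip check: recomputing F1 from the implied recall should match.
recomputed_f1 = f1_score(precision, implied_recall)
print(f"implied recall: {implied_recall:.4f}")
```

Under that assumption the implied recall is roughly 0.874, somewhat lower than the reported precision.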
As a concise form of user reviews, tips have unique advantages in explaining search results, assisting users' decision making, and further improving user experience in vertical search scenarios. Existing work on tip generation does not take the query into consideration, which limits the impact of tips in search scenarios. To address this issue, this paper proposes a query-aware tip generation framework that integrates query information into the encoding and subsequent decoding processes. Two specific adaptations, for Transformer and Recurrent Neural Network (RNN), are proposed. For Transformer, the query impact is incorporated into the self-attention computation of both the encoder and the decoder. For RNN, the query-aware encoder adopts a selective network to distill query-relevant information from the review, while the query-aware decoder integrates the query information into the attention computation during decoding. The framework consistently outperforms the competing methods on both public and real-world industrial datasets. Finally, online deployment experiments on Dianping demonstrate the advantage of the proposed framework for tip generation as well as its online business value.
Analyzing rice growth is essential for examining pests, diseases, lodging, and yield. To create a Digital Surface Model (DSM) for three important rice breeding stages, an Unoccupied Aerial System, which is efficient and fast compared to manual monitoring, was used to collect data. Outliers emerge in the DSM as a result of the influence of environment and equipment, and the outliers related to rice not only affect the extraction of rice growth changes but are also more challenging to remove. Therefore, after ground control points are used to unify the geodetic level for filtering, statistical outlier removal (SOR) and quadratic surface filtering (QSF) are applied. After that, differential operations are applied to the DSM to create a differential digital surface model that can account for the change in rice plant height. Comparing the prediction accuracy before and after filtering: R² = 0.72, RMSE = 5.13 cm, nRMSE = 10.65% for the initial point cloud; after QSF, R² = 0.89, RMSE = 2.51 cm, nRMSE = 5.21%; after SOR, R² = 0.92, RMSE = 3.32 cm, nRMSE = 6.89%. The findings demonstrate that point cloud filtering, particularly SOR, can increase the accuracy of rice monitoring. The method is effective for monitoring, and after filtering, the accuracy is sufficiently increased to satisfy the needs of growth analysis. This has some potential for application and extension.
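For readers unfamiliar with the three accuracy metrics quoted above, the following sketch shows one common way to compute them. It rests on assumptions the abstract does not confirm: R² is taken as the coefficient of determination, and nRMSE is RMSE divided by the mean observed value (the reported RMSE/nRMSE pairs are consistent with a common normalizer of about 48 cm, but the abstract does not say which one was used). The plant heights are made-up illustrative numbers, not data from the study.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def nrmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """RMSE normalized by the mean observed value (assumed normalizer)."""
    mean_observed = sum(observed) / len(observed)
    return rmse(observed, predicted) / mean_observed


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Hypothetical plant heights in cm: field measurements vs. DSM-derived estimates.
measured = [40.0, 50.0, 60.0]
estimated = [42.0, 49.0, 59.0]

height_rmse = rmse(measured, estimated)
height_nrmse = nrmse(measured, estimated)
height_r2 = r_squared(measured, estimated)
print(f"RMSE = {height_rmse:.2f} cm, nRMSE = {height_nrmse:.2%}, R2 = {height_r2:.2f}")
```

Note that R² and RMSE measure different things here, so a filter can rank first on one metric and second on the other, as SOR and QSF do in the figures reported above.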