It is commonly believed that training deep transformers from scratch requires large datasets. Consequently, for small datasets, practitioners usually add shallow, simple layers on top of pre-trained models during fine-tuning. This work shows that this need not always be the case: with proper initialization and optimization, the benefits of very deep transformers can carry over to challenging tasks with small datasets, including Text-to-SQL semantic parsing and logical reading comprehension. In particular, we successfully train 48 transformer layers, comprising 24 fine-tuned layers from pre-trained RoBERTa and 24 relation-aware layers trained from scratch. With fewer training steps and no task-specific pre-training, we obtain state-of-the-art performance on the challenging cross-domain Text-to-SQL parsing benchmark Spider. We achieve this by deriving a novel Data-dependent Transformer Fixed-update initialization scheme (DT-Fixup), inspired by the prior T-Fixup work (Huang et al., 2020). Further error analysis shows that increasing depth can help improve generalization on small datasets for hard cases that require reasoning and structural understanding.
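The core trick behind such schemes is to shrink the residual-branch weights of the newly added layers at initialization by a factor that depends on depth. The sketch below illustrates the idea with the simpler depth-only factor 0.67 · N^(-1/4) from T-Fixup applied to freshly Xavier-initialized layers; the actual DT-Fixup factor is additionally data-dependent (derived from the pre-trained encoder's output magnitudes), so this is an illustration of the family of methods, not the paper's exact formula.

```python
import numpy as np

def xavier(shape, rng):
    """Standard Xavier/Glorot uniform initialization."""
    limit = np.sqrt(6.0 / (shape[0] + shape[1]))
    return rng.uniform(-limit, limit, size=shape)

def init_new_layers(n_layers, d_model, rng=None):
    """Initialize the residual-branch weights (value/output projections
    and feed-forward matrices) of n_layers extra transformer layers,
    rescaled by the depth-dependent T-Fixup factor 0.67 * N^(-1/4).
    Deeper stacks get smaller initial updates, which is what allows
    training without warmup or layer normalization tricks."""
    rng = rng or np.random.default_rng(0)
    scale = 0.67 * n_layers ** (-0.25)
    layers = []
    for _ in range(n_layers):
        layers.append({
            "W_v":   scale * xavier((d_model, d_model), rng),
            "W_o":   scale * xavier((d_model, d_model), rng),
            "W_ff1": scale * xavier((d_model, 4 * d_model), rng),
            "W_ff2": scale * xavier((4 * d_model, d_model), rng),
        })
    return layers
```

With 24 new layers (as in the abstract), the rescaling multiplies each residual-branch weight by roughly 0.3, bounding the magnitude of each layer's contribution at the start of training.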
Semantic segmentation of remote sensing images is a critical and challenging task. Graph neural networks, which can capture global contextual representations, can exploit long-range pixel dependencies and thereby improve semantic segmentation performance. In this paper, a novel self-constructing graph attention neural network is proposed for this purpose. Firstly, ResNet50 is employed as the backbone of a feature-extraction network to acquire feature maps of remote sensing images. Secondly, pixel-wise dependency graphs are constructed from the feature maps, and a graph attention network is designed to extract the correlations among pixels of the remote sensing images. Thirdly, a channel linear attention mechanism obtains the channel dependency of the images, further improving the semantic segmentation predictions. Lastly, we conducted comprehensive experiments and found that the proposed model consistently outperforms state-of-the-art methods on two widely used remote sensing image datasets.
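The two attention steps above can be sketched in a few lines: the graph is "self-constructing" in that the pixel adjacency is computed directly from feature similarity rather than from a predefined neighbourhood, and the channel attention reweights feature channels by a global score. This is a simplified numpy illustration of the two mechanisms, not the paper's exact formulation (function names and the similarity measure are assumptions).

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_constructing_graph_attention(feat):
    """feat: (H*W, C) feature map flattened over pixels.
    Builds a dense pixel-wise dependency graph from feature
    similarity and aggregates all pixels with attention weights,
    capturing long-range dependencies in one step."""
    attn = softmax(feat @ feat.T / np.sqrt(feat.shape[1]), axis=-1)
    return attn @ feat

def channel_linear_attention(feat):
    """Reweights the C channels by a softmax over their global
    (mean-pooled) responses, modelling channel dependency."""
    score = softmax(feat.mean(axis=0))   # (C,) channel descriptor
    return feat * score                  # broadcast over pixels
```

Because the adjacency is recomputed from the features of each image, the graph adapts to the image content instead of relying on a fixed grid neighbourhood.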
Semantic segmentation of remote sensing (RS) images, a fundamental research topic, classifies each pixel in an image. It plays an essential role in many downstream RS tasks, such as land-cover mapping, road extraction, and traffic monitoring. Although deep-learning-based methods have recently shown their dominance in automatic semantic segmentation of RS imagery, the performance of existing methods relies heavily on large amounts of high-quality training data, which are usually hard to obtain in practice. Moreover, human-in-the-loop semantic segmentation of RS imagery cannot be completely replaced by automatic segmentation models, since automatic models are prone to error in complex scenarios. To address these issues, we propose an improved, smart, interactive segmentation model, DRE-Net, for RS images. The proposed model lets humans perform segmentation with simple mouse clicks. Firstly, a dynamic radius-encoding (DRE) algorithm is designed to distinguish the purpose of each click, such as selecting a segmentation outline or fine-tuning it. Secondly, we propose an incremental training strategy so that the proposed model not only converges quickly, but also produces refined segmentation results. Finally, we conducted comprehensive experiments on the Potsdam and Vaihingen datasets and achieved 9.75% and 7.03% improvements in NoC95 over the state-of-the-art results, respectively. In addition, our DRE-Net can improve the convergence and generalization of a network while maintaining a fast inference speed.
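To make the click-encoding idea concrete, interactive segmentation models typically rasterize user clicks into a guidance channel fed to the network. The sketch below encodes clicks as disks whose radius shrinks with each successive click, so early clicks act as coarse outline selection and later clicks as local fine-tuning; this is a hypothetical illustration of a dynamic-radius encoding, not the paper's exact DRE algorithm (the radius schedule and parameters are assumptions).

```python
import numpy as np

def click_map(h, w, clicks, base_radius=40, decay=0.5, min_radius=5):
    """Encode a sequence of (row, col) user clicks as one (h, w)
    guidance map. Each click paints a disk; the radius halves after
    every click (floored at min_radius), so the encoding itself
    distinguishes coarse selection clicks from fine-tuning clicks."""
    ys, xs = np.mgrid[0:h, 0:w]
    out = np.zeros((h, w))
    r = base_radius
    for (cy, cx) in clicks:
        dist = np.sqrt((ys - cy) ** 2 + (xs - cx) ** 2)
        out = np.maximum(out, (dist <= r).astype(float))
        r = max(min_radius, r * decay)
    return out
```

The resulting map is concatenated with the image as an extra input channel, letting the network read both where the user clicked and, implicitly via the disk size, why.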