Learning 3D global features by aggregating multiple views has proven a successful strategy for 3D shape analysis. In recent deep learning models with end-to-end training, pooling is a widely adopted procedure for view aggregation. However, pooling merely retains the max or mean value over all views, which disregards the content information of almost all views as well as the spatial information among the views. To resolve these issues, we propose Sequential Views To Sequential Labels (SeqViews2SeqLabels), a novel deep learning model with an encoder-decoder structure based on recurrent neural networks (RNNs) with attention. SeqViews2SeqLabels consists of two connected parts, an encoder-RNN followed by a decoder-RNN, which learn global features by aggregating sequential views and then perform shape classification from the learned global features, respectively. Specifically, the encoder-RNN learns the global features by simultaneously encoding the spatial and content information of sequential views, which captures the semantics of the view sequence. With the proposed prediction of sequential labels, the decoder-RNN performs more accurate classification using the learned global features by predicting sequential labels step by step. Learning to predict sequential labels provides more and finer discriminative information among shape classes, which alleviates the overfitting problem inherent in training on a limited number of 3D shapes. Moreover, we introduce an attention mechanism to further improve the discriminative ability of SeqViews2SeqLabels. This mechanism increases the weight of views that are distinctive to each shape class, and it dramatically reduces the effect of the choice of the first view position. Shape classification and retrieval results on three large-scale benchmarks verify that SeqViews2SeqLabels learns more discriminative global features by aggregating sequential views more effectively than state-of-the-art methods.
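The attention mechanism described above can be illustrated with a minimal sketch: a softmax over per-view scores weights each view's feature vector before aggregation, so distinctive views dominate the global descriptor. All names, dimensions, and numbers below are illustrative, not taken from the SeqViews2SeqLabels implementation.

```python
import math

def softmax(scores):
    # numerically stable softmax over attention scores
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate_views(view_features, view_scores):
    """Attention-weighted aggregation of per-view feature vectors
    into a single global shape descriptor."""
    weights = softmax(view_scores)
    dim = len(view_features[0])
    return [sum(w * f[d] for w, f in zip(weights, view_features))
            for d in range(dim)]

# three views with 2-D features; the second view is scored as most distinctive,
# so it contributes most to the aggregated feature
views = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [0.1, 2.0, 0.1]
global_feature = aggregate_views(views, scores)
```

In the full model the scores themselves are learned, and the aggregation runs inside the encoder-RNN rather than as a single post-hoc sum; the sketch only shows why a high-scoring view dominates the pooled feature.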
In this paper we present a novel unsupervised representation learning approach for 3D shapes, an important research challenge because it avoids the manual effort required to collect supervised data. Our method, VIP-GAN, trains an RNN-based neural network architecture to solve multiple view inter-prediction tasks for each shape. Given several nearby views of a shape, we define view inter-prediction as the task of predicting the center view between the input views and reconstructing the input views in a low-level feature space. The key idea of our approach is to implement the shape representation as a shape-specific global memory that is shared among all local view inter-predictions for each shape. Intuitively, this memory enables the system to aggregate information that helps it better solve the view inter-prediction tasks for each shape, and to leverage the memory as a view-independent shape representation. Our approach obtains the best results using a combination of L2 and adversarial losses for the view inter-prediction task. We show that VIP-GAN outperforms state-of-the-art methods in unsupervised 3D feature learning on three large-scale 3D shape benchmarks.
Jointly learning representations of 3D shapes and text is crucial to support tasks such as cross-modal retrieval or shape captioning. A recent method employs 3D voxels to represent 3D shapes, but this limits the approach to low resolutions due to the computational cost caused by the cubic complexity of 3D voxels. Hence the method suffers from a lack of detailed geometry. To resolve this issue, we propose Y²Seq2Seq, a view-based model, to learn cross-modal representations by joint reconstruction and prediction of view and word sequences. Specifically, the network architecture of Y²Seq2Seq bridges the semantic meaning embedded in the two modalities by two coupled "Y" like sequence-to-sequence (Seq2Seq) structures. In addition, our novel hierarchical constraints further increase the discriminability of the cross-modal representations by employing more detailed discriminative information. Experimental results on cross-modal retrieval and 3D shape captioning show that Y²Seq2Seq outperforms the state-of-the-art methods.
Azimuth multichannel (AMC) synthetic aperture radar (SAR), which contains multiple receiving antennas along the azimuth, can overcome the minimum antenna area constraint and provide high-resolution and wide-swath (HRWS) SAR images. Channel calibration and along-track baseline estimation are important topics in an AMC SAR system, since they have a great impact on image quality. Based on the signal model for stationary targets in AMC SAR, this paper first analyzes by simulation the influence of the along-track baseline and channel imbalances on SAR images. Then, a novel method is proposed to simultaneously estimate the along-track baseline, phase imbalance, and range sample time imbalance (RSTI) based on the azimuth cross-correlation in the two-dimensional frequency domain. In addition, with the help of simulations and real data acquired by Gaofen-3 (GF-3), the effectiveness of this method is verified by comparison with existing methods. Finally, this paper analyzes the estimation accuracy of the method under different scenarios and signal-to-noise ratios (SNRs), and points out directions for future research.
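The phase-imbalance part of such a calibration can be illustrated in miniature: when two channels observe the same signal up to a constant phase offset, that offset can be estimated by averaging the conjugate product of the channel samples. This is a simplified, hypothetical sketch, not the paper's full cross-correlation method in the two-dimensional frequency domain.

```python
import cmath
import random

def estimate_phase_imbalance(ch1, ch2):
    """Estimate a constant phase imbalance between two channel signals
    by averaging the conjugate product ch2 * conj(ch1)."""
    acc = sum(b * a.conjugate() for a, b in zip(ch1, ch2))
    return cmath.phase(acc)

# simulate a common complex random signal and a 0.4 rad channel phase offset
random.seed(0)
ch1 = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(1000)]
true_phase = 0.4
ch2 = [s * cmath.exp(1j * true_phase) for s in ch1]
est = estimate_phase_imbalance(ch1, ch2)
```

Averaging the conjugate product weights each sample by its power, which makes the estimate robust to additive noise; the actual method additionally handles the along-track baseline and RSTI, which this sketch ignores.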
The azimuth multichannel synthetic aperture radar (SAR) system has become a potential solution to the irreconcilable conflict between high resolution and wide swath (HRWS) faced by traditional SAR systems. Unambiguous imaging, especially of scenes with moving targets, is one of the crucial research topics in HRWS SAR. This paper proposes a simultaneous imaging scheme for moving targets and stationary clutter in maritime scenarios. First, the moving target echoes are extracted from the stationary clutter. Then, two methods based on completely different principles are used to estimate the radial velocity of each moving target, and the estimated result is used for phase compensation. After that, the moving target echoes are added back to the stationary scene echo and sent to the reconstruction filter. Lastly, the reconstructed echo is processed by the classical Chirp Scaling (CS) algorithm. Experiments are carried out using Chinese GaoFen-3 dual-channel data. The estimated velocities of the moving targets are verified against automatic identification system (AIS) information, and the imaging results show that the false targets are effectively suppressed and the moving targets return to their correct azimuth positions.
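The phase-compensation step can be sketched as follows: a target with radial velocity v_r induces an approximately linear phase ramp exp(-j·4π·v_r·t/λ) over slow time, which is removed by multiplying the echo with the conjugate ramp. The wavelength and PRF below are assumed, typical C-band values, not parameters from the paper.

```python
import cmath

WAVELENGTH = 0.055  # metres; assumed C-band wavelength (GF-3 operates at C-band)
PRF = 1000.0        # Hz; assumed pulse repetition frequency

def compensate_radial_velocity(echo, v_r):
    """Remove the slow-time phase ramp exp(-j*4*pi*v_r*t/lambda)
    induced by a target's radial velocity v_r (sign convention assumed)."""
    out = []
    for n, s in enumerate(echo):
        t = n / PRF
        out.append(s * cmath.exp(1j * 4 * cmath.pi * v_r * t / WAVELENGTH))
    return out

# a unit-amplitude target moving at 5 m/s: after compensation with the true
# velocity, the echo phase is constant across slow time
v_true = 5.0
echo = [cmath.exp(-1j * 4 * cmath.pi * v_true * (n / PRF) / WAVELENGTH)
        for n in range(100)]
flat = compensate_radial_velocity(echo, v_true)
```

With the residual ramp removed, the moving-target echo behaves like stationary clutter and can be fed back into the multichannel reconstruction filter without generating azimuth ghosts.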
The azimuth multi-channel Synthetic Aperture Radar (SAR) system operated in burst mode makes high-resolution ultrawide-swath (HRUS) imaging a reality. This imaging mode has excellent application value for maritime scenarios that require wide-area monitoring. This paper proposes a moving target detection (MTD) method for marine scenes based on sparse recovery, which integrates detection, velocity estimation, and relocation. First, the typical phenomenon of scene folding in the coarse-focused domain is introduced in detail. Given that the spatial distribution of moving vessels is highly sparse, sparse recovery is used to obtain the azimuth time that characterizes the position of each moving target. Subsequently, the radial velocity and position of the targets are obtained simultaneously. The effectiveness of the proposed method rests on two characteristics of moving targets in ocean scenes: high signal-to-clutter ratio (SCR) and sparse spatial distribution. Estimation performance under different SCRs is then analyzed by Monte Carlo experiments, and the actual SCR of vessels observed in GaoFen-3 dual-receive-channel data serves as a reference value to verify the effectiveness. In addition, simulation experiments demonstrate the method's capability to indicate marine moving targets.
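The high-SCR, sparse-distribution assumption can be illustrated with a crude stand-in for sparse recovery: thresholding a coarse-focused azimuth magnitude profile against the mean clutter level to pick out isolated vessel responses. The threshold and profile below are hypothetical, and the paper's actual method solves a sparse-recovery problem rather than a simple threshold test.

```python
def detect_sparse_targets(profile, scr_threshold=3.0):
    """Return azimuth bins whose magnitude exceeds scr_threshold times the
    mean level of the profile -- a toy detector that only works when targets
    are sparse and their SCR is high, as assumed for vessels at sea."""
    mean_level = sum(profile) / len(profile)
    return [i for i, m in enumerate(profile) if m > scr_threshold * mean_level]

# unit-level clutter with two strong, isolated vessel responses
profile = [1.0] * 200
profile[40] = 20.0
profile[155] = 15.0
detections = detect_sparse_targets(profile)
```

Because the strong targets barely raise the mean of a long profile, a fixed multiple of the mean separates the two regimes cleanly; with dense or low-SCR targets this breaks down, which is exactly why the method is restricted to marine scenes.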
Azimuth multichannel (AMC) synthetic aperture radar (SAR) is an advanced technique that can overcome the minimum antenna area constraint and provide high-resolution and wide-swath SAR images. The calibration of phase imbalance is an important topic in AMC SAR signal processing since it has a significant impact on image quality. For a single SAR image, the phase imbalance is usually considered constant. However, because of attitude errors, antenna position errors, target elevation, target motion, and phase mismatch of the antenna pattern, the actual phase imbalance is time-varying in azimuth and space-varying in range. Although the space-time variation of the phase imbalance is usually too small to noticeably affect product quality, it is still meaningful to study. On the one hand, doing so makes it possible to quickly estimate the phase imbalance over the whole observing operation and to eliminate the influence of target motion and other factors on the phase imbalance. On the other hand, some parameters, such as motion, can be retrieved from the phase imbalance. This paper first establishes signal models relating phase imbalance to attitude errors, antenna position errors, and target elevation. The signal model is then verified by simulations based on the parameters of Gaofen-3 (GF-3). In addition, the phase imbalance of real data acquired by GF-3 is processed. Finally, based on the processing results of the GF-3 real data, this paper offers some discussion and points out directions for future work.
Spotlight synthetic aperture radar (SAR) is a proven technique that provides higher-resolution images than traditional stripmap SAR. This paper addresses a high-resolution SAR focusing experiment based on Gaofen-3 satellite (GF-3) staring data with about 55 cm azimuth resolution and 240 MHz range bandwidth. In staring spotlight (ST) mode, the antenna always illuminates the same scene on the ground, which extends the synthetic aperture. Within a two-step processing algorithm, some special aspects such as curved-orbit model error correction, stop-and-go correction, and antenna pattern demodulation must be considered in image focusing. We provide detailed descriptions of all these aspects and put forward corresponding solutions. The suggested methods can be applied directly in the imaging module without modifying other data-processing software, making the most of the existing ground data processor. Finally, actual data acquired in GF-3 ST mode is used to validate these methodologies, and a well-focused, high-resolution image is obtained.
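The stop-and-go correction mentioned above exists because conventional SAR processing assumes the platform is stationary during each pulse's round trip; at spaceborne velocities the platform actually moves tens of metres in that time, which matters at sub-metre resolution. A back-of-the-envelope sketch, with assumed geometry rather than GF-3-specific parameters:

```python
C = 299_792_458.0  # speed of light, m/s

def stop_and_go_displacement(slant_range, platform_velocity):
    """Along-track distance the platform travels during a pulse's round trip,
    i.e. the motion the stop-and-go assumption neglects."""
    round_trip_time = 2.0 * slant_range / C
    return platform_velocity * round_trip_time

# assumed LEO SAR geometry: ~900 km slant range, ~7.6 km/s platform velocity
shift = stop_and_go_displacement(900e3, 7600.0)
```

The displacement comes out on the order of 45 m, thousands of times the 55 cm azimuth resolution quoted above, which is why the correction cannot be skipped in high-resolution spotlight focusing.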