Deep-Learning-based (DL-based) approaches have achieved remarkable performance in hyperspectral image (HSI) change detection (CD). Convolutional Neural Networks (CNNs) are often employed to capture fine spatial features, but they do not effectively exploit the spectral sequence information. Furthermore, existing Siamese-based networks ignore the interaction of change information during feature extraction. To address this issue, we propose a novel architecture, the Spectral–Temporal Transformer (STT), which processes the HSI CD task from a completely sequential perspective. The STT concatenates feature embeddings in spectral order, establishing a global spectrum–time-receptive field that can learn different representative features between two bands regardless of spectral or temporal distance, thereby strengthening the learning of temporal change information. Via the multi-head self-attention mechanism, the STT is capable of capturing spectral–temporal features that are weighted and enriched with discriminative sequence information, such as inter-spectral correlations, variations, and time dependency. We conducted experiments on three HSI datasets, demonstrating the competitive performance of our proposed method. Specifically, the overall accuracy of the STT outperforms the second-best method by 0.08%, 0.68%, and 0.99% on the Farmland, Hermiston, and River datasets, respectively.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.