Pansharpening is crucial for obtaining high-resolution multispectral images. Existing deep-learning-based pansharpening networks rely on supervised learning with external reference labels. Because actual fusion results are unavailable for labeling, simulated degraded data are used, with the original multispectral image serving as the fusion label. These processing steps are cumbersome, introduce the problem of scale degradation, and the fusion relationship between the data before and after degradation cannot represent the real fusion relationship. To address these limitations, we propose a self-supervised interactive dual-stream network for pansharpening (SIDP) trained on real datasets. Our approach adopts a dual-stream architecture comprising a spatial scale enhancement stream and a spectral channel attention stream, which extract the spatial and spectral features essential for fusion from the original panchromatic and multispectral images, respectively. Through interconnections at different levels, the network expands the search range in the feature space, enabling continuous interaction between spatial and spectral information during feature extraction and transmission. This ensures that spatial features of varying scales are injected into corresponding-scale spectral features, enhancing the complementarity between features. Moreover, we introduce a novel joint spatial-spectral loss function that leverages the original panchromatic and multispectral images themselves as self-supervised labels. Experimental results on diverse satellite datasets demonstrate the outstanding fusion performance of our method under both subjective qualitative and objective quantitative evaluation. Furthermore, the proposed method generalizes well to full-scale remote sensing images, showcasing its practical value.
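To make the self-supervised idea concrete, the following is a minimal NumPy sketch of a joint spatial-spectral loss in the spirit described above: a spectral term comparing the degraded fused image against the original multispectral image, plus a spatial term comparing the band-averaged fused image against the original panchromatic image. The function name, the block-average degradation, and the band-average intensity model are illustrative assumptions, not the exact formulation used by SIDP.

```python
import numpy as np

def joint_spatial_spectral_loss(fused, pan, ms, ratio=4):
    """Hypothetical sketch of a joint spatial-spectral self-supervised loss.

    fused: (C, H, W) fused high-resolution multispectral image
    pan:   (H, W)    original panchromatic image
    ms:    (C, H//ratio, W//ratio) original low-resolution multispectral image
    """
    c, h, w = fused.shape
    # Spectral term: degrade the fused image back to MS resolution
    # (a simple block average stands in for the sensor's MTF filter)
    # and compare it with the original multispectral image.
    down = fused.reshape(c, h // ratio, ratio, w // ratio, ratio).mean(axis=(2, 4))
    spectral = np.abs(down - ms).mean()
    # Spatial term: compare the band-averaged intensity of the fused
    # image with the original panchromatic image.
    spatial = np.abs(fused.mean(axis=0) - pan).mean()
    return spectral + spatial
```

Both terms use only the original panchromatic and multispectral inputs as labels, so no simulated degraded training pairs are required.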