Matching heterogeneous high-resolution remote sensing images is hampered by differences in sensor type, imaging angle, imaging height, and acquisition time, and it becomes harder still in complex scenes with dense urban buildings and pronounced height differences. This paper proposes a matching method for heterogeneous high-resolution remote sensing images based on partitioned feature extraction and three-dimensional spatial constraints. First, each image is partitioned according to the geometric differences of ground objects, and two feature extraction methods, adaptive phase threshold and weighted moment map, are used to extract feature points independently. To address inaccurate feature description caused by drastic viewing-angle changes in building areas, we construct a robust feature descriptor by combining a multi-scale phase-weighted energy convolution histogram (MSPW-ECH) with a new gradient location and orientation histogram (GLOH)-like local feature descriptor. Additionally, a new similarity measure incorporating three-dimensional spatial constraints, together with the Marginalizing Sample Consensus (MAGSAC) method, is applied to eliminate mismatched point pairs and obtain precise matching points. Feature detection results on two different synthetic data sets show that the proposed detector outperforms three classical detectors in repeatability and uniformity. Finally, the matching performance is verified experimentally on six groups of heterogeneous high-resolution remote sensing images. The results show that the proposed method significantly outperforms the RIFT, HAPCG, and MS-HLMO methods and achieves the highest matching accuracy.
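The repeatability criterion used to compare detectors can be illustrated with a minimal sketch. The abstract does not state the exact definition used, so the code below assumes a common one: a keypoint counts as repeated if, after projection by a known ground-truth homography, it lies within a pixel tolerance of a distinct keypoint in the second image, and the score is the number of repeated points divided by the smaller keypoint count. The names `project` and `repeatability` and the parameter `tol` are hypothetical and chosen only for illustration.

```python
from math import hypot


def project(H, pt):
    """Map a 2D point through a 3x3 homography given as nested lists."""
    x, y = pt
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (
        (H[0][0] * x + H[0][1] * y + H[0][2]) / w,
        (H[1][0] * x + H[1][1] * y + H[1][2]) / w,
    )


def repeatability(kps_a, kps_b, H, tol=1.5):
    """Fraction of keypoints re-detected across two images.

    Assumed definition (not taken from the paper): repeated points
    divided by min(len(kps_a), len(kps_b)), with greedy one-to-one
    nearest-neighbour assignment within `tol` pixels.
    """
    if not kps_a or not kps_b:
        return 0.0
    unused = list(kps_b)  # keypoints in image B not yet assigned
    repeated = 0
    for pt in kps_a:
        if not unused:
            break
        px, py = project(H, pt)
        # Nearest still-unassigned keypoint in image B.
        best = min(unused, key=lambda q: hypot(q[0] - px, q[1] - py))
        if hypot(best[0] - px, best[1] - py) <= tol:
            repeated += 1
            unused.remove(best)  # enforce one-to-one correspondence
    return repeated / min(len(kps_a), len(kps_b))


if __name__ == "__main__":
    # Hypothetical data: image B is image A shifted by (10, 5) pixels.
    shift = [[1.0, 0.0, 10.0], [0.0, 1.0, 5.0], [0.0, 0.0, 1.0]]
    a = [(0.0, 0.0), (10.0, 10.0), (20.0, 20.0), (30.0, 30.0)]
    b = [(10.0, 5.0), (20.0, 15.5), (100.0, 100.0)]
    print(repeatability(a, b, shift))
```

The greedy assignment keeps the sketch short; an optimal one-to-one assignment could give a slightly different count when keypoints are tightly clustered.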