Recent studies have demonstrated that deep learning‐based stereo matching methods (DLSMs) can far exceed conventional methods on most benchmark datasets, both improving visual quality and reducing the mismatching rate. However, applying DLSMs to high‐resolution satellite stereos with broad image coverage and wide terrain variety remains challenging. First, the broad coverage of satellite stereos produces a wide disparity range, whereas most DLSMs are limited to a narrow disparity range, resulting in incorrect disparity estimates in areas whose disparities fall outside that range. Second, high‐resolution satellite stereos typically comprise various terrain types and are therefore more complicated than carefully prepared datasets. As a result, the performance of DLSMs on satellite stereos is unstable, especially in intractable regions such as texture‐less and occluded regions. Third, generating digital surface models (DSMs) requires occlusion‐aware disparity maps, but traditional occlusion detection methods are not always applicable to DLSMs, which produce continuous disparities. To tackle these problems, this paper proposes a novel DLSM‐based DSM generation workflow. The workflow comprises three steps: pre‐processing, disparity estimation and post‐processing. The pre‐processing step uses low‐resolution terrain to shift mismatched disparity ranges into a fixed range and crops the satellite stereos into regular patches. The disparity estimation step introduces a hybrid feature fusion network (HF2Net) to improve matching performance. Specifically, HF2Net comprises a cross‐scale feature extractor (CSF) and a multi‐scale cost filter. The feature extractor differentiates structural‐context features in complex scenes and thus enhances HF2Net's robustness on satellite stereos, especially in intractable regions. The cost filter removes most matching errors to ensure accurate disparity estimation. 
The post‐processing step generates initial DSM patches from the estimated disparity maps and then refines them into the final large‐scale DSMs. Primary experiments on the public US3D dataset showed that HF2Net achieves better accuracy than state‐of‐the‐art methods. We then built a Gaofen‐7 dataset to train HF2Net and conducted DSM generation experiments on two Gaofen‐7 stereos to further demonstrate the effectiveness and practical capability of the proposed workflow.
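
The pre‐processing idea (shifting disparities into a fixed range with a coarse terrain prior, then cropping regular patches) can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes a rectified pair, a coarse disparity map already derived from the low‐resolution terrain, one integer offset per patch taken as the median of that prior, and the convention that a left pixel at column x matches the right pixel at column x - d. The names `crop_with_disparity_shift`, `restore_disparity`, `coarse_disp` and `patch` are hypothetical.

```python
import numpy as np


def crop_with_disparity_shift(left, right, coarse_disp, patch=4):
    """Crop a rectified pair into patches, shifting each right patch by a coarse offset.

    Assumed convention: a left pixel at column x matches the right pixel at
    column x - d, where d is the disparity. After the shift, the network only
    has to estimate the residual d - offset, which stays in a narrow range
    when the coarse prior is close to the true disparity.
    """
    height, width = left.shape
    patches = []
    for row in range(0, height - patch + 1, patch):
        for col in range(0, width - patch + 1, patch):
            # One integer offset per patch, taken from the coarse prior (assumption).
            prior = coarse_disp[row:row + patch, col:col + patch]
            offset = int(round(float(np.median(prior))))
            src = col - offset  # column where the matching right-image window starts
            if src < 0 or src + patch > width:
                continue  # shifted window leaves the image; skipped in this sketch
            patches.append((
                row, col, offset,
                left[row:row + patch, col:col + patch],
                right[row:row + patch, src:src + patch],
            ))
    return patches


def restore_disparity(residual_disp, offset):
    """Convert a patch-level residual disparity back to a full-image disparity."""
    return residual_disp + offset
```

In this sketch, the disparity predicted on each shifted patch is a residual, so `restore_disparity` adds the patch offset back before the DSM patches are generated in post‐processing.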