Robot vision technology based on binocular vision has significant development potential in fields such as 3D scene reconstruction, target detection, and autonomous driving. To date, binocular vision methods used in robotics engineering have suffered from high cost, complex algorithms, and low reliability of the generated disparity maps across multiple scenarios. Robots therefore require a cost-effective algorithm with cross-domain generalization capabilities. To address these issues, this paper proposes a cross-domain stereo matching algorithm for binocular vision based on transfer learning, named the Cross-Domain Adaptation and Transfer Learning Network (Ct-Net), which has shown valuable results in multiple robot scenes. First, a General Feature Extractor (GFE) is introduced to extract rich general feature information for domain-adaptive stereo matching tasks. Then, a feature adapter adapts the general features to the stereo matching network. Furthermore, a Domain Adaptive Cost Optimization Module (DACOM) is designed to optimize the matching cost, and an embedded disparity score prediction module adaptively adjusts the disparity search range and refines the cost distribution. The overall framework was trained with a phased strategy, and ablation experiments verified the effectiveness of this training strategy. On the KITTI 2015 benchmark, compared with the prototype PSMNet, the 3PE-fg of Ct-Net in all regions and non-occluded regions decreased by 19.3% and 21.1%, respectively. On the Middlebury dataset, Ct-Net achieved comparable 2PE results on all samples. Quantitative and qualitative results on the Middlebury, Apollo, and other datasets demonstrate that Ct-Net significantly improves the cross-domain performance of stereo matching.
Stereo matching experiments in real-world scenarios further show that Ct-Net can effectively address visual tasks in multiple robot scenes.