Robot vision technology based on binocular vision has significant development potential in fields such as 3D scene reconstruction, target detection, and autonomous driving. To date, binocular vision methods used in robotics engineering have suffered from high cost, complex algorithms, and low reliability of the generated disparity maps across multiple scenarios. Robots therefore require a cost-effective algorithm with cross-domain generalization capabilities. To address these issues, this paper proposes a cross-domain stereo matching algorithm for binocular vision based on transfer learning, named the Cross-Domain Adaptation and Transfer Learning Network (Ct-Net), which has shown valuable results in multiple robot scenes. First, a General Feature Extractor (GFE) is introduced to extract rich general feature information for domain-adaptive stereo matching tasks. Then, a feature adapter adapts the general features to the stereo matching network. Furthermore, a Domain Adaptive Cost Optimization Module (DACOM) is designed to optimize the matching cost, and an embedded disparity score prediction module adaptively adjusts the disparity search range and refines the cost distribution. The overall framework was trained with a phased strategy, and ablation experiments verified the effectiveness of this training strategy. On the KITTI 2015 benchmark, compared with the prototype PSMNet, the 3PE-fg of Ct-Net in all regions and non-occluded regions decreased by 19.3% and 21.1%, respectively. On the Middlebury dataset, Ct-Net achieved comparable 2PE results on all samples. Quantitative and qualitative results on the Middlebury, Apollo, and other datasets demonstrate that Ct-Net significantly improves the cross-domain performance of stereo matching.
Stereo matching experiments in real-world scenarios further show that Ct-Net can effectively address visual tasks in multiple robot scenes.