Real-time semantic segmentation is in high demand for autonomous driving applications. Most semantic segmentation models rely on large feature maps and complex structures to enhance representation power and achieve high accuracy. However, these inefficient designs increase computational cost, which hinders deployment in autonomous driving. In this paper, we propose a lightweight real-time segmentation model, named Parallel Complement Network (PCNet), to address this challenging task with fewer parameters. A Parallel Complement layer is introduced to generate complementary features with a large receptive field. It helps overcome the problem of similar feature encodings among different classes and produces more discriminative representations. Building on the inverted residual structure, we design a Parallel Complement block to construct the proposed PCNet. Extensive experiments on the challenging road-scene datasets CityScapes and CamVid compare PCNet against several state-of-the-art real-time segmentation models. The results show that our model performs favorably: PCNet* achieves 72.9% mean IoU on CityScapes with only 1.5M parameters and runs at 79.1 FPS on 1024×2048 images on a GTX 2080Ti. Moreover, our proposed model achieves the best accuracy when trained from scratch.
Stereo matching for depth estimation is a fundamental vision problem. Recent work focuses on deep learning to improve accuracy, but most networks suffer from poor generalization and high computational cost, especially on high-resolution images. RAFT-Stereo makes strong advances on both fronts, yet still leaves room for improvement. In this paper, we revise the residual blocks in the feature extractors of RAFT-Stereo to improve performance in underwater scenarios. Specifically, we adopt an iterative Attentional Feature Fusion module to exploit global information during feature fusion. To validate our approach, we evaluate our networks on the ETH3D benchmark and on our own underwater dataset, demonstrating the superiority of our model over state-of-the-art baselines. Compared to the original RAFT-Stereo, our results improve the default metric on the ETH3D benchmark, the bad 1-pixel error (the percentage of pixels with end-point error greater than 1 px), by 13.1%, and reduce the average error on our underwater dataset by 16.9%.
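The bad 1-pixel error metric mentioned above can be sketched in a few lines: it is the percentage of pixels whose end-point error (the absolute difference between predicted and ground-truth disparity) exceeds 1 px. Function and variable names here are illustrative, not from the paper.

```python
# Sketch of the "bad 1-pixel" stereo metric: percentage of pixels
# whose end-point error |pred - gt| exceeds a threshold (default 1 px).

def bad_pixel_rate(pred, gt, threshold=1.0):
    errors = [abs(p - g) for p, g in zip(pred, gt)]
    bad = sum(e > threshold for e in errors)
    return 100.0 * bad / len(errors)

# Four pixels, two with end-point error > 1 px:
print(bad_pixel_rate([1.0, 2.5, 3.0, 7.0], [1.2, 4.0, 3.1, 5.0]))  # → 50.0
```

In practice the metric is computed over dense disparity maps with invalid pixels masked out; this flat-list version only shows the arithmetic.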
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and indicate whether the citing article provides supporting or contrasting evidence. scite is used by students and researchers around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.