Real-time semantic segmentation remains challenging, and the way features from different branches are fused is crucial to improving it. Two-branch architectures have shown promising results for this task. However, upsampling the feature maps of the semantic branch to match the resolution of the detail branch loses object feature information and degrades segmentation accuracy. To address these issues, we propose a deep bilateral fusion and bilateral embedded network (BFBE-Net) for real-time semantic segmentation, built on an encoder-decoder structure. In the encoder, BFBE-Net adopts a two-branch design in which a top-down fusion module and a bottom-up fusion module integrate multi-scale context information along the channel dimension and assign different weights to detail information and semantic information to enhance the fused features. In the decoder, a bilateral embedded attention module, guided by spatial and channel attention, integrates semantic and spatial features while gradually upsampling the feature maps to reduce the loss of feature information. In addition, an enhanced aggregation pyramid pooling module uses depth-wise asymmetric convolution to extract contextual information efficiently. The proposed method is evaluated on two benchmark datasets, Cityscapes and CamVid, achieving 78.5% mean intersection over union (mIoU) at 82 frames per second (FPS) on the Cityscapes test set and 79.2% mIoU at 131 FPS on the CamVid test set. BFBE-Net thus improves segmentation accuracy while maintaining real-time inference speed.
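For readers unfamiliar with the accuracy metric reported above, the following is a minimal sketch of how mIoU can be computed from flattened label sequences. It is an illustration only, not the evaluation code behind the reported numbers: the function name `mean_iou`, the `ignore_index` value of 255 for unlabeled pixels, and the choice to average only over classes that actually occur are assumptions made here. Benchmark evaluation typically accumulates the per-class counts over the whole test set before averaging.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int],
             num_classes: int, ignore_index: int = 255) -> float:
    """Mean intersection over union for flattened label sequences.

    Assumptions (hypothetical, for illustration): pixels whose target
    label equals `ignore_index` are skipped, and classes that appear in
    neither prediction nor target are left out of the average.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes  # pixels where pred == target == c
    pred_count = [0] * num_classes    # pixels predicted as class c
    target_count = [0] * num_classes  # pixels labeled as class c

    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[t] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:
            ious.append(intersection[c] / union)

    # With no valid pixels at all, 0.0 is returned by convention here.
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes; the last pixel is unlabeled and skipped.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
score = mean_iou(pred, target, num_classes=3)
```

In the toy example the per-class IoU values are 1/2, 2/3, and 1, so `score` is their mean, 13/18.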