Data-driven soft sensing techniques are widely used to predict quality variables in the process industry. Temporal and spatial structure information is important for process modeling because it helps extract the nonlinear and dynamic characteristics of the data. Deep learning can mimic the physical connection structure of a process by designing analogous node connections in the neural network. Taking a complex process with countercurrently connected units as a demonstration, this paper proposes a network based on bidirectional long short-term memory (BiLSTM) with spatiotemporal feature extraction for soft-sensor development. An attention mechanism is further incorporated to learn features from space and time simultaneously. The effectiveness of the proposed method is validated on a simulation example and a real industrial example.