Convolutional neural networks (CNNs) are the state-of-the-art method for most computer vision tasks. However, their excessive computational requirements make the deployment of CNNs on mobile or embedded platforms challenging. In this paper, we present and end-to-end neural network solution to scene understanding in the context of robot soccer. Our system uses two key neural networks: one to perform semantic segmentation on an image, and another to propagate class labels between consecutive frames. Our networks are trained on synthetic datasets and fine-tuned on a set consisting of real images taken using a Nao robot. Furthermore, we discuss and evaluate several practical methods for increasing the efficiency and performance of our networks. Finally, we present NaoDNN, a C++ neural network library designed for fast inference on the Nao robots.