In recent years, Convolutional Neural Networks (CNNs) have been widely used in remote sensing applications such as marine surveillance, traffic management, and road network detection. However, CNNs have extremely high computational, bandwidth, and memory requirements, while the computational capabilities of on-board hardware devices are limited; as a result, implementing a CNN on space-grade devices such as FPGAs for the on-board processing of acquired images poses many challenges, and implementations have to be carefully planned. In this paper, the authors present their work towards the implementation of an efficient CNN on a space-grade FPGA, with the goal of processing very-high-resolution remotely sensed images on board as soon as the data are provided by the sensor. All this work has been conducted within the EU-funded VIDEO project. As presented in this paper, the work includes the introduction of a methodology based on the project constraints, the evaluation of different state-of-the-art CNN architectures by means of a new efficiency metric also proposed in this work, the introduction of a new efficient CNN architecture, and finally, its optimized hardware implementation by means of high-level synthesis tools. The results obtained following the proposed methodology demonstrate that the proposed architecture is able to detect targets of interest in RGB images with much higher efficiency than state-of-the-art solutions, while requiring far fewer computing and memory resources.