In the optimization of Artificial Neural Networks (ANNs) via Evolutionary Algorithms (EAs), where evaluating the objective function requires training each candidate network, there is often a trade-off between efficiency and flexibility. Pure software solutions on general-purpose processors tend to be slow because they do not exploit the inherent parallelism, whereas hardware realizations usually rely on optimizations that restrict the range of applicable network topologies, or they increase processing efficiency through low-precision data representations. This paper presents, first, a study showing the need for a heterogeneous platform (CPU-GPU-FPGA) to accelerate the optimization of ANNs using genetic algorithms and, second, an implementation of a platform based on embedded systems with hardware accelerators implemented on an FPGA (Field Programmable Gate Array). Evaluating the individuals on a remote low-cost Altera FPGA allowed us to obtain a 3x-4x speedup over a 2.83 GHz Intel Xeon Quad-Core and a 6x-7x speedup over a 2.2 GHz AMD Opteron Quad-Core 2354.