The Binarized Neural Network (BNN) is a Convolutional Neural Network (CNN) consisting of binary weights and activation rather than real-value weights. Smaller models are used, allowing for inference effectively on mobile or embedded devices with limited power and computing capabilities. Nevertheless, binarization results in lower-entropy feature maps and gradient vanishing, which leads to a loss in accuracy compared to real-value networks. Previous research has addressed these issues with various approaches. However, those approaches significantly increase the algorithm’s time and space complexity, which puts a heavy burden on those embedded devices. Therefore, a novel approach for BNN implementation on embedded systems with multi-scale BNN topology is proposed in this paper, from two optimization perspectives: hardware structure and BNN topology, that retains more low-level features throughout the feed-forward process with few operations. Experiments on the CIFAR-10 dataset indicate that the proposed method outperforms a number of current BNN designs in terms of efficiency and accuracy. Additionally, the proposed BNN was implemented on the All Programmable System on Chip (APSoC) with 4.4 W power consumption using the hardware accelerator.