<p>Most real-time computer vision applications, such as pedestrian detection, augmented reality, and virtual reality, rely heavily on convolutional neural networks (CNNs) for real-time decision support. In addition, edge intelligence is becoming necessary for low-latency real-time applications, which must process data at the source device. Processing massive amounts of data at the edge, however, affects memory footprint, prediction time, and energy consumption, which are essential performance metrics in machine-learning-based Internet of Things (IoT) edge clusters. Deploying deep, dense, heavily weighted CNN models on resource-constrained embedded systems with limited edge computing resources, such as memory and battery capacity, poses significant challenges for developing compact, optimized models. Energy consumption in edge IoT networks can be reduced by reducing computation and data transmission between IoT devices and gateway devices. Hence, there is high demand for energy-efficient deep learning models that can be deployed on edge devices. Furthermore, recent studies show that smaller, compressed models can achieve significant performance relative to larger deep learning models. This review article surveys state-of-the-art techniques for edge intelligence and proposes a new research framework for designing compact, optimized deep learning (DL) models for deployment on edge devices.</p>