After optimizing the model algorithm, we propose a hardware-aware collaborative training framework based on Federated Learning (FL), which expands the effective training dataset for higher accuracy. Moreover, it learns heterogeneous models that simultaneously meet the latency constraints of multiple edge systems. We use our high-accuracy dynamic zeroizing-recovering method to adapt each local model to its latency constraint (a hedged sketch of this idea is given at the end of this section). A proto-corrected aggregation scheme is further designed to aggregate all the heterogeneous local models, so that a single training process satisfies the latency constraints of the different systems while maintaining high accuracy (see also the aggregation sketch at the end of this section).

However, in scenarios that demand extremely low power consumption and high throughput, emerging accelerators are needed to further optimize edge intelligence. The IMP architecture is promising for DNN inference. To meet resource constraints and minimize power consumption on IMP devices, we use filter-group pruning and crossbar pruning to reduce crossbar usage without extra hardware units for data alignment (the crossbar-usage sketch at the end of this section illustrates why group-aligned pruning translates into crossbar savings). In addition, we adopt a non-ideality adaptation and self-compensation scheme that exploits the characteristics of crossbars to mitigate the impact of non-idealities without large hardware overhead. Finally, we integrate these techniques into one training process for co-optimization, which improves the accuracy of the final model.

In summary, we achieve efficient edge intelligence by optimizing DNN algorithms, training data, and computing devices, encompassing both software and hardware aspects. This unlocks the potential of edge intelligence, ensuring data privacy, achieving high accuracy, and sustaining high throughput across various applications.

In the future, we will continue to focus on hardware-software co-design for edge intelligence. First, we intend to develop a dynamic reconfiguration architecture that can seamlessly switch IMP cells between memory and computing functions, optimally allocating memory and computing resources to enhance DNN inference efficiency. Second, we will design IMP accelerators that support a wider range of algorithms, such as the Transformer, co-optimizing algorithms, data, and IMP devices to comprehensively advance the capabilities and applications of edge intelligence. Third, we will propose a hybrid CNN-Transformer Neural Architecture Search (NAS) framework for the IMP architecture to achieve hardware-friendly, highly accurate, robust, low-latency, and low-power IMP-based edge intelligence.
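As an illustration of the dynamic zeroizing-recovering idea, the following PyTorch sketch rebuilds a magnitude-based keep-mask after every optimizer step, so weights zeroized earlier can be recovered once gradient updates grow them back past the threshold. This is a minimal sketch, not the exact method of this thesis: the function names, the plain magnitude criterion, and the fixed sparsity value are all illustrative assumptions, and in practice the sparsity would be derived from each edge system's latency budget.

    import torch
    import torch.nn as nn

    def magnitude_mask(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
        # Keep-mask that zeroizes the smallest-magnitude fraction of weights.
        k = int(sparsity * weight.numel())
        if k == 0:
            return torch.ones_like(weight)
        threshold = torch.kthvalue(weight.abs().flatten(), k).values
        return (weight.abs() > threshold).float()

    def zeroize_recover_step(layer: nn.Linear, sparsity: float) -> None:
        # Rebuild the mask from *current* magnitudes each step, so a weight
        # zeroized earlier is recovered automatically once gradient updates
        # grow it past the threshold again.
        with torch.no_grad():
            layer.weight.mul_(magnitude_mask(layer.weight, sparsity))

    # Usage sketch: apply after each optimizer step; in a latency-constrained
    # setting, 'sparsity' would be chosen per edge system so the pruned model
    # fits its latency budget (that mapping is not shown here).
    layer = nn.Linear(128, 64)
    optimizer = torch.optim.SGD(layer.parameters(), lr=0.1)
    x, y = torch.randn(32, 128), torch.randn(32, 64)
    loss = nn.functional.mse_loss(layer(x), y)
    loss.backward()
    optimizer.step()
    zeroize_recover_step(layer, sparsity=0.5)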
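The proto-corrected aggregation scheme itself is specific to this thesis; as a generic stand-in, the following sketch shows the basic problem any such scheme must solve: averaging heterogeneous local models whose latency-driven masks keep different subsets of weights. Each global weight is averaged only over the clients that actually kept it, so aggressively pruned clients do not pull retained weights toward zero. The function name and per-tensor interface are hypothetical, and the prototype-based correction of the actual scheme is not reproduced here.

    import torch

    def masked_average(client_weights, client_masks):
        # Stack per-client tensors: shape (num_clients, *weight_shape).
        stacked_w = torch.stack(client_weights)
        stacked_m = torch.stack(client_masks)
        # Count, per weight, how many clients kept it; the clamp avoids a
        # divide-by-zero for weights pruned on every client.
        counts = stacked_m.sum(dim=0).clamp(min=1.0)
        # Average each weight over the clients whose mask kept it.
        return (stacked_w * stacked_m).sum(dim=0) / counts

    # Usage sketch with three clients holding differently pruned 2x2 layers.
    w = [torch.tensor([[1.0, 0.0], [2.0, 4.0]]),
         torch.tensor([[3.0, 0.0], [0.0, 2.0]]),
         torch.tensor([[5.0, 6.0], [0.0, 0.0]])]
    m = [torch.tensor([[1.0, 0.0], [1.0, 1.0]]),
         torch.tensor([[1.0, 0.0], [0.0, 1.0]]),
         torch.tensor([[1.0, 1.0], [0.0, 0.0]])]
    print(masked_average(w, m))  # (0,0) averages all three clients; (1,1) only two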
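Finally, the crossbar-usage sketch below shows why filter-group pruning maps cleanly onto crossbar savings. When a convolution layer is unrolled into an (in_channels * k * k) x out_channels matrix and tiled onto fixed-size crossbars, usage drops only when whole tiles are freed; pruning filters in groups aligned to the crossbar width removes entire column tiles without extra data-alignment hardware. The function and the 128x128 crossbar size are illustrative assumptions, not parameters taken from this thesis.

    import math

    def crossbars_needed(in_channels, out_channels, kernel_size,
                         xbar_rows=128, xbar_cols=128):
        # A conv layer unrolled for IMP mapping occupies an
        # (in_channels * k * k) x out_channels weight matrix,
        # tiled onto fixed-size crossbars.
        rows = in_channels * kernel_size * kernel_size
        return (math.ceil(rows / xbar_rows)
                * math.ceil(out_channels / xbar_cols))

    # Pruning 128 filters as one aligned group frees a full column of
    # crossbar tiles; pruning 127 scattered filters frees none.
    print(crossbars_needed(256, 512, 3))  # 18 row tiles x 4 col tiles = 72
    print(crossbars_needed(256, 384, 3))  # 18 x 3 = 54 after aligned pruning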