Traditional visual place recognition (VPR) methods generally use frame-based cameras, which easily fail under rapid illumination changes or fast motion. To overcome this, we propose an end-to-end VPR network using event cameras that achieves good recognition performance in challenging environments (e.g., large-scale driving scenes). The key idea of the proposed algorithm is to first represent the event streams as EST voxel grids, then extract features using a deep residual network, and finally aggregate the features using an improved VLAD network, realizing end-to-end visual place recognition from event streams. To verify the effectiveness of the proposed algorithm, we evaluate it on event-based driving datasets (MVSEC, DDD17, Brisbane-Event-VPR) and synthetic event datasets (Oxford RobotCar, CARLA), analyzing its performance on large-scale driving sequences that include cross-weather, cross-season, and illumination-changing scenes. We then compare the proposed method with the state-of-the-art event-based VPR method (Ensemble-Event-VPR). Experimental results show that the proposed method outperforms the event-based ensemble scheme in challenging scenarios. To our knowledge, this is the first end-to-end weakly supervised deep network architecture for the visual place recognition task that directly processes event stream data.
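The first stage of the pipeline, turning an event stream into a voxel grid, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes events are given as (x, y, timestamp, polarity) tuples and replaces the kernel used by the EST representation with a fixed triangular (linear interpolation) kernel along the time axis. The function name `events_to_voxel_grid`, the event layout, and the grid layout are hypothetical choices made for illustration only.

```python
from typing import List, Tuple

# Assumed event layout: (x, y, timestamp, polarity), polarity in {-1, +1}.
Event = Tuple[int, int, float, int]


def events_to_voxel_grid(events: List[Event], width: int, height: int,
                         num_bins: int) -> list:
    """Accumulate events into a grid indexed as [polarity][bin][y][x].

    Each event is spread over its neighbouring temporal bins with a fixed
    triangular kernel. This is a stand-in for the kernel of the EST
    representation, not a reproduction of it.
    """
    grid = [[[[0.0] * width for _ in range(height)]
             for _ in range(num_bins)] for _ in range(2)]
    if not events:
        return grid

    t_start = min(e[2] for e in events)
    t_end = max(e[2] for e in events)
    span = (t_end - t_start) or 1.0  # avoid division by zero

    for x, y, t, p in events:
        if not (0 <= x < width and 0 <= y < height):
            continue  # ignore events outside the sensor area
        # Map the timestamp to a continuous bin coordinate in [0, num_bins - 1].
        t_norm = (t - t_start) / span * (num_bins - 1)
        channel = 0 if p > 0 else 1  # positive and negative events kept apart
        for b in range(num_bins):
            weight = max(0.0, 1.0 - abs(t_norm - b))
            if weight > 0.0:
                grid[channel][b][y][x] += weight
    return grid


if __name__ == "__main__":
    demo = [(0, 0, 0.0, 1), (1, 0, 0.25, 1), (1, 1, 1.0, -1)]
    voxels = events_to_voxel_grid(demo, width=2, height=2, num_bins=3)
    print(voxels[0][0][0][1], voxels[0][1][0][1])
```

In the described pipeline, a grid of this kind would then be passed to the residual feature extractor and the VLAD-based aggregation layer. Those stages are omitted here because the abstract does not specify their configuration.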