Spiking neural networks (SNNs) are brain-inspired event-driven models of computation with promising ultra-low energy dissipation. Rich network dynamics emergent in recurrent spiking neural networks (R-SNNs) can form temporally-based memory, offering great potential in processing complex spatiotemporal data. However, recurrence in network connectivity produces tightly coupled data dependency in both space and time, rendering hardware acceleration of R-SNNs challenging. We present the first work to exploit spatiotemporal parallelisms to accelerate the R-SNN based inference on systolic arrays using an architecture called SaARSP. We decouple the processing of feedforward synaptic connections from that of recurrent connections to allow for the exploitation of parallelisms across multiple time points. We propose a novel time window size optimization (TWSO) technique, to further explore the temporal granularity of the proposed decoupling in terms of optimal time window size and reconfiguration of the systolic array considering layer-dependent connectivity to boost performance. Stationary dataflow and time window size are jointly optimized to trade-off between weight data reuse and movements of partial sums, the two bottlenecks in latency and energy dissipation of the accelerator. The proposed systolic-array architecture offers a unifying solution to an acceleration of both feedforward and recurrent SNNs, and delivers 4,000X EDP improvement on average for different R-SNN benchmarks over a conventional baseline.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.