Recent technology trends indicate that, although device sizes will continue to scale as they have in the past, supply voltage scaling has ended. As a result, future chips can no longer rely on simply increasing the operational core count to improve performance without exceeding a reasonable power budget. Alternatively, allocating die area to accelerators that target an application, or an application domain, appears quite promising, and this paper makes an argument for a neural network hardware accelerator. After being hyped in the 1990s and then fading away for almost two decades, hardware neural networks are seeing a surge of interest because of their energy and fault-tolerance properties. At the same time, the emergence of high-performance applications like Recognition, Mining, and Synthesis (RMS) suggests that the potential application scope of a hardware neural network accelerator would be broad. In this paper, we highlight that a hardware neural network accelerator is indeed compatible with many of the emerging high-performance workloads that are currently accepted as benchmarks for high-performance micro-architectures. For that purpose, we develop and evaluate software neural network implementations of 5 (out of 12) RMS applications from the PARSEC Benchmark Suite. Our results show that neural network implementations can achieve competitive results, with respect to application-specific quality metrics, on these 5 RMS applications.
Neural network simulations on a parallel architecture are reported. The architecture is scalable and flexible enough to be useful for simulating various kinds of networks and paradigms. The computing device is based on an existing coarse-grain parallel framework (INMOS transputers), extended with finer-grain parallelism through VLSI chips called Lneuro 1.0 (for LEP neuromimetic) circuits. The modular architecture of the circuit makes it possible to build various kinds of boards to match the expected range of applications, or to increase the power of the system by adding more hardware. The resulting machine remains reconfigurable, to some extent, to accommodate a specific problem. A small-scale machine has been realized using 16 Lneuros to test the behavior of this architecture experimentally. Results are presented on an integer version of Kohonen feature maps. The speedup factor increases regularly with the number of clusters involved (up to a factor of 80). Some ways to improve this family of neural network simulation machines are also investigated.
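To make the phrase "integer version of Kohonen feature maps" concrete, here is a minimal Python sketch of a one-dimensional Kohonen map that uses only integer arithmetic. It is an illustration under stated assumptions, not the Lneuro implementation: the abstract does not describe the distance metric, the learning-rate scheme, or the map topology, so the Manhattan distance, the power-of-two learning rate applied with a right shift, the linear neighbourhood, and all function names here are hypothetical choices.

```python
import random


def best_matching_unit(weights, x):
    """Return the index of the unit closest to x (Manhattan distance, assumed)."""
    best, best_dist = 0, None
    for i, w in enumerate(weights):
        d = sum(abs(a - b) for a, b in zip(w, x))
        if best_dist is None or d < best_dist:
            best, best_dist = i, d
    return best


def train_step(weights, x, shift, radius):
    """Move the winner and its neighbours toward x using integer operations only.

    The learning rate is 2**-shift, applied as an arithmetic right shift
    (an assumption; Python's >> rounds toward negative infinity).
    """
    bmu = best_matching_unit(weights, x)
    lo = max(0, bmu - radius)
    hi = min(len(weights) - 1, bmu + radius)
    for i in range(lo, hi + 1):
        weights[i] = [w + ((a - w) >> shift) for w, a in zip(weights[i], x)]
    return bmu


def train(samples, n_units, dim, epochs=20, max_value=255, seed=0):
    """Train a 1-D integer map; radius and learning rate shrink over the epochs."""
    rng = random.Random(seed)
    weights = [[rng.randint(0, max_value) for _ in range(dim)]
               for _ in range(n_units)]
    for epoch in range(epochs):
        radius = (n_units // 2) * (epochs - 1 - epoch) // epochs
        shift = 1 + (3 * epoch) // epochs  # learning rate goes from 1/2 to 1/8
        for x in samples:
            train_step(weights, x, shift, radius)
    return weights
```

Because each update moves a weight to a value between its old value and the input component, weights stay inside the input range and never leave the integers, which is the property that makes such a scheme attractive for fixed-point hardware.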