Software-only solutions are ill-suited to network applications because of their low throughput. Hardware implementation on FPGA provides sufficient flexibility while increasing throughput considerably. In this paper, two multi-core architectures are proposed for the Bloom filter and CRC, two core network processing functions: a multi-core architecture with a shared queue and a multi-core architecture with private queues. The proposed architectures are implemented with 1, 2, 4, 8, and 16 cores. Experimental results show that the multi-core architecture with private queues achieves higher throughput than the shared-queue architecture. Compared with the Bloom filter, the CRC application imposes a lighter computational load and consequently yields higher throughput. Moreover, the Bloom filter is implemented on both GPU and CPU, and the results are compared with each other. With 16384 packets in GPU memory, the GPU implementation using CUDA achieves a speedup of about 274 times over the CPU implementation. However, the FPGA results outperform the GPU: with 16 cores, the throughput of the first architecture (shared queue) and the second architecture (private queue) is almost 5.5 and 7.1 times higher than the GPU throughput, respectively.
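To make the Bloom filter query operation concrete, the following is a minimal software sketch (illustrative only; the bit-array size, hash count, and the salted-SHA-256 hashing scheme are assumptions for this example, not details taken from the paper's hardware design):

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: a bit array probed by k hash functions."""

    def __init__(self, m=1024, k=4):
        self.m = m               # number of bits in the filter
        self.k = k               # number of hash functions
        self.bits = bytearray(m)

    def _indices(self, item):
        # Derive k bit positions by salting one hash; a hardware design
        # would typically use k independent hash units operating in parallel.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for idx in self._indices(item):
            self.bits[idx] = 1

    def query(self, item):
        # May report false positives, but never false negatives.
        return all(self.bits[idx] for idx in self._indices(item))
```

Note that the k bit-array probes in `query` are mutually independent, which is the inherent parallelism that multi-core FPGA and GPU implementations can exploit, both within a single query and across many packets queried concurrently.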