Software solutions are ill-suited to network applications because of their low throughput. A hardware implementation on an FPGA not only provides sufficient flexibility but also increases throughput considerably. In this paper, two multi-core architectures are proposed for the Bloom filter and CRC, two core network processing functions. These architectures are called the multi-core architecture with shared queue and the multi-core architecture with private queue. The proposed architectures are implemented with 1, 2, 4, 8, and 16 cores. Experimental results show that the multi-core architecture with private queue achieves higher throughput than the architecture with shared queue. Compared with the Bloom filter, the CRC application imposes a lower computational load and consequently achieves higher throughput. Moreover, the Bloom filter is implemented on a GPU and a CPU, and the results are compared. When the number of packets in GPU memory is 16384, the GPU implementation using CUDA achieves a speedup of about 274 times over the CPU implementation. However, the FPGA outperforms the GPU: with 16 cores, the throughput of the first architecture (shared queue) and the second architecture (private queue) is almost 5.5 and 7.1 times higher than the GPU throughput, respectively.
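For readers unfamiliar with the two core functions named above, the following is a minimal software sketch of a Bloom filter and a bitwise CRC. It is an illustration only and not the hardware design evaluated in the paper. The CRC width (CRC-32 with the reflected polynomial 0xEDB88320), the filter size, the number of hash functions, the double-hashing scheme, and all names (`crc32_bitwise`, `BloomFilter`, `might_contain`) are assumptions chosen for the example.

```python
import zlib


def crc32_bitwise(data: bytes) -> int:
    """Bit-serial CRC-32 (assumed variant: reflected polynomial 0xEDB88320)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right one bit; apply the polynomial if the dropped bit was 1.
            crc = (crc >> 1) ^ (0xEDB88320 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


class BloomFilter:
    """Bit-array membership filter: no false negatives, possible false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        # Sizes are illustrative assumptions, not values from the paper.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: bytes):
        # Double hashing (an assumed scheme): derive k positions from two base hashes.
        h1 = zlib.crc32(item)
        h2 = zlib.adler32(item) | 1  # force odd so the stride is never zero
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item: bytes) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: bytes) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))
```

The sketch suggests why the two workloads differ in cost: the CRC performs one pass over the packet bytes, whereas each Bloom filter query computes several hash values and performs several memory accesses on top of that.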