A VLSI architecture for the generalized bit-flipping decoding algorithm for non-binary low-density parity-check codes is proposed in this paper. The tentative decoding steps of the algorithm have been modified to avoid computing and storing a matrix of dimension N × 2 q , for a code (N, K) over GF(2 q ), reducing its complexity with a minimal penalization of its performance, less than 0.05 dB compared with the original algorithm. The architecture was synthesized using a 90 nm standard cell library, for the (837, 723) non-binary code over GF(2 5 ), requiring 590220 xor gates and achieving a throughput of 89 Mbps. Additionally, it was implemented in a Virtex-VI FPGA device with a cost of 4070 slices and a throughput of 44.6 Mbps.