Compared with the Viterbi algorithm, the stack algorithm can offer lower hardware complexity, especially for convolutional codes with long constraint lengths. This paper proposes a fast and simple hardware sequential stack decoder with an efficient state scheme. The decoder is built mainly from RAM and shift registers: three independent RAM blocks store the path metric, the node, and the encoder state of each path, respectively. Because the three RAMs are addressed by the same register value, all data items belonging to one stack entry can be accessed with a single address. During decoding, the paths are sorted according to the rules of the stack algorithm, and the path at the top of the stack enters the path-extension state in the next clock cycle. An FPGA prototype of the stack decoder is implemented, and high-speed decoding is achieved by optimizing the state scheme to avoid additional time-consuming read/write operations.
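To make the stack data structure concrete, the following is a minimal software sketch in Python. It is not the authors' hardware design. It assumes a rate-1/2, K = 3 convolutional code with generators 7 and 5 (octal), hard-decision decoding over a binary symmetric channel with the Fano metric, and a zero tail. The three lists `metric`, `node`, and `state` stand in for the three RAMs: one index selects every field of a stack entry, and sorted insertion imitates the shift-register stack. The function names, the `max_entries` limit, and encoding a node as its input-bit history behind a leading 1 sentinel (so the node also gives the depth) are hypothetical choices made only for illustration.

```python
import math

# Assumed example code: rate 1/2, constraint length K = 3, generators 7 and 5 (octal).
G = (0b111, 0b101)
K = 3


def encode_bit(state, bit):
    """One encoder step: state holds the previous K-1 input bits (newest in the MSB)."""
    reg = (bit << (K - 1)) | state
    out = tuple(bin(reg & g).count("1") & 1 for g in G)  # parity of tapped bits
    return out, reg >> 1


def encode(bits):
    state, out = 0, []
    for b in bits:
        o, state = encode_bit(state, b)
        out.extend(o)
    return out


def stack_decode(received, n_info, p=0.05, max_entries=1024):
    """Hard-decision stack decoding with the Fano metric on a BSC(p)."""
    n_out = len(G)
    depth_total = n_info + K - 1           # message bits plus zero tail
    rate = 1 / n_out
    agree = math.log2(1 - p) + 1 - rate    # per-bit Fano metric, bit matches
    disagree = math.log2(p) + 1 - rate     # per-bit Fano metric, bit differs

    # Three parallel lists play the role of the three RAMs: index i in each
    # list belongs to the same stack entry, and index 0 is the top of the stack.
    # Hypothetical node encoding: input-bit history behind a leading 1 sentinel.
    metric, node, state = [0.0], [1], [0]

    while True:
        m, nd, st = metric.pop(0), node.pop(0), state.pop(0)  # read the top entry
        depth = nd.bit_length() - 1
        if depth == depth_total:
            # Strip the sentinel and the tail, returning the message bits.
            return [(nd >> (depth_total - 1 - i)) & 1 for i in range(n_info)]

        r = received[depth * n_out:(depth + 1) * n_out]
        for b in ((0, 1) if depth < n_info else (0,)):   # tail forces 0 inputs
            out, nxt = encode_bit(st, b)
            new_m = m + sum(agree if o == x else disagree for o, x in zip(out, r))
            # Sorted insertion keeps metrics in descending order, like shifting
            # the lower entries of a shift-register stack down by one slot.
            i = 0
            while i < len(metric) and metric[i] >= new_m:
                i += 1
            metric.insert(i, new_m)
            node.insert(i, (nd << 1) | b)
            state.insert(i, nxt)

        # A finite RAM holds at most max_entries paths; drop the worst ones.
        del metric[max_entries:], node[max_entries:], state[max_entries:]


msg = [1, 0, 1, 1, 0, 0, 1, 0]
rx = encode(msg + [0] * (K - 1))
rx[3] ^= 1                          # one channel error
print(stack_decode(rx, len(msg)))   # → [1, 0, 1, 1, 0, 0, 1, 0]
```

In hardware the insertion would be a parallel shift of the lower entries rather than the linear scan used here, and the top entry is always at a fixed address, which is what lets the path-extension step start on the next clock cycle.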