The Smith-Waterman (SW) algorithm, based on dynamic programming, is a well-known classical method for high-precision sequence matching and has become the gold standard for evaluating sequence alignment software. In this paper, we propose fine-grained parallelized SW algorithms using an affine gap penalty and implement parallel computing structures that accelerate SW with backtracking on an FPGA platform. We analyze the dynamic parallel computing features of anti-diagonal elements and the storage expansion problem arising in the backtracking stage, and we propose a series of optimization strategies to eliminate data dependencies, reduce storage requirements, and overlap memory access latency. Our implementation supports multi-type, large-scale biological sequence alignment applications. We obtain speedups between 3.6 and 25.2 over the typical SW algorithm running on a general-purpose computer with an Intel Core i5 3.2 GHz CPU. Moreover, our work outperforms other FPGA implementations in both array size and clock frequency. Experimental results show that its performance is close to that of the latest GPU implementation, while its power consumption is only about 26% of that of the GPU platforms.
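To make the recurrences concrete, here is a minimal Python sketch of software SW with affine gap penalties (the Gotoh formulation). It fills the score matrices in anti-diagonal order and then backtracks from the best cell. It is an illustration only, not the FPGA design described above. The function name `sw_affine` and the scoring constants `MATCH`, `MISMATCH`, `GAP_OPEN`, and `GAP_EXTEND` are hypothetical choices made for this example. The key property it shows is that cells on the same anti-diagonal depend only on the two previous anti-diagonals, so they could be computed concurrently by an array of processing elements.

```python
# Minimal sketch (not the paper's FPGA design): Smith-Waterman with affine
# gap penalties (Gotoh recurrences), filled in anti-diagonal order, plus
# backtracking. All scoring values are illustrative assumptions.
MATCH, MISMATCH = 2, -1      # assumed substitution scores
GAP_OPEN, GAP_EXTEND = 3, 1  # assumed: first gap cell costs GAP_OPEN, each further cell GAP_EXTEND
NEG = -10**9                 # stands in for minus infinity


def sw_affine(a: str, b: str):
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]    # best local score ending at (i, j)
    E = [[NEG] * (m + 1) for _ in range(n + 1)]  # ... ending with a gap in a (horizontal move)
    F = [[NEG] * (m + 1) for _ in range(n + 1)]  # ... ending with a gap in b (vertical move)
    best, best_pos = 0, (0, 0)
    # Cells with i + j == d depend only on anti-diagonals d-1 and d-2, so all
    # cells inside this inner loop are mutually independent.
    for d in range(2, n + m + 1):
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            E[i][j] = max(E[i][j - 1] - GAP_EXTEND, H[i][j - 1] - GAP_OPEN)
            F[i][j] = max(F[i - 1][j] - GAP_EXTEND, H[i - 1][j] - GAP_OPEN)
            s = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            H[i][j] = max(0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Backtracking: walk back from the best cell, tracking which matrix we are in.
    out_a, out_b = [], []
    i, j = best_pos
    state = "H"
    while True:
        if state == "H":
            if H[i][j] == 0:  # local alignment starts here
                break
            s = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            if H[i][j] == H[i - 1][j - 1] + s:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
            elif H[i][j] == E[i][j]:
                state = "E"
            else:
                state = "F"
        elif state == "E":  # gap in a: consume b[j-1]
            out_a.append("-")
            out_b.append(b[j - 1])
            state = "H" if E[i][j] == H[i][j - 1] - GAP_OPEN else "E"
            j -= 1
        else:  # state F, gap in b: consume a[i-1]
            out_a.append(a[i - 1])
            out_b.append("-")
            state = "H" if F[i][j] == H[i - 1][j] - GAP_OPEN else "F"
            i -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(sw_affine("ACGTACGT", "ACGACGT"))
```

Because this sketch keeps the full `H`, `E`, and `F` matrices so that backtracking is possible, its memory grows as O(nm). That illustrates the kind of storage growth the backtracking stage causes: computing the score alone would need only the last two anti-diagonals.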