To address the excessive hardware overhead of existing hardware architectures for the discrete wavelet transform (DWT), we propose a high-performance architecture based on the flipping structure. The architecture merges the lifting steps and adopts a pipelined design to restructure the original data path. The proposed 2-D DWT architecture consists of four parts: a column filter module, a 2×2 transposing module, a row filter module, and a scaling module. The column filter and the row filter operate concurrently. The 2×2 transposing module allows a few registers to replace a large intermediate transposing memory. A 4-to-1 multiplexer is introduced into the scaling module. Experimental results show that the proposed architecture, while maintaining a short critical path, effectively reduces hardware overhead and power consumption.
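
As a rough software illustration of the data flow described above (column filtering, then row filtering, then a separate scaling stage), the following Python sketch may be helpful. It is not the proposed hardware and makes several assumptions that the abstract does not state: it uses the CDF 9/7 filter with its standard lifting coefficients, plain (not flipped or merged) lifting steps, periodic boundary extension, and one common choice of normalization constant K. The function names lift_97 and dwt2d_97 are hypothetical. Scaling is deferred until both passes are done, so each subband needs only one of the gains 1/K², 1, or K².

```python
# Standard CDF 9/7 lifting coefficients (rounded). The abstract does not
# name the filter, so choosing 9/7 is an assumption of this sketch.
ALPHA = -1.586134342
BETA = -0.05298011854
GAMMA = 0.8829110762
DELTA = 0.4435068522
K = 1.230174105  # normalization constant; conventions differ between papers


def lift_97(x):
    """Unscaled 1-D 9/7 lifting of an even-length list (periodic extension).

    Returns (s, d): the low-pass and high-pass halves before scaling.
    """
    h = len(x) // 2
    s, d = list(x[0::2]), list(x[1::2])
    # Two predict/update pairs; d[i - 1] wraps to the last element at i == 0.
    d = [d[i] + ALPHA * (s[i] + s[(i + 1) % h]) for i in range(h)]
    s = [s[i] + BETA * (d[i - 1] + d[i]) for i in range(h)]
    d = [d[i] + GAMMA * (s[i] + s[(i + 1) % h]) for i in range(h)]
    s = [s[i] + DELTA * (d[i - 1] + d[i]) for i in range(h)]
    return s, d


def dwt2d_97(img):
    """One-level 2-D transform: column pass, row pass, then subband scaling."""
    rows, cols = len(img), len(img[0])
    half_r = rows // 2

    # Column pass: filter every column without scaling.
    col_out = [lift_97([img[r][c] for r in range(rows)]) for c in range(cols)]

    # Rearrange into rows: column-low rows on top, column-high rows below.
    # (Software stand-in for the transposition between the two filters.)
    inter = [[col_out[c][0][r] for c in range(cols)] for r in range(half_r)]
    inter += [[col_out[c][1][r] for c in range(cols)] for r in range(half_r)]

    # Row pass followed by a single combined scaling per subband.
    out = []
    for r, row in enumerate(inter):
        s, d = lift_97(row)
        column_low = r < half_r
        s_gain = 1.0 / K**2 if column_low else 1.0  # low-low or mixed
        d_gain = 1.0 if column_low else K**2        # mixed or high-high
        out.append([v * s_gain for v in s] + [v * d_gain for v in d])
    return out


if __name__ == "__main__":
    flat = [[5.0] * 8 for _ in range(4)]
    for row in dwt2d_97(flat):
        print(["%.4f" % v for v in row])
```

For a constant input image, the low-low quadrant (top left of the returned array) should reproduce the input value and the other three quadrants should be close to zero, which gives a quick sanity check of the coefficients and the scaling.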