In this paper, we present three structures for the 2-D discrete Hadamard transform (DHT): the transposition-free structure, the folded structure, and the pipeline structure. The transposition-free and pipeline structures produce one column of output in each clock cycle, while the folded structure requires two clock cycles per output column. The folded structure uses a single 1-D DHT module for both row and column processing, whereas the pipeline structure processes rows and columns concurrently using two separate 1-D DHT modules. Interestingly, the transposition unit of the pipeline structure involves nearly the same number of registers as the folded structure, yet the pipeline structure offers twice the throughput of the folded structure. The transposition-free structure is less area-time efficient than the pipeline structure because of its less efficient serial-output processors. ASIC synthesis results show that the pipeline structure has 47.4% less area-delay product (ADP) and 53.74% less energy per sample (EPS) than the folded structure, and that it has slightly less ADP and consumes 31.67% less EPS than the transposition-free design.
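For illustration, the row-column decomposition on which the folded and pipeline structures rely (a 1-D DHT over the rows, a transposition, then a 1-D DHT over the columns) can be modelled in software. The following is a minimal functional sketch in Python, not a description of the proposed hardware. It assumes an unnormalized transform in natural (Sylvester) ordering and a square block whose size is a power of two; the function names `hadamard_1d`, `transpose`, and `hadamard_2d` are hypothetical and chosen only for this example.

```python
def hadamard_1d(x):
    """Unnormalized 1-D Hadamard transform (assumed Sylvester ordering)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    y = list(x)
    h = 1
    while h < n:
        # One butterfly stage: combine elements that are h positions apart.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = y[i], y[i + h]
                y[i], y[i + h] = a + b, a - b
        h *= 2
    return y


def transpose(m):
    """Software stand-in for the transposition step between the two passes."""
    return [list(col) for col in zip(*m)]


def hadamard_2d(block):
    """2-D transform by row-column decomposition: rows, transpose, columns."""
    rows = [hadamard_1d(r) for r in block]             # row processing
    cols = [hadamard_1d(c) for c in transpose(rows)]   # column processing
    return transpose(cols)                             # restore orientation


if __name__ == "__main__":
    print(hadamard_2d([[1, 2], [3, 4]]))
```

In this model, a folded design would correspond to reusing the same `hadamard_1d` routine for both passes, whereas a pipeline design would correspond to two instances working on successive blocks concurrently; the sketch captures only the arithmetic, not the timing or register usage.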