Multiplier delay plays a critical role in many high-speed designs and processors, such as RISC, DSP, and image processing cores. This paper proposes an unsigned 32-bit multiplier design that aims for the best timing performance at a reasonable area cost. The proposed architecture consists of a modified Radix-4 Booth encoder, a modified Wallace tree adder, and a carry look-ahead adder. The design has been verified successfully on a DE2-115 board and then synthesized for an ASIC implementation. The FPGA-based experimental result shows that the design uses 1788 ALUTs. The synthesized result occupies an area of 58.28 mm² with a total delay of 4.13 ns (i.e., a maximum frequency of 242.13 MHz).
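
As a rough illustration of the first stage of this architecture, the following behavioral sketch models textbook Radix-4 Booth recoding for an unsigned 32-bit multiplier. It is not the modified encoder proposed here, whose details are not given in this section. The names `booth_radix4_digits` and `booth_multiply` are hypothetical, and a plain integer sum stands in for the Wallace tree and carry look-ahead stages. Because the operands are unsigned, the multiplier is treated as zero-extended, which yields 17 partial products instead of 16.

```python
MASK32 = (1 << 32) - 1  # keeps operands within 32 bits


def booth_radix4_digits(multiplier: int) -> list[int]:
    """Recode an unsigned 32-bit multiplier into 17 radix-4 Booth digits.

    Each digit lies in {-2, -1, 0, 1, 2}; digit i carries weight 4**i.
    """
    # Append the implicit bit b[-1] = 0 below the LSB. The bits above b[31]
    # are zero because the operand is unsigned (zero-extended).
    y = (multiplier & MASK32) << 1
    digits = []
    for i in range(17):
        window = (y >> (2 * i)) & 0b111  # bits b[2i+1], b[2i], b[2i-1]
        b_hi = (window >> 2) & 1
        b_mid = (window >> 1) & 1
        b_lo = window & 1
        digits.append(-2 * b_hi + b_mid + b_lo)
    return digits


def booth_multiply(a: int, b: int) -> int:
    """Multiply two unsigned 32-bit values using radix-4 Booth digits."""
    a &= MASK32
    # One partial product per digit. In hardware these would be reduced by
    # the Wallace tree and a final carry look-ahead adder; here a plain
    # integer sum stands in for both stages (an assumption of this sketch).
    partial_products = [
        d * a * (1 << (2 * i)) for i, d in enumerate(booth_radix4_digits(b))
    ]
    return sum(partial_products)
```

Calling `booth_multiply(a, b)` should agree with `a * b` for any pair of 32-bit unsigned inputs, which makes the sketch usable as a reference model when checking a hardware implementation in simulation.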