Multicast is an important traffic mode in multi-core systems, and efficient hardware support for multicast can greatly improve the performance of the whole system. Most multicast solutions use dimension-order routing to generate the multicast trees, which are neither bandwidth nor power efficient. This article presents a synthesizable router for network-on-chip (NoC) that supports arbitrarily shaped multicast paths on a mesh topology. In our scheme, incremental setup is adopted to simplify multicast tree construction. For each sub-path setup, we present a novel scheme called two period sub-path setup (TPSS). TPSS is divided into two periods: routing to a predetermined intermediate router, and updating lookup tables from the intermediate router to the destination. This setup makes it feasible to support arbitrarily shaped paths. In our case study, the Optimized tree algorithm (OPT) and the Left-XY-Right-Optimized tree algorithm (LXYROPT) are proposed for power-efficient path searching, but they need to be pre-configured because of their high computation cost. Moreover, Virtual Circuit Tree Multicasting (VCTM) is also supported in our scheme for dynamic construction of multicast paths, which needs no computation in path searching. The performance is evaluated using a cycle-accurate simulator developed in SystemC, and the hardware overhead is estimated using a synthesizable HDL model. Compared to VCTM (without FIFO, multicast table and network adapter), the area overhead of implementing our router is negligible (less than 0.5%).

Index Terms: Network-on-Chip, System-on-Chip, Multicast

I. INTRODUCTION

Many-core architectures have become the mainstream for designing System-on-Chip. Efficient communication among the cores is key to the performance of the whole system. The traditional bus structure works efficiently in systems with a limited number of cores.
For MPSoCs with a large number of cores, increased contention on the buses leads to poor performance. The concept of Network-on-Chip (NoC) has emerged as a scalable solution to the global interconnection problem of these systems. Various NoCs have been developed, such as NOSTRUM [1],

Some parts of this material appeared in the proceedings of the 16th Asia and South Pacific Design Automation Conference (ASP-DAC 2011), held in Yokohama, Japan, January 25-29, 2011 [W. Hu et al. 2011]. This paper significantly improves on the ASP-DAC 2011 paper by expanding the validation section, implementing the router in HDL, and including a new section (Section VI) that shows multicast setup using an existing path.