This article examines the communication performance of a multiprocessor system built from award-winning BCM 1480 multi-core chips. The system interconnects its constituent chips with high-performance HyperTransport links, realizing cache-coherent non-uniform memory access. It exploits hardware support in the BCM 1480 chip to attain high communication performance among the constituent chips. This is achieved through an extension to global memory that lets the CPU cores push small messages across chips in less than one microsecond, using DMA to achieve zero-copy message buffering. The approach eliminates all overhead associated with the kernel and protocol processing, so that data transfers can make the fullest use of the interconnect bandwidth.