Non-Uniform Memory Access (NUMA) architectures make it possible to build large-scale shared memory multiprocessor systems in comparison with non-scalable Uniform Memory Access (UMA) architectures. Most NUMA multiprocessor operations such as scheduling and synchronizing processes, accessing data from processors to memory models and allocating distributed memory space to dierent processors, are performed through interconnection networks such as a multistage switching network. The eciency of these basic operations determines the parallel processing performance on a NUMA multiprocessor. This paper presents several analytical models to predict and evaluate the overhead of interprocessor communication, process scheduling, process synchronization and remote memory access where network contention and memory contention are considered. Performance measurements to support the models and analyses through several numerical examples have been done on the BBN GP1000, a NUMA shared memory multiprocessor. Both analytical and experimental results give a comprehensive and clear understanding of the various eects, which are important for the eective use of a NUMA shared memory multiprocessor. The results in this paper may be used to determine optimal strategies in developing an ecient programming environment for a NUMA system.