Parallel workstations, each comprising a 10-100 processor shared memory machine, promise cost-effective general-purpose multiprocessing. This thesis explores the coupling of such small- to medium-scale shared memory multiprocessors through software over a local area network to synthesize larger shared memory systems. Multiprocessors built in this fashion are called Distributed Scalable Shared memory Multiprocessors (DSSMPs).

The challenge of building DSSMPs lies in seamlessly extending the hardware-supported shared memory of each parallel workstation to span a cluster of parallel workstations using software only. Such a shared memory system is called Multigrain Shared Memory because it naturally supports two grains of sharing: fine-grain cache-line sharing within each parallel workstation, and coarse-grain page sharing across parallel workstations. Applications that can leverage the efficient fine-grain support for shared memory provided by each parallel workstation have the potential for high performance.

This thesis makes three contributions in the context of Multigrain Shared Memory. First, it provides the design of a multigrain shared memory system, called MGS, and demonstrates its feasibility and correctness via an implementation on a 32-processor Alewife machine. Second, this thesis undertakes an in-depth application study that quantifies the extent to which shared memory applications can leverage the efficient shared memory mechanisms provided by DSSMPs. The thesis begins by looking at the performance of unmodified shared memory programs, and then investigates application transformations that improve performance. Finally, this thesis presents an approach called Synchronization Analysis for analyzing the performance of multigrain shared memory systems. The thesis develops a performance model based on Synchronization Analysis, and uses the model to study DSSMPs with up to 512 processors.
The experiments and analysis demonstrate that scalable DSSMPs can be constructed from small-scale workstation nodes to achieve competitive performance with large-scale all-hardware shared memory systems. For instance, the model predicts that a 256-processor DSSMP built from 16-processor parallel workstation nodes achieves equivalent performance to a 128-processor all-hardware multiprocessor on a communication-intensive workload.
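The intuition behind such results can be sketched with a toy average-latency calculation. The sketch below is purely illustrative and is not the thesis's Synchronization Analysis model: the cycle counts and the `remote_fraction` parameter are invented assumptions, chosen only to show why keeping most sharing at the fine (intra-node, hardware) grain lets a software-coupled system stay close to all-hardware performance.

```python
# Toy two-grain cost model (illustrative only; all numbers are assumptions).
# Intra-node accesses are served by hardware cache-line shared memory;
# inter-node accesses are served by software page-grain shared memory,
# which is orders of magnitude slower.

def avg_access_cycles(remote_fraction,
                      intra_node_cycles=50,       # assumed hardware cost
                      inter_node_cycles=10_000):  # assumed software/page cost
    """Average shared-memory access cost when `remote_fraction` of
    accesses cross a parallel-workstation boundary."""
    return ((1 - remote_fraction) * intra_node_cycles
            + remote_fraction * inter_node_cycles)

# If sharing is localized so only 1% of accesses leave a node,
# the average cost stays within a few multiples of the hardware cost:
print(avg_access_cycles(0.01))  # 149.5 cycles
print(avg_access_cycles(0.10))  # 1045.0 cycles -- locality matters
```

The point of the sketch is the sensitivity to `remote_fraction`: applications (or the transformations studied in the thesis) that keep most sharing within a node see near-hardware average latency, while communication patterns that frequently cross node boundaries are dominated by the software page-sharing cost.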