The advent of multi-/many-core architectures demands efficient run-time support to sustain the scalability of parallel applications. Synchronization mechanisms should be optimized to account for different scenarios, such as the interaction between threads executed on different cores as well as intra-core synchronization, i.e., between threads executed on hardware contexts of the same core. From this perspective, we describe the design issues of two notable mechanisms for shared-memory parallel computations. We point out how specific architectural supports, such as hardware cache coherence and core-to-core interconnection networks, make it possible to design optimized implementations of such mechanisms. In this paper we discuss experimental results on three representative architectures: a flagship Intel multi-core and two interesting network processors. The final result helps to untangle the complex implementation space of synchronization mechanisms.

KEY WORDS
Synchronization, Locking, Simultaneous Multi-Threading, Busy-Waiting, Multi-cores, Network Processors