Panagiota Fatourou scite author profile

Fine-grain thread synchronization has been shown, in several cases, to be outperformed by efficient implementations of the combining technique, in which a single thread, called the combiner, holding a coarse-grain lock, serves, in addition to its own synchronization request, active requests announced by other threads while they are waiting by performing some form of spinning. Efficient implementations of this technique significantly reduce the cost of synchronization, so in many cases they exhibit much better performance than the most efficient finely synchronized algorithms. In this paper, we revisit the combining technique with the goal of discovering where its real performance power resides and whether or how ensuring some desired properties (e.g., fairness in serving requests) would impact performance. We do so by presenting two new implementations of this technique; the first (CC-Synch) addresses systems that support coherent caches, whereas the second (DSM-Synch) works better in cacheless NUMA machines. In comparison to previous such implementations, the new implementations (1) provide bounds on the number of remote memory references (RMRs) that they perform, (2) support a stronger notion of fairness, and (3) use simpler and less basic primitives than previous approaches. In all our experiments, the new implementations outperform by far all previous state-of-the-art combining-based and fine-grain synchronization algorithms. Our experimental analysis sheds light on the questions we aimed to answer. Several modern multi-core systems organize the cores into clusters and provide fast communication within the same cluster and much slower communication across clusters. We present a hierarchical version of CC-Synch, called H-Synch, which exploits the hierarchical communication nature of such systems to achieve better performance. Experiments show that H-Synch significantly outperforms previous state-of-the-art hierarchical approaches.
We provide new implementations of common shared data structures (like stacks and queues) based on CC-Synch, DSM-Synch and H-Synch. Our experiments show that these implementations outperform by far all previous (fine-grain or combining-based) implementations of shared stacks and queues.
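The combining idea behind CC-Synch can be illustrated with a short C++ sketch. It is written from our reading of how such an algorithm works, not taken from the paper, and it rests on several assumptions: the protected object is a hypothetical 64-bit counter with a single fetch_add operation; the names CombiningCounter, Handle, fetch_add and max_combine are ours; we assume a combiner serves at most max_combine requests before handing the role on, so that no combiner works forever; and waiting threads call std::this_thread::yield() instead of spinning tightly. Each thread enqueues a request slot with a single atomic swap on the tail of a list, and a thread that finds its slot released but not completed becomes the combiner.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Sketch of CC-Synch-style combining around a hypothetical sequential object:
// a 64-bit counter whose only operation is fetch_add(delta) -> old value.
class CombiningCounter {
  struct Node {
    std::int64_t req = 0;                // argument announced by the node's user
    std::int64_t ret = 0;                // result written by the combiner
    std::atomic<bool> wait{false};
    std::atomic<bool> completed{false};
    std::atomic<Node*> next{nullptr};
  };

 public:
  struct Handle { Node* spare; };        // one per thread; owns one free node

  explicit CombiningCounter(int max_combine) : h_(max_combine) {
    assert(max_combine >= 1);
    tail_.store(new_node());             // dummy node with wait == false
  }

  Handle make_handle() { return Handle{new_node()}; }

  std::int64_t fetch_add(Handle& me, std::int64_t delta) {
    Node* next_node = me.spare;
    next_node->next.store(nullptr, std::memory_order_relaxed);
    next_node->completed.store(false, std::memory_order_relaxed);
    next_node->wait.store(true, std::memory_order_relaxed);
    // A single swap on the tail enqueues this thread's request slot.
    Node* cur = tail_.exchange(next_node, std::memory_order_acq_rel);
    cur->req = delta;
    cur->next.store(next_node, std::memory_order_release);  // announce request
    me.spare = cur;                      // nodes migrate between threads

    while (cur->wait.load(std::memory_order_acquire)) std::this_thread::yield();
    if (cur->completed.load(std::memory_order_relaxed)) return cur->ret;

    // Released but not completed: this thread is the combiner.
    Node* tmp = cur;
    for (int served = 0; served < h_; ++served) {
      Node* tmp_next = tmp->next.load(std::memory_order_acquire);
      if (tmp_next == nullptr) break;    // no request announced in tmp yet
      tmp->ret = value_;                 // apply tmp's request sequentially
      value_ += tmp->req;
      tmp->completed.store(true, std::memory_order_relaxed);
      tmp->wait.store(false, std::memory_order_release);  // release its user
      tmp = tmp_next;
    }
    // Hand the combiner role to whoever uses tmp (completed remains false).
    tmp->wait.store(false, std::memory_order_release);
    return cur->ret;
  }

  // Only meaningful when no operation is in progress.
  std::int64_t value() const { return value_; }

 private:
  Node* new_node() {
    std::lock_guard<std::mutex> lock(nodes_mu_);
    nodes_.push_back(std::make_unique<Node>());
    return nodes_.back().get();
  }

  const int h_;
  std::atomic<Node*> tail_{nullptr};
  std::int64_t value_ = 0;               // touched only by the current combiner
  std::mutex nodes_mu_;                  // guards node allocation only
  std::vector<std::unique_ptr<Node>> nodes_;
};
```

Each thread calls make_handle() once and passes its Handle to every fetch_add. Because the node a thread receives from the swap becomes its spare for the next call, nodes circulate among threads; in this sketch they are all freed when the counter is destroyed.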


A highly-efficient wait-free universal construction

Fatourou, Kallimanis (2011)

101

113


We present a new simple wait-free universal construction, called Sim, that uses just a Fetch&Add and an LL/SC object and performs a constant number of shared memory accesses. We have implemented Sim on a real shared-memory machine. In theoretical terms, our practical version of Sim, called P-Sim, has worse complexity than its theoretical analog; in practice, though, we experimentally show that P-Sim outperforms several state-of-the-art lock-based and lock-free techniques, even though it is wait-free, i.e., it satisfies a stronger progress condition than all the algorithms it outperforms. We have used P-Sim to get highly-efficient wait-free implementations of stacks and queues. Our experiments show that our implementations outperform the current state-of-the-art shared stack and queue implementations, which ensure only progress properties weaker than wait-freedom.
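To make the announce, toggle and install cycle concrete, here is a C++ sketch loosely modeled on how we understand a Sim-style construction to work: each thread announces its argument, flips its own bit in a shared Toggles word with one Fetch&Add, and then makes at most two attempts to install a fresh copy of the object state in which every pending request has been applied. It is not the authors' P-Sim. Assumptions: LL/SC is emulated by a compare-and-swap on a pointer to an immutable state record, which is ABA-free here only because records are never reused (so memory grows with the number of successful operations; a practical version would have to recycle records); the object is a hypothetical counter whose add(delta) returns the old value; at most 64 threads, since Toggles is one 64-bit word; and SimUniversal, add and value are illustrative names.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <memory>
#include <thread>
#include <vector>

// Sketch in the spirit of Sim (hypothetical names). Sequential object: an
// int64 register whose operation add(delta) returns the old value.
// Atomic operations on toggles_ and state_ use the default seq_cst order on
// purpose: the two-attempt argument relies on a single total order of them.
class SimUniversal {
  struct State {                          // immutable once published
    std::int64_t value = 0;
    std::uint64_t applied = 0;            // bit j = toggle of j's last applied request
    std::vector<std::int64_t> rvals;      // rvals[j] = result of that request
  };

 public:
  explicit SimUniversal(int n)
      : n_(n), announce_(n), my_toggle_(n, 0), arenas_(n) {
    assert(n >= 1 && n <= 64);
    initial_ = std::make_unique<State>();
    initial_->rvals.assign(n, 0);
    state_.store(initial_.get());
  }

  // Each id (0 <= id < n) must be used by at most one thread at a time.
  std::int64_t add(int id, std::int64_t delta) {
    const std::uint64_t mask = std::uint64_t{1} << id;
    announce_[id].store(delta, std::memory_order_relaxed);
    my_toggle_[id] ^= 1;
    // Fetch&Add flips bit `id`; only this id changes that bit, so adding or
    // subtracting 2^id never carries into or borrows from other bits.
    toggles_.fetch_add(my_toggle_[id] ? mask : (0 - mask));

    for (int attempt = 0; attempt < 2; ++attempt) {
      State* old = state_.load();                          // plays the role of LL
      const std::uint64_t tog = toggles_.load();
      auto next = std::make_unique<State>(*old);           // local copy
      for (int j = 0; j < n_; ++j) {
        const std::uint64_t bit = std::uint64_t{1} << j;
        if ((tog & bit) != (old->applied & bit)) {         // j is pending
          next->rvals[j] = next->value;
          next->value += announce_[j].load(std::memory_order_relaxed);
        }
      }
      next->applied = tog;
      if (state_.compare_exchange_strong(old, next.get())) {  // plays the role of SC
        arenas_[id].push_back(std::move(next));            // keep record alive
        break;
      }
    }
    // Our request is applied in the current state (by us or by a helper).
    return state_.load()->rvals[id];
  }

  std::int64_t value() const { return state_.load()->value; }

 private:
  const int n_;
  std::vector<std::atomic<std::int64_t>> announce_;
  std::vector<int> my_toggle_;            // entry id touched only by its user
  std::vector<std::vector<std::unique_ptr<State>>> arenas_;  // per id, never shrunk
  std::atomic<std::uint64_t> toggles_{0};
  std::atomic<State*> state_{nullptr};
  std::unique_ptr<State> initial_;
};
```

Two attempts suffice in this sketch: if both of a thread's compare-and-swaps fail, the successful one that defeated the second attempt was performed by a thread that loaded the state, and then read Toggles, after this thread's flip, so it had already applied the pending request.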


Highly-Efficient Wait-Free Synchronization

Fatourou, Kallimanis (2013), Theory of Computing Systems


ParIS: The Next Destination for Fast Data Series Indexing and Query Answering

Peng, Fatourou, Palpanas (2018)


The amortized complexity of non-blocking binary search trees

Ellen, Fatourou, Helga, et al. (2014)


We improve upon an existing non-blocking implementation of a binary search tree from single-word compare-and-swap instructions. We show that the worst-case amortized step complexity of performing a Find, Insert or Delete operation op on the tree is O(h(op) + ċ(op)), where h(op) is the height of the tree at the beginning of op and ċ(op) is the maximum number of operations accessing the tree at any one time during op. This is the first bound on the complexity of a non-blocking implementation of a search tree.
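The two quantities in this bound are easy to misread, so the following C++ sketch computes them for a recorded execution. It does not implement the tree; it only makes the definition of ċ(op), and what an amortized O(h(op) + ċ(op)) bound constrains, concrete. It assumes a hypothetical trace format in which each operation is a closed time interval plus the tree height when it started; OpRecord, point_contention and amortized_budget are illustrative names, not taken from the paper.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One operation in a recorded execution (hypothetical trace format):
// active during the closed time interval [start, end], and the tree had
// height `height` when it began.
struct OpRecord {
  std::int64_t start;
  std::int64_t end;
  int height;  // h(op)
};

// Point contention ċ(op): the largest number of operations (op included)
// active at any single instant of op's interval.
std::vector<int> point_contention(const std::vector<OpRecord>& ops) {
  std::vector<int> result;
  result.reserve(ops.size());
  for (const OpRecord& op : ops) {
    // The number of active operations only rises at some operation's start,
    // so it suffices to test op.start and every start inside [op.start, op.end].
    std::vector<std::int64_t> instants{op.start};
    for (const OpRecord& other : ops)
      if (other.start >= op.start && other.start <= op.end)
        instants.push_back(other.start);
    int best = 0;
    for (std::int64_t t : instants) {
      int active = 0;
      for (const OpRecord& other : ops)
        if (other.start <= t && t <= other.end) ++active;
      best = std::max(best, active);
    }
    result.push_back(best);
  }
  return result;
}

// Sum of h(op) + ċ(op) over the execution. An amortized bound of
// O(h(op) + ċ(op)) per operation means the total number of steps taken in
// the execution is at most a constant times this sum.
std::int64_t amortized_budget(const std::vector<OpRecord>& ops) {
  const std::vector<int> c = point_contention(ops);
  std::int64_t sum = 0;
  for (std::size_t i = 0; i < ops.size(); ++i) sum += ops[i].height + c[i];
  return sum;
}
```

For example, in a trace where one long operation overlaps two short ones that also overlap each other, all three get ċ = 3, while a later operation that runs alone gets ċ = 1.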


Constant-time snapshots with applications to concurrent data structures

Wei, Ben-David, Blelloch, et al. (2021)


The RedBlue Adaptive Universal Constructions

Fatourou, Kallimanis (2009)

