Hybrid Transactional Memory (TM) uses available hardware TM resources to execute language-level transactions, and falls back to a software TM implementation for those transactions that cannot complete in hardware. Ideally, a hybrid TM would allow hardware and software transactions to run concurrently, but would not waste hardware TM resources on coordination between the two classes of transactions. In addition, it should scale well, incur little latency, offer strong safety guarantees, and provide some degree of fairness. We introduce a new hybrid TM algorithm, "Hybrid Cohorts", in which hardware transactions do not modify global metadata, and software transactions have extremely low per-access overhead. The tradeoff is that hardware transactions cannot commit while software transactions are in flight. Evaluation on an 8-thread Intel Haswell CPU shows competitive performance with the current state-of-the-art. Furthermore, it does so while providing acceptable levels of fairness and safety, and offering opportunities for hardware acceleration. 2 The Hybrid Cohorts Approach HyTM algorithms that descend from NOrec share two key properties: First, the opacity of each software transaction (STx) is preserved by requiring every hardware transac
Time-based transactional memories typically rely on a shared memory counter to ensure consistency. Unfortunately, such a counter can become a bottleneck. In this article, we identify properties of hardware cycle counters that allow their use in place of a shared memory counter. We then devise algorithms that exploit the x86 cycle counter to enable bottleneck-free transactional memory runtime systems. We also consider the impact of privatization safety and hardware ordering constraints on the correctness, performance, and generality of our algorithms.
Language-level transactions are said to provide "atomicity," implying that the order of operations within a transaction should be invisible to concurrent transactions and thus that independent operations within a transaction should be safe to execute in any order. In this article, we present a mechanism for dynamically reordering memory operations within a transaction so that read-modify-write operations on highly contended locations can be delayed until the very end of the transaction. When integrated with traditional transactional conflict detection mechanisms, our approach reduces aborts on hot memory locations, such as statistics counters, thereby improving throughput and reducing wasted work. We present three algorithms for delaying highly contended read-modify-write operations within transactions, and we evaluate their impact on throughput for eager and lazy transactional systems across multiple workloads. We also discuss complications that arise from the interaction between our mechanism and the need for strong language-level semantics, and we propose algorithmic extensions that prevent errors from occurring when accesses are aggressively reordered in a transactional memory implementation with weak semantics.
Time-based transactional memories typically rely on a shared memory counter to ensure consistency. Unfortunately, such a counter can become a bottleneck. In this article, we identify properties of hardware cycle counters that allow their use in place of a shared memory counter. We then devise algorithms that exploit the x86 cycle counter to enable bottleneck-free transactional memory runtime systems. We also consider the impact of privatization safety and hardware ordering constraints on the correctness, performance, and generality of our algorithms.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.