In transactional memory, aborted transactions reduce performance and waste computing resources. Ideally, concurrent execution of transactions should be optimally ordered to minimise aborts, but such an ordering is often either complex or infeasible to obtain. This paper introduces a new technique called steal-on-abort, which aims to improve transaction ordering at runtime. Suppose transactions A and B conflict, and B is aborted. In general it is difficult to predict this first conflict, but once observed, it is logical not to execute the two transactions concurrently again. In steal-on-abort, the aborted transaction B is stolen by its opponent transaction A, and queued behind A to prevent concurrent execution of A and B. Without steal-on-abort, transaction B would typically have been restarted immediately, and possibly had a repeat conflict with transaction A. Steal-on-abort requires no application-specific information, modification, or offline pre-processing. In this paper, it is evaluated using a sorted linked list, red-black tree, STAMP-vacation, and Lee-TM. The evaluation reveals steal-on-abort is highly effective at eliminating repeat conflicts, which reduces the amount of computing resources wasted, and significantly improves performance.
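The queueing behaviour described above can be illustrated with a small single-threaded simulation. This is only a sketch under stated assumptions, not the paper's implementation: the `Worker` class, the `abort_and_steal` and `commit` helpers, and the choice to place stolen transactions at the head of the winner's queue are all hypothetical names and decisions introduced here for illustration.

```python
from collections import deque


class Worker:
    """One worker thread's state in a hypothetical TM runtime."""

    def __init__(self, name, jobs):
        self.name = name
        self.queue = deque(jobs)  # transactions waiting to run on this worker
        self.stolen = deque()     # transactions stolen by the running one
        self.current = None       # transaction currently executing

    def start_next(self):
        """Begin the next queued transaction, if any."""
        self.current = self.queue.popleft() if self.queue else None
        return self.current


def abort_and_steal(winner, loser):
    """The winner's transaction aborts the loser's and steals it."""
    winner.stolen.append(loser.current)
    loser.current = None  # the loser is free to pick up other work


def commit(worker):
    """On commit, release stolen transactions into the worker's own queue."""
    done = worker.current
    worker.current = None
    # Assumption: stolen work goes to the head of the queue, oldest first.
    worker.queue.extendleft(reversed(worker.stolen))
    worker.stolen.clear()
    return done


def demo():
    t1 = Worker("T1", ["A", "D"])
    t2 = Worker("T2", ["B", "C"])
    t1.start_next()          # T1 runs A
    t2.start_next()          # T2 runs B
    abort_and_steal(t1, t2)  # A and B conflict; B aborts and is stolen by A
    t2.start_next()          # T2 moves on to C instead of retrying B
    commit(t1)               # A commits; B is now queued on T1, behind A
    return list(t1.queue), t2.current
```

In this sketch B can only run on T1 after A has committed, so A and B never execute concurrently again, while T2 stays busy with C rather than immediately restarting B.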
While Transactional Memory (TM) research on shared-memory chip multiprocessors has flourished in recent years, limited research has been conducted in the cluster domain. In this paper, we introduce a research platform for exploiting software TM on clusters. The Distributed Software Transactional Memory (DiSTM) system has been designed for easy prototyping of TM coherence protocols, and it does not rely on a software or hardware implementation of distributed shared memory. Three TM coherence protocols have been implemented and evaluated with established TM benchmarks. The decentralized Transactional Coherence and Consistency protocol has been compared against two centralized protocols that utilize leases. Results indicate that different protocols perform better depending on network congestion and the amount of contention.
System designers typically use well-studied benchmarks to evaluate and improve new architectures and compilers. We design tomorrow's systems based on yesterday's applications. In this paper we investigate an emerging application, 3D scene understanding, likely to be significant in the mobile space in the near future. Until now, this application could only run in real-time on desktop GPUs. In this work, we examine how it can be mapped to power constrained embedded systems. Key to our approach is the idea of incremental co-design exploration, where optimization choices that concern the domain layer are incrementally explored together with low-level compiler and architecture choices. The goal of this exploration is to reduce execution time while minimizing power and meeting our quality of result objective. As the design space is too large to exhaustively evaluate, we use active learning based on a random forest predictor to find good designs. We show that our approach can, for the first time, achieve dense 3D mapping and tracking in the real-time range within a 1W power budget on a popular embedded device. This is a 4.8x execution time improvement and a 2.8x power reduction compared to the state-of-the-art.