Mikel Luján scite author profile

KernelsCompiler/Runtime Hardware Architectures Correctness Performance Metrics bilateralFilter (..) halfSampleRobust (..) renderVolume (..) integrate (..) : : Frame rate Accuracy Energy Computer Vision ICL-NUIM Dataset Fig. 1: The SLAMBench framework makes it possible for experts coming from the computer vision, compiler, run-time, and hardware communities to cooperate in a unified way to tackle algorithmic and implementation alternatives.Abstract-Real-time dense computer vision and SLAM offer great potential for a new level of scene modelling, tracking and real environmental interaction for many types of robot, but their high computational requirements mean that use on mass market embedded platforms is challenging. Meanwhile, trends in low-cost, low-power processing are towards massive parallelism and heterogeneity, making it difficult for robotics and vision researchers to implement their algorithms in a performance-portable way. In this paper we introduce SLAMBench, a publicly-available software framework which represents a starting point for quantitative, comparable and validatable experimental research to investigate trade-offs in performance, accuracy and energy consumption of a dense RGB-D SLAM system. SLAMBench provides a KinectFusion implementation in C++, OpenMP, OpenCL and CUDA, and harnesses the ICL-NUIM dataset of synthetic RGB-D sequences with trajectory and scene ground truth for reliable accuracy comparison of different implementation and algorithms. We present an analysis and breakdown of the constituent algorithmic elements of KinectFusion, and experimentally investigate their execution time on a variety of multicore and GPUaccelerated platforms. For a popular embedded platform, we also present an analysis of energy efficiency for different configuration alternatives.

show abstract

Steal-on-Abort: Improving Transactional Memory Performance through Dynamic Transaction Reordering

Ansari

Luján

Kotselidis

et al. 2009

View full text Add to dashboard Cite

International audienceIn transactional memory, aborted transactions reduce performance, and waste computing resources. Ideally, concurrent execution of transactions should be optimally ordered to minimise aborts, but such an ordering is often either complex, or unfeasible, to obtain. This paper introduces a new technique called steal-on-abort, which aims to improve transaction ordering at runtime. Suppose transactions A and B conflict, and B is aborted. In general it is difficult to predict this first conflict, but once observed, it is logical not to execute the two transactions concurrently again. In steal-on-abort, the aborted transaction B is stolen by its opponent transaction A, and queued behind A to prevent concurrent execution of A and B. Without steal-on-abort, transaction B would typically have been restarted immediately, and possibly had a repeat conflict with transaction A. Steal-on-abort requires no application-specific information, modification, or offline pre-processing. In this paper, it is evaluated using a sorted linked list, red-black tree, STAMP-vacation, and Lee-TM. The evaluation reveals steal-on-abort is highly effective at eliminating repeat conflicts, which reduces the amount of computing resources wasted, and significantly improves performance

show abstract

DiSTM: A Software Transactional Memory Framework for Clusters

Kotselidis

Ansari

Jarvis

et al. 2008

View full text Add to dashboard Cite

While Transactional Memory (TM) research on sharedmemory chip multiprocessors has been flourishing over the last years, limited research has been conducted in the cluster domain. In this paper, we introduce a research platform for exploiting software TM on clusters. The Distributed Software Transactional Memory (DiSTM) system has been designed for easy prototyping of TM coherence protocols and it does not rely on a software or hardware implementation of distributed shared memory. Three TM coherence protocols have been implemented and evaluated with established TM benchmarks. The decentralized Transactional Coherence and Consistency protocol has been compared against two centralized protocols that utilize leases. Results indicate that depending on network congestion and amount of contention different protocols perform better.

show abstract

A Survey on Optical Network-on-Chip Architectures

2017

View full text Add to dashboard Cite

show abstract

Integrating Algorithmic Parameters into Benchmarking and Design Space Exploration in 3D Scene Understanding

Bodin

Nardi

Muhammad

et al. 2016

View full text Add to dashboard Cite

System designers typically use well-studied benchmarks to evaluate and improve new architectures and compilers. We design tomorrow's systems based on yesterday's applications. In this paper we investigate an emerging application, 3D scene understanding, likely to be signi cant in the mobile space in the near future. Until now, this application could only run in real-time on desktop GPUs. In this work, we examine how it can be mapped to power constrained embedded systems. Key to our approach is the idea of incremental co-design exploration, where optimization choices that concern the domain layer are incrementally explored together with low-level compiler and architecture choices. The goal of this exploration is to reduce execution time while minimizing power and meeting our quality of result objective. As the design space is too large to exhaustively evaluate, we use active learning based on a random forest predictor to nd good designs. We show that our approach can, for the rst time, achieve dense 3D mapping and tracking in the real-time range within a 1W power budget on a popular embedded device. This is a 4.8x execution time improvement and a 2.8x power reduction compared to the state-of-the-art

show abstract

Parallel Hardware Merge Sorter

Song

Koch

Luján

et al. 2016

View full text Add to dashboard Cite

Advanced Concurrency Control for Transactional Memory Using Transaction Commit Rate

Ansari

Kotselidis

Jarvis

et al. 2008

View full text Add to dashboard Cite

Understanding the interconnection network of SpiNNaker

Navaridas

Luján

Miguel-Alonso

et al. 2009

View full text Add to dashboard Cite

SpiNNaker is a massively parallel architecture designed to model large-scale spiking neural networks in (biological) real-time. Its design is based around ad-hoc multi-core System-on-Chips which are interconnected using a two-dimensional toroidal triangular mesh. Neurons are modeled in software and their spikes generate packets that propagate through the on-and inter-chip communication fabric relying on custom-made on-chip multicast routers. This paper models and evaluates large-scale instances of its novel interconnect (more than 65 thousand nodes, or over one million computing cores), focusing on real-time features and fault-tolerance. The key contribution can be summarized as understanding the properties of the feasible topologies and establishing the stable operation of the SpiNNaker under different levels of degradation. First we derive analytically the topological characteristics of the network, which are later confirmed by experimental work. With the computational model developed, we investigate the topology of SpiNNaker, and compare it with a standard 3-dimensional torus. The novel emergency routing mechanism, implemented within the routers, allows the topology of SpiNNaker to be more robust than the 3-dimensional torus, regardless of the latter having better topological characteristics. Furthermore, we obtain optimal values of two router parameters related with livelock and deadlock avoidance mechanisms.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

334 Leonard St

Brooklyn, NY 11211

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Mikel Luján

Introducing SLAMBench, a performance and accuracy benchmarking methodology for SLAM

Steal-on-Abort: Improving Transactional Memory Performance through Dynamic Transaction Reordering

DiSTM: A Software Transactional Memory Framework for Clusters

A Survey on Optical Network-on-Chip Architectures

Integrating Algorithmic Parameters into Benchmarking and Design Space Exploration in 3D Scene Understanding

Parallel Hardware Merge Sorter

Advanced Concurrency Control for Transactional Memory Using Transaction Commit Rate

Understanding the interconnection network of SpiNNaker

Contact Info

Product

Resources

About