Many numerical optimisation problems rely on fast algorithms for solving sparse triangular systems of linear equations (STLs). Two types of approach have been used to accelerate their solution: on GPUs, concurrency has been prioritised at the expense of data locality, while on multi-core CPUs, data locality has been prioritised at the expense of concurrency.

In this paper, we discuss the interaction between data locality and concurrency in the solution of STLs on GPUs, and we present a new algorithm that balances both. We demonstrate empirically that, provided the input matrix exposes enough concurrency, our algorithm outperforms Nvidia's concurrency-prioritising CUSPARSE algorithm for GPUs, with a maximum speedup of 5.8-fold in our experiments.

Our solution algorithm, which we have implemented in OpenCL, requires a pre-processing phase that partitions the graph associated with the input matrix into sub-graphs whose data can be stored in low-latency local memories. This preliminary analysis phase is expensive, but because it depends only on the input matrix, its cost can be amortised when solving for many different right-hand sides.
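The split between a matrix-only analysis phase and a cheap per-right-hand-side solve phase can be illustrated with level-set scheduling, a common concurrency-oriented technique for sparse triangular solves. The sketch below is not the authors' locality-aware partitioning algorithm; it assumes a lower-triangular matrix in CSR form with the diagonal stored explicitly, and the function names `level_sets` and `solve_lower` are hypothetical. Rows in the same level have no mutual dependencies, so on a GPU each level could be processed in parallel; levels must run in order.

```python
from typing import List


def level_sets(indptr: List[int], indices: List[int], n: int) -> List[List[int]]:
    """Analysis phase (depends only on the matrix): group rows into levels.

    Row i's level is one more than the highest level of any row j < i it
    depends on, so all rows within one level can be solved concurrently.
    """
    level = [0] * n
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j < i:
                level[i] = max(level[i], level[j] + 1)
    sets: List[List[int]] = [[] for _ in range(max(level, default=-1) + 1)]
    for i, lv in enumerate(level):
        sets[lv].append(i)
    return sets


def solve_lower(indptr: List[int], indices: List[int], data: List[float],
                sets: List[List[int]], b: List[float]) -> List[float]:
    """Solve phase: forward substitution, one level at a time."""
    x = [0.0] * len(b)
    for rows in sets:            # levels must be processed in order
        for i in rows:           # rows in a level are independent (parallel on a GPU)
            acc, diag = b[i], None
            for k in range(indptr[i], indptr[i + 1]):
                j = indices[k]
                if j < i:
                    acc -= data[k] * x[j]
                elif j == i:
                    diag = data[k]
            x[i] = acc / diag
    return x


# 4x4 lower-triangular example matrix in CSR form
indptr = [0, 1, 3, 4, 7]
indices = [0, 0, 1, 2, 1, 2, 3]
data = [2.0, 1.0, 2.0, 4.0, 3.0, 1.0, 1.0]
sets = level_sets(indptr, indices, 4)                               # [[0, 2], [1], [3]]
x1 = solve_lower(indptr, indices, data, sets, [2.0, 3.0, 4.0, 5.0])   # [1.0, 1.0, 1.0, 1.0]
x2 = solve_lower(indptr, indices, data, sets, [4.0, 6.0, 8.0, 10.0])  # reuses the analysis
```

Because `sets` is computed once and passed to every solve, its cost is amortised over all right-hand sides, which mirrors the amortisation argument in the abstract.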
This paper presents a new Ubiquitous Sensor Network (USN) architecture for use in developing countries and demonstrates its usefulness by highlighting some of its key features. Complementing a previous ITU proposal, our architecture, referred to as “Ubiquitous Sensor Network for Development” (USN4D), integrates into its layers features such as opportunistic data dissemination, long-distance deployment and localisation of information to meet the requirements of the developing world. Besides describing some of the most important requirements for sensor equipment used in a USN4D setting, we present the main features of, and experiments conducted with, “WaspNet”, one of the wireless sensor deployment platforms that meets these requirements. Furthermore, building on the WaspNet platform, we present an air pollution monitoring application in Cape Town, South Africa, as one of the first steps towards building community wireless sensor networks (CSNs) in the developing world using off-the-shelf sensor equipment.
This paper presents a novel way of trading increased resource utilisation for decreased latency when computing a single Discrete Fourier Transform (DFT) on an FPGA. Our analysis of the Cooley-Tukey FFT optimisation shows that it increases the number of operations on the critical path of the transform computation. Consequently, we propose an algorithm that controls the degree to which the Cooley-Tukey optimisation is used, trading resource utilisation against absolute latency. We report resource utilisation and latency results for a MyHDL implementation of the proposed algorithm on the Rhino platform; these demonstrate that a practical Pareto curve has been established for a variety of dataset sizes. Compared with Xilinx's FFT IP core, this implementation achieves 14% lower latency, albeit at a greater resource cost.
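The trade-off described above can be illustrated in software by applying only a chosen number of radix-2 Cooley-Tukey stages and computing the remaining sub-transforms with the direct DFT. This is a minimal sketch, not the authors' MyHDL design: the parameter `ct_stages` and the helper `cost` are hypothetical, and `cost` uses a deliberately simplified model (power-of-two length, every complex multiply or add counts as one operation, balanced adder trees, no pipelining or trivial-twiddle savings). Under that model, each extra Cooley-Tukey stage reduces the number of multiplications but adds one level to the critical path.

```python
import cmath
from typing import List, Tuple


def dft(x: List[complex]) -> List[complex]:
    """Direct DFT: every output is an independent sum of n products."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * k * m / n) for k in range(n))
            for m in range(n)]


def hybrid_fft(x: List[complex], ct_stages: int) -> List[complex]:
    """Apply `ct_stages` radix-2 decimation-in-time Cooley-Tukey stages,
    then fall back to the direct DFT on the remaining sub-sequences."""
    n = len(x)
    if ct_stages == 0 or n == 1:
        return dft(x)
    if n % 2:
        raise ValueError("each Cooley-Tukey stage needs an even length")
    even = hybrid_fft(x[0::2], ct_stages - 1)
    odd = hybrid_fft(x[1::2], ct_stages - 1)
    half = n // 2
    out = [0j] * n
    for m in range(half):
        t = cmath.exp(-2j * cmath.pi * m / n) * odd[m]  # twiddle factor times odd term
        out[m] = even[m] + t
        out[m + half] = even[m] - t
    return out


def cost(n: int, ct_stages: int) -> Tuple[int, int]:
    """Simplified model: (complex multiplications, critical-path depth)."""
    if ct_stages == 0 or n == 1:
        # n*n products, then a balanced adder tree of depth log2(n)
        return n * n, 1 + (n.bit_length() - 1)
    mults, depth = cost(n // 2, ct_stages - 1)
    # two half-size transforms side by side, n/2 twiddle multiplies,
    # and one multiply plus one add added to the critical path
    return 2 * mults + n // 2, depth + 2


signal = [1, 2, 3, 4, 5, 6, 7, 8]
reference = dft(signal)
for s in range(4):
    assert all(abs(a - b) < 1e-9 for a, b in zip(hybrid_fft(signal, s), reference))
    print(s, cost(8, s))  # 0 (64, 4) / 1 (36, 5) / 2 (24, 6) / 3 (20, 7)
```

Sweeping `ct_stages` from 0 to log2(n) traces a simple resource/latency curve; real FPGA costs depend on details this model ignores.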