Thaha Mohammed scite author profile

Deep neural networks (DNN) are the de-facto solution behind many intelligent applications of today, ranging from machine translation to autonomous driving. DNNs are accurate but resource-intensive, especially for embedded devices such as mobile phones and smart objects in the Internet of Things. To overcome the related resource constraints, DNN inference is generally offloaded to the edge or to the cloud. This is accomplished by partitioning the DNN and distributing computations at the two different ends. However, most of existing solutions simply split the DNN into two parts, one running locally or at the edge, and the other one in the cloud. In contrast, this article proposes a technique to divide a DNN in multiple partitions that can be processed locally by end devices or offloaded to one or multiple powerful nodes, such as in fog networks. The proposed scheme includes both an adaptive DNN partitioning scheme and a distributed algorithm to offload computations based on a matching game approach. Results obtained by using a selfdriving car dataset and several DNN benchmarks show that the proposed solution significantly reduces the total latency for DNN inference compared to other distributed approaches and is 2.6 to 4.2 times faster than the state of the art. Index Terms-DNN inference, task partitioning, task offloading, distributed algorithm, matching game.

show abstract

UbiPriSEQ—Deep Reinforcement Learning to Manage Privacy, Security, Energy, and QoS in 5G IoT HetNets

Mohammed

Albeshri

Katib

et al. 2020

Applied Sciences

View full text Add to dashboard Cite

5G networks and Internet of Things (IoT) offer a powerful platform for ubiquitous environments with their ubiquitous sensing, high speeds and other benefits. The data, analytics, and other computations need to be optimally moved and placed in these environments, dynamically, such that energy-efficiency and QoS demands are best satisfied. A particular challenge in this context is to preserve privacy and security while delivering quality of service (QoS) and energy-efficiency. Many works have tried to address these challenges but without a focus on optimizing all of them and assuming fixed models of environments and security threats. This paper proposes the UbiPriSEQ framework that uses Deep Reinforcement Learning (DRL) to adaptively, dynamically, and holistically optimize QoS, energy-efficiency, security, and privacy. UbiPriSEQ is built on a three-layered model and comprises two modules. UbiPriSEQ devises policies and makes decisions related to important parameters including local processing and offloading rates for data and computations, radio channel states, transmit power, task priority, and selection of fog nodes for offloading, data migration, and so forth. UbiPriSEQ is implemented in Python over the TensorFlow platform and is evaluated using a real-life application in terms of SINR, privacy metric, latency, and utility function, manifesting great promise.

show abstract

DIESEL: A novel deep learning-based tool for SpMV computations and solving sparse linear equation systems

et al. 2020

View full text Add to dashboard Cite

Sparse linear algebra is central to many areas of engineering, science and business. The community has done considerable work on proposing new methods for sparse matrix-vector multiplication (SpMV) computations and iterative sparse solvers on graphical processing units (GPUs). Due to vast variations in matrix features, no single method performs well across all sparse matrices. A few tools on automatic prediction of best-performing SpMV kernels have emerged recently and require many more efforts to fully utilize their potential. The utilization of a GPU by the existing SpMV kernels is far from its full capacity. Moreover, the development and performance analysis of SpMV techniques on GPUs have not been studied in sufficient depth. This paper proposes DIESEL, a deep learning-based tool that predicts and executes the best performing SpMV kernel for a given matrix using a feature set carefully devised by us through rigorous empirical and mathematical instruments. The dataset comprises 1056 matrices from 26 different real-life application domains including computational fluid dynamics, materials, electromagnetics, economics, and more. We propose a range of new metrics and methods for performance analysis, visualization and comparison of SpMV tools. DIESEL provides better performance with its accuracy 88.2%, workload accuracy 91.96%, and average relative loss 4.4%, compared to 85.9%, 85.31%, and 7.65% by the next best performing artificial Intelligence (AI) based SpMV tool. The extensive results and analyses presented in this paper provide several key insights into the performance of the SpMV tools and how these relate to the matrix datasets and the performance metrics, allowing the community to further improve and compare basic and AI-based SpMV tools in the future.

show abstract

Performance Analysis of Sparse Matrix-Vector Multiplication (SpMV) on Graphics Processing Units (GPUs)

et al. 2020

View full text Add to dashboard Cite

Graphics processing units (GPUs) have delivered a remarkable performance for a variety of high performance computing (HPC) applications through massive parallelism. One such application is sparse matrix-vector (SpMV) computations, which is central to many scientific, engineering, and other applications including machine learning. No single SpMV storage or computation scheme provides consistent and sufficiently high performance for all matrices due to their varying sparsity patterns. An extensive literature review reveals that the performance of SpMV techniques on GPUs has not been studied in sufficient detail. In this paper, we provide a detailed performance analysis of SpMV performance on GPUs using four notable sparse matrix storage schemes (compressed sparse row (CSR), ELLAPCK (ELL), hybrid ELL/COO (HYB), and compressed sparse row 5 (CSR5)), five performance metrics (execution time, giga floating point operations per second (GFLOPS), achieved occupancy, instructions per warp, and warp execution efficiency), five matrix sparsity features (nnz, anpr, nprvariance, maxnpr, and distavg), and 17 sparse matrices from 10 application domains (chemical simulations, computational fluid dynamics (CFD), electromagnetics, linear programming, economics, etc.). Subsequently, based on the deeper insights gained through the detailed performance analysis, we propose a technique called the heterogeneous CPU–GPU Hybrid (HCGHYB) scheme. It utilizes both the CPU and GPU in parallel and provides better performance over the HYB format by an average speedup of 1.7x. Heterogeneous computing is an important direction for SpMV and other application areas. Moreover, to the best of our knowledge, this is the first work where the SpMV performance on GPUs has been discussed in such depth. We believe that this work on SpMV performance analysis and the heterogeneous scheme will open up many new directions and improvements for the SpMV computing field in the future.

show abstract

A Global Brain fuelled by Local intelligence: Optimizing Mobile Services and Networks with AI

Naas

Mohammed

Sigg

2020

View full text Add to dashboard Cite

Artificial intelligence (AI) is among the most influential technologies to improve daily lives and to promote further economic activities. Recently, a distributed intelligence, referred to as a global brain, has been developed to optimize mobile services and their respective delivery networks. Inspired by interconnected neuron clusters in the human nervous system, it is an architecture interconnecting various AI entities. This paper models the global brain architecture and communication among its components based on multi-agent system technology and graph theory. We target two possible scenarios for communication and propose an optimized communication algorithm. Extensive experimental evaluations using the Java Agent Development Framework (JADE), reveal the performance of the global brain based on optimized communication in terms of network complexity, network load, and the number of exchanged messages. We adapt activity recognition as a real-world problem and show the efficiency of the proposed architecture and communication mechanism based on system accuracy and energy consumption as compared to centralized learning, using a real testbed comprised of NVIDIA Jetson Nanos. Finally, we discuss emerging technologies to foster future global brain machinelearning tasks, such as voice recognition, image processing, natural language processing, and big data processing.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Thaha Mohammed

Distributed Inference Acceleration with Adaptive DNN Partitioning and Offloading

UbiPriSEQ—Deep Reinforcement Learning to Manage Privacy, Security, Energy, and QoS in 5G IoT HetNets

DIESEL: A novel deep learning-based tool for SpMV computations and solving sparse linear equation systems

Performance Analysis of Sparse Matrix-Vector Multiplication (SpMV) on Graphics Processing Units (GPUs)

A Global Brain fuelled by Local intelligence: Optimizing Mobile Services and Networks with AI

Contact Info

Product

Resources

About