Most research on deep neural networks (DNNs) so far has focused on obtaining higher accuracy by building increasingly large and deep architectures. Training and evaluating these models is only feasible when large amounts of resources, such as processing power and memory, are available. Typical applications that could benefit from these models, however, run on resource-constrained devices. Mobile devices such as smartphones already use deep learning techniques, but they often have to perform all processing on a remote cloud. We propose a new architecture called a Cascading network that is capable of distributing a deep neural network between a local device and the cloud while keeping the required network traffic to a minimum. The network begins processing on the constrained device and only relies on the remote part when the local part does not provide a sufficiently accurate result. The Cascading network allows for an early stopping mechanism during the recall phase of the network. We evaluated our approach in an Internet of Things (IoT) context where a deep neural network adds intelligence to a large number of heterogeneous connected devices. This technique enables a whole variety of autonomous systems where sensors, actuators and computing nodes can work together. We show that the Cascading architecture allows for a substantial improvement in evaluation speed on constrained devices while the loss in accuracy is kept to a minimum.
Abstract: Deep neural networks are the state-of-the-art technique for a wide variety of classification problems. Although deeper networks are able to make more accurate classifications, the value brought by an additional hidden layer diminishes rapidly. Even shallow networks are able to achieve relatively good results on various classification problems. Only for a small subset of the samples do the deeper layers make a significant difference. We describe an architecture in which only the samples that cannot be classified with sufficient confidence by a shallow network have to be processed by the deeper layers. Instead of training a network with one output layer at the end of the network, we train several output layers, one for each hidden layer. When an output layer is sufficiently confident in its result, we stop propagating at this layer and the deeper layers need not be evaluated. The choice of a threshold confidence value allows us to trade off accuracy against speed. Applied in the Internet of Things (IoT) context, this approach makes it possible to distribute the layers of a neural network between low-powered devices and powerful servers in the cloud. We only need the remote layers when the local layers are unable to make an accurate classification. Such an architecture adds the intelligence of a deep neural network to resource-constrained devices such as sensor nodes and various IoT devices. We evaluated our approach on the MNIST and CIFAR10 datasets. On the MNIST dataset, we retain the same accuracy at half the computational cost. On the more difficult CIFAR10 dataset, we were able to obtain a relative speed-up of 33% at a marginal increase in error rate from 15.3% to 15.8%.
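The early-stopping rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that confidence is measured as the largest softmax probability of an output layer, which the abstract does not specify. The names `cascade_predict`, `softmax`, and `toy_stages` are hypothetical, and the toy layers are hand-written functions rather than trained weights. In a distributed setting, the first stages would run on the device and the later ones on a server; here everything runs in one process.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cascade_predict(
    x: Vector,
    stages: Sequence[Tuple[Layer, Layer]],
    threshold: float,
) -> Tuple[int, float, int]:
    """Evaluate stages in order and stop at the first confident output layer.

    Each stage is a (hidden_layer, output_layer) pair. Returns the predicted
    class, its confidence, and the number of hidden layers evaluated.
    Assumption: confidence is the maximum softmax probability.
    """
    if not stages:
        raise ValueError("at least one stage is required")
    h = x
    for depth, (hidden, head) in enumerate(stages, start=1):
        h = hidden(h)                # propagate one hidden layer further
        probs = softmax(head(h))     # output layer attached to this hidden layer
        confidence = max(probs)
        label = probs.index(confidence)
        if confidence >= threshold:  # confident enough: skip the deeper layers
            break
    # If no output layer reached the threshold, the last one decides.
    return label, confidence, depth


# Hypothetical two-stage toy network for a two-class problem.
toy_stages = [
    (lambda h: h, lambda h: [h[0], -h[0]]),                      # shallow stage
    (lambda h: [30.0 * v for v in h], lambda h: [h[0], -h[0]]),  # deeper stage
]

easy = cascade_predict([3.0], toy_stages, threshold=0.9)
hard = cascade_predict([0.1], toy_stages, threshold=0.9)
print(easy[2], hard[2])  # hidden layers evaluated for each sample
```

Raising `threshold` sends more samples to the deeper stages, which costs more computation; lowering it lets more samples stop early. This is the accuracy versus speed trade-off the abstract refers to.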