The practical deployment of machine vision presents particular challenges for resource-constrained edge devices. Because such devices must execute multiple tasks with variable workloads, they need a robust approach that adapts dynamically at runtime and maintains the maximum quality of service (QoS) within the available resource constraints. We present a lightweight approach that monitors runtime workload constraints and leverages accuracy-throughput trade-offs on a graphics processing unit (GPU). It includes optimisation techniques that identify, for each task, the configurations that are optimal in terms of accuracy, energy and memory, and it manages transparent switching between these configurations. Using a neural network architecture search that statically generates a range of implementations targeting a resource-precision trade-off, we explore the detection of the optimal parameters for the required QoS under specific memory and energy constraints. For an accuracy loss of 1%, we demonstrate that a $$1.6\times$$ higher frame processing rate can be achieved on the GPU, with further improvements possible when the accuracy requirement is relaxed further. To further improve switching between configurations, we extend the proposed mechanism to offload some frames to central processing units (CPUs), which improves the frame rate by a further 0.9%.
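
As a rough illustration of the runtime selection step described above, the following minimal sketch picks, from a statically generated set of configurations, the most accurate one that still satisfies the current throughput, memory and energy constraints. It is not the authors' implementation: the `Config` fields, the `select_config` function and all numeric values are hypothetical and chosen only to make the example runnable.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Config:
    """One statically generated implementation (hypothetical fields)."""
    name: str
    accuracy: float    # accuracy in percent
    fps: float         # frames processed per second
    memory_mb: float   # memory footprint in MB
    energy_mj: float   # energy per frame in mJ


def select_config(
    configs: Sequence[Config],
    min_fps: float,
    max_memory_mb: float,
    max_energy_mj: float,
) -> Optional[Config]:
    """Return the most accurate configuration that meets all constraints.

    Ties on accuracy are broken in favour of the higher frame rate.
    Returns None when no configuration is feasible.
    """
    feasible = [
        c for c in configs
        if c.fps >= min_fps
        and c.memory_mb <= max_memory_mb
        and c.energy_mj <= max_energy_mj
    ]
    if not feasible:
        return None
    return max(feasible, key=lambda c: (c.accuracy, c.fps))


# Illustrative values only; these are NOT measurements from the paper.
CONFIGS = [
    Config("high", accuracy=90.0, fps=20.0, memory_mb=900.0, energy_mj=50.0),
    Config("mid", accuracy=89.0, fps=32.0, memory_mb=600.0, energy_mj=35.0),
    Config("low", accuracy=85.0, fps=50.0, memory_mb=300.0, energy_mj=20.0),
]

# A heavier workload (higher required frame rate) forces a switch
# from the most accurate configuration to a faster one.
light = select_config(CONFIGS, min_fps=10.0, max_memory_mb=1000.0, max_energy_mj=100.0)
heavy = select_config(CONFIGS, min_fps=25.0, max_memory_mb=1000.0, max_energy_mj=100.0)
```

In this sketch a runtime monitor would call `select_config` whenever the observed workload or resource budget changes, and switch to the returned configuration if it differs from the active one.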