We propose a server-based approach to manage a general-purpose graphics processing unit (GPU) in a predictable and efficient manner. Our proposed approach introduces a GPU server that is a dedicated task to handle GPU requests from other tasks on their behalf. The GPU server ensures bounded time to access the GPU, and allows other tasks to suspend during their GPU computation to save CPU cycles. By doing so, we address the two major limitations of the existing real-time synchronization-based GPU management approach: busy waiting within critical sections and long priority inversion. We have implemented a prototype of the server-based approach on a real embedded platform. This case study demonstrates the practicality and effectiveness of the server-based approach. Experimental results indicate that the server-based approach yields significant improvements in task schedulability over the existing synchronization-based approach in most practical settings. Although we focus on a GPU in this paper, the server-based approach can also be used for other types of computational accelerators.

arXiv:1709.06613v2 [cs.DC] 11 May 2018

unit (GPU), which can greatly help in addressing the timing challenges of computation-intensive tasks by accelerating their execution.

The use of GPUs in a time predictable manner brings up several challenges. First, many of today's commercial-off-the-shelf (COTS) GPUs do not support a preemption mechanism, and GPU access requests from application tasks are handled in a sequential, non-preemptive manner. This is primarily due to the high overhead expected on GPU context switching [39].
Although some recent GPU architectures, such as NVIDIA Pascal [2], claim to offer GPU preemption, there is no documentation regarding their explicit behavior, and existing drivers (and GPU programming APIs) do not offer any programmer control over GPU preemption at the time of writing this paper. Second, COTS GPU device drivers do not respect task priorities and the scheduling policy used in the system. Hence, in the worst case, the GPU access request of the highest-priority task may be delayed by the requests of all lower-priority tasks in the system, which could possibly cause unbounded priority inversion.

The aforementioned issues have motivated the development of predictable GPU management techniques to ensure task timing constraints while achieving performance improvement [15,16,17,20,21,27,41]. Among them, the work in [15,16,17] introduces a synchronization-based approach that models GPUs as mutually-exclusive resources and uses real-time synchronization protocols to arbitrate GPU access. This approach has many benefits. First, it can schedule GPU requests from tasks in an analyzable manner, without making any change to GPU device drivers. Second, it allows the existing task schedulability analysis methods, originally developed for real-time synchronization protocols, to be easily applied to analyze tasks accessing GPUs. However, due to the underlying assumption on critical sections, this approach requires tasks t...