Real-time decomposition of intramuscular electromyography (iEMG) signals, which is widely needed in neurological research and applications, is a complex procedure that involves identifying motor neuron spike trains from a streaming iEMG recording. We have previously proposed a sequential decomposition algorithm based on a Hidden Markov Model of EMG, which used a Bayesian filter to estimate the unknown parameters of the motor unit (MU) spike trains as well as the motor unit action potentials (MUAPs). In this paper we present a parallel implementation of this algorithm on a graphics processing unit (GPU), together with a number of modifications to the original model that make real-time performance possible. Specifically, the Kalman filter previously used to estimate the MUAPs is replaced by a least-mean-square (LMS) filter. Additionally, we introduce a number of heuristics that discard the most improbable decomposition scenarios during the search for the best solution. We then describe the GPU implementation of the resulting algorithm. Dozens of simulated iEMG signals containing up to 10 active MUs, as well as five experimental fine-wire iEMG signals acquired from the tibialis anterior muscle, were decomposed in real time. The decomposition accuracy depended on the level of muscle activation, but exceeded 85% in all cases.
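To illustrate the kind of update that a least-mean-square filter performs when tracking MUAP waveforms, the following is a minimal sketch in Python. It is a generic textbook LMS step under simplifying assumptions (known spike trains, a linear superposition model, a fixed step size), not the implementation described in this paper. The names `lms_muap_update`, `run_demo`, the step size `mu`, and the waveform length `L` are hypothetical and chosen only for this example.

```python
import numpy as np

def lms_muap_update(h, x, y, mu=0.05):
    """One LMS step (illustrative; not the paper's implementation).

    h  : (n_units, L) current MUAP waveform estimates
    x  : (n_units, L) regressor, x[i, k] = 1 if unit i fired k samples ago
    y  : observed signal sample (scalar)
    mu : step size (assumed value)
    """
    y_hat = np.sum(h * x)      # predicted sample: sum of active MUAP taps
    err = y - y_hat            # prediction error
    h_new = h + mu * err * x   # gradient step on the squared error
    return h_new, err

def run_demo(seed=0, n_samples=20000, n_units=2, L=8, mu=0.05):
    """Track synthetic MUAPs from a simulated signal with known spikes."""
    rng = np.random.default_rng(seed)
    true_h = rng.normal(size=(n_units, L))             # synthetic "true" MUAPs
    spikes = (rng.random((n_units, n_samples)) < 0.05).astype(float)
    h = np.zeros((n_units, L))                         # initial estimate
    x = np.zeros((n_units, L))                         # sliding spike history
    for n in range(n_samples):
        x = np.roll(x, 1, axis=1)                      # age the history by one sample
        x[:, 0] = spikes[:, n]                         # insert newest spike indicators
        y = np.sum(true_h * x) + 0.01 * rng.normal()   # noisy superposition
        h, _ = lms_muap_update(h, x, y, mu)
    return true_h, h

if __name__ == "__main__":
    true_h, h = run_demo()
    print("max abs estimation error:", float(np.abs(true_h - h).max()))
```

Unlike a Kalman filter, this update keeps no covariance matrix, so the cost per sample is linear in the number of waveform coefficients. That property is the usual motivation for choosing LMS when computation time is constrained, at the price of slower and step-size-dependent convergence.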