Abstract-The control of upper limb neuroprostheses through the peripheral nervous system (PNS) can allow restoring motor functions in amputees. At present, the important aspect of the real-time implementation of neural decoding algorithms on embedded systems has been often overlooked, notwithstanding the impact that limited hardware resources have on the efficiency/effectiveness of any given algorithm. Present study is addressing the optimization of a template matching based algorithm for PNS signals decoding that is a milestone for its real-time, full implementation onto a floating-point Digital Signal Processor (DSP).The proposed optimized real-time algorithm achieves up to 96% of correct classification on real PNS signals acquired through LIFE electrodes on animals, and can correctly sort spikes of a synthetic cortical dataset with sufficiently uncorrelated spike morphologies (93% average correct classification) comparably to the results obtained with top spike sorter (94% on average on the same dataset). The power consumption enables more than 24 hours processing at the maximum load, and latency model has been derived to enable a fair performance assessment. The final embodiment demonstrates the real-time performance onto a low-power off-the-shelf DSP, opening to experiments exploiting the efferent signals to control a motor neuroprosthesis.