Machine learning applications are increasingly run on systems built around low-power CPUs driven by unreliable or intermittent power sources. After a system failure, these applications typically have to be re-run from the beginning, which can waste both time and energy and may compromise the training process of an ML algorithm. This paper proposes a model that allows an embedded operating system to auto-discover a suitable recovery and restart point, so that a failed application can be restarted with minimal effect on its performance. The proposal encompasses a complete software stack comprising a modified cross-compilation toolchain, a modified XV6 OS kernel, and a custom executable loader. The model exhibits time savings of over 50% in cases where the application has passed the midpoint of its run, but is less effective if the failure occurs earlier in the application’s run time.