This paper describes a non-blocking checkpointing mode in support of optimistic parallel discrete event simulation. This mode allows real concurrency in the execution of state saving and other simulation specific operations (e.g. event list update, event execution), with the aim at removing the cost of recording state information from the completion time of the parallel simulation application. We present an implementation of a C library supporting non-blocking checkpointing on a myrinet based cluster, which demonstrates the practical viability of this checkpointing mode on standard off-the-shelf hardware. By the results of an empirical study on classical parameterized synthetic benchmarks we show that, except for the case of minimal state granularity applications, non-blocking checkpointing allows improvement of the speed of the parallel execution, as compared to commonly adopted, optimized checkpointing methods based on the classical blocking mode. A performance study for the case of a Personal Communication System (PCS) simulation is additionally reported to point out the benefits from non-blocking checkpointing for a real world application.