Speech enhancement employing deep neural networks (DNNs) for denoising is called deep noise suppression (DNS). A DNS trained with mean squared error (MSE) losses cannot guarantee good perceptual quality. Perceptual evaluation of speech quality (PESQ) is a widely used metric for evaluating speech quality. However, the original PESQ algorithm is non-differentiable and therefore cannot be used directly as an optimization criterion for gradient-based learning. In this work, we propose an end-to-end non-intrusive PESQNet DNN to estimate the PESQ scores of the enhanced speech signal. By providing a reference-free perceptual loss, it serves as a mediator in the DNS training, allowing the PESQ score of the enhanced speech signal to be maximized. We illustrate the potential of our proposed PESQNet-mediated training on a strong baseline DNS. As a further novelty, we propose to train the DNS and the PESQNet alternatingly, which keeps the PESQNet up to date and performing well specifically for the DNS under training. Detailed analysis shows that the PESQNet mediation further increases the DNS performance by about 0.1 PESQ points on synthetic test data and by 0.03 DNSMOS points on real test data, compared to training with the MSE-based loss. Our proposed method outperforms the Interspeech 2021 DNS Challenge baseline by 0.2 PESQ points on synthetic test data and 0.1 DNSMOS points on real test data. Furthermore, it improves on the same DNS trained with an approximated differentiable PESQ loss by about 0.4 PESQ points on synthetic test data and 0.2 DNSMOS points on real test data.
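The alternating schedule described above can be illustrated with a toy sketch. It does not reproduce the networks or losses used in this work: the DNS is reduced to a single gain, the PESQNet to a three-weight regressor on the output level, and `black_box_score` is a hypothetical rounded stand-in for PESQ, chosen only because it needs the clean reference and has zero gradient almost everywhere. All names, learning rates, and step counts are assumptions made for illustration.

```python
import math
import random

random.seed(0)
N = 64
clean = [math.sin(0.3 * t) for t in range(N)]            # toy "clean speech"
noisy = [c + random.gauss(0.0, 0.3) for c in clean]      # toy noisy mixture


def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))


def enhance(gain, x):
    """Toy 'DNS': one scalar gain applied to the noisy signal."""
    return [gain * v for v in x]


def black_box_score(enhanced, reference):
    """Hypothetical stand-in for PESQ (NOT the real algorithm): it needs the
    clean reference, and the rounding makes it piecewise constant."""
    mse = sum((e - r) ** 2 for e, r in zip(enhanced, reference)) / len(reference)
    return round(4.5 - 10.0 * mse, 1)


def features(enhanced):
    """Reference-free features: computed from the enhanced signal only."""
    r = rms(enhanced)
    return [1.0, r, r * r]


def estimate(w, enhanced):
    """Toy 'PESQNet': differentiable, non-intrusive score estimate."""
    return sum(wi * fi for wi, fi in zip(w, features(enhanced)))


def estimator_step(w, gain, lr=0.05):
    """DNS fixed: regress the estimator onto the black-box score of the
    current DNS output (one gradient step on the squared error)."""
    x = enhance(gain, noisy)
    err = estimate(w, x) - black_box_score(x, clean)
    return [wi - lr * 2.0 * err * fi for wi, fi in zip(w, features(x))]


def dns_step(gain, w, lr=0.01):
    """Estimator fixed: one gradient-ascent step on the estimated score.
    With r = gain * rms(noisy): d(estimate)/d(gain) = (w[1] + 2*w[2]*r) * rms(noisy)."""
    r_noisy = rms(noisy)
    r = gain * r_noisy
    grad = (w[1] + 2.0 * w[2] * r) * r_noisy
    return min(2.0, max(0.0, gain + lr * grad))   # keep the gain in [0, 2]


def train(rounds=200, est_steps=5, dns_steps=1):
    w, gain = [0.0, 0.0, 0.0], 1.0
    for _ in range(rounds):
        for _ in range(est_steps):      # phase 1: update the estimator only
            w = estimator_step(w, gain)
        for _ in range(dns_steps):      # phase 2: update the DNS only
            gain = dns_step(gain, w)
    return w, gain
```

The alternation ensures that the estimator is always refit on outputs of the current DNS before its gradient is used to update that DNS. The toy makes no claim about convergence and is unrelated to the PESQ and DNSMOS gains reported above.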