The long-standing objective of image and video compression has been to preserve quality as perceived by the Human Visual System (HVS) at the lowest possible bit rate. Traditional encoders achieve this by combining Rate-Distortion Optimization (RDO) techniques with Image Quality Assessment (IQA) metrics that correlate with human perception. Today, a fast-growing number of applications fall within the realm of Video Coding for Machines (VCM), where the final recipient of the compressed data is not a human but a machine performing a vision task. Recent work has revealed a lack of correlation between existing distortion measures and machine perception, especially in RDO algorithms, where distortion is computed on a local scale. In this paper, we propose a machine perception-aware metric designed to be incorporated into a standard-compliant Versatile Video Coding (VVC) encoder. The proposed metric relies on a supervised training procedure and on additional information available on the encoder side. In terms of correlation with machine perception, our metric significantly outperforms existing distortion measures from the literature.