New vehicle designs must be tested in representative driving scenarios to evaluate their durability. Because these tests are costly, only a limited number can be performed. Test scenarios have traditionally been selected using rules of thumb, which are not always applicable to modern vehicles. Hence, there is a need to ensure that vehicle tests are aligned with real-world usage. One possibility for obtaining a broad overview of real-world usage is to exploit the data collected by sensors embedded in production vehicles. However, these sensors do not produce the detailed data needed to derive the metrics computed with expensive sensors during testing. It is therefore necessary to correlate the low-end sensor measurements available in production vehicles with the relevant metrics acquired using high-end sensors during testing. Machine learning is a promising avenue for doing so. The key challenge is that vehicles are used "in the wild" in many scenarios that were not encountered in the controlled testing environment, and learned models are unlikely to perform reliably in these previously unseen environments. We overcome this challenge by allowing learned models to abstain from making a prediction when unexpected vehicle usage is identified. We propose a general framework that combines standard machine learning with novelty detection to identify previously unseen situations. We illustrate our framework's potential on data we collected from a large-scale road-roughness analysis use case. Empirically, our approach can identify novel road types in the wild and, by doing so, yields better performance.
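The idea of coupling a learned predictor with novelty detection so it abstains on unfamiliar inputs can be sketched as follows. This is a minimal illustration, not the paper's implementation: it uses a k-NN regressor and a distance-to-training-data novelty score with a threshold calibrated on the training set; all names and parameters are illustrative assumptions.

```python
# Sketch of prediction with abstention (illustrative, not the authors' method):
# a k-NN regressor that abstains when an input lies far from the training data.
import numpy as np

class AbstainingKNN:
    def __init__(self, k=3, quantile=0.95):
        self.k = k
        self.quantile = quantile  # fraction of training points treated as "familiar"

    def fit(self, X, y):
        self.X, self.y = np.asarray(X, float), np.asarray(y, float)
        # Calibrate the novelty threshold: for each training point, compute its
        # k-th nearest-neighbour distance within the training set (leave-one-out),
        # then take a high quantile of those distances.
        d = np.linalg.norm(self.X[:, None] - self.X[None, :], axis=-1)
        np.fill_diagonal(d, np.inf)
        kth = np.sort(d, axis=1)[:, self.k - 1]
        self.threshold = np.quantile(kth, self.quantile)
        return self

    def predict(self, x):
        d = np.linalg.norm(self.X - np.asarray(x, float), axis=1)
        idx = np.argsort(d)[: self.k]
        if d[idx[-1]] > self.threshold:
            return None  # abstain: input is novel w.r.t. the training data
        return float(self.y[idx].mean())  # otherwise predict as usual

rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(200, 2))  # stand-in for low-end sensor features
y_train = X_train.sum(axis=1)                  # stand-in for a high-end test metric
model = AbstainingKNN().fit(X_train, y_train)
model.predict([0.1, -0.2])  # in-distribution input: returns a prediction
model.predict([8.0, 8.0])   # far from the training data: None (abstain)
```

In practice the abstention score could come from any novelty detector (e.g. a one-class model or density estimate) wrapped around any regressor; the key design choice is calibrating the threshold on data the model has actually seen.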