Low lattice thermal conductivity is essential for high thermoelectric performance. Lattice thermal conductivity is often computed from density functional theory calculations, but such calculations carry a high computational cost, and machine learning is therefore increasingly used to estimate lattice thermal conductivity at much lower computational expense. With the ability to assess larger sets of materials, machine learning could offer an effective procedure for identifying compounds with low lattice thermal conductivity. However, such compounds can be quite rare and distinct from typical compounds in a given training set. This can be problematic, as standard machine learning methods lack the ability to accurately predict properties of compounds whose features differ significantly from those in the training set. By computing the lattice thermal conductivity of 122 half-Heusler compounds using the temperature-dependent effective potential method, we generate a data set with sufficient span to explore this issue. We show that random forest regression can fail to identify low lattice thermal conductivity compounds when the training data are selected at random. However, if the choice of training data is instead guided by feature and principal component analysis, both the ability to identify low lattice thermal conductivity compounds and the overall model performance improve drastically.