Despite its widespread use in chemical discovery, approximate density functional theory (DFT) is poorly suited to many targets, such as open-shell 3d transition-metal complexes, which can be expected to have strong multi-reference (MR) character. For discovery workflows to be predictive, we need automated, low-cost methods that can distinguish the regions of chemical space where DFT should be applied from those where it should not. We curate over 4,800 open-shell transition-metal complexes, up to hundreds of atoms in size, from prior high-throughput DFT studies and compute fractional occupation number (FON)-based MR diagnostics for them with affordable, finite-temperature DFT. We show that the HOMO-LUMO gap, an intuitive measure of strong correlation, is not predictive of MR character as judged by FON-based diagnostics. Analysis of machine learning (ML) models trained independently to predict HOMO-LUMO gaps and FON-based diagnostics reveals that the two quantities differ in their sensitivity to the metal and to the ligands. We use the trained ML models to rapidly evaluate MR character over a space of ca. 187,000 theoretical complexes, identifying large-scale trends in spin-state-dependent MR character and finding complexes that have small HOMO-LUMO gaps yet low MR character.
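To make concrete what a FON-based diagnostic measures, the following is a minimal Python sketch, assuming Grimme's fractional occupation density count (N_FOD) as the diagnostic and Fermi-Dirac smearing applied to one spin channel's orbital energies (in Hartree). The abstract does not state which FON-based diagnostics or electronic temperature the study uses, so the function names (`fermi_dirac`, `find_fermi_level`, `n_fod`) and the 5000 K default are hypothetical choices for illustration only. Orbitals whose energies lie far from the Fermi level contribute almost nothing; near-degenerate frontier orbitals share electrons fractionally and raise the count.

```python
import math

# Boltzmann constant in Hartree per kelvin (CODATA value).
K_B_HARTREE_PER_K = 3.166811563e-6


def fermi_dirac(eps, mu, kT):
    """Fermi-Dirac occupation (0..1) of a spin orbital with energy eps."""
    x = (eps - mu) / kT
    # Guard against overflow in math.exp for orbitals far from mu.
    if x > 700.0:
        return 0.0
    if x < -700.0:
        return 1.0
    return 1.0 / (1.0 + math.exp(x))


def find_fermi_level(energies, n_elec, kT, tol=1e-12, max_iter=200):
    """Bisect for the chemical potential mu that conserves the electron count."""
    lo = min(energies) - 50.0 * kT - 1.0
    hi = max(energies) + 50.0 * kT + 1.0
    for _ in range(max_iter):
        mu = 0.5 * (lo + hi)
        n = sum(fermi_dirac(e, mu, kT) for e in energies)
        # The total occupation increases with mu, so shrink the bracket.
        if n > n_elec:
            hi = mu
        else:
            lo = mu
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def n_fod(energies, n_elec, temperature_k=5000.0):
    """Hypothetical N_FOD-style count for one spin channel.

    Orbitals occupied at T = 0 contribute (1 - n_i); virtual orbitals
    contribute n_i. Sum the result over alpha and beta channels for the
    total. The default temperature is an illustrative assumption.
    """
    kT = K_B_HARTREE_PER_K * temperature_k
    mu = find_fermi_level(energies, n_elec, kT)
    occ = [fermi_dirac(e, mu, kT) for e in energies]
    order = sorted(range(len(energies)), key=lambda i: energies[i])
    total = 0.0
    for rank, i in enumerate(order):
        if rank < n_elec:  # occupied in the T = 0 (aufbau) reference
            total += 1.0 - occ[i]
        else:  # virtual in the T = 0 reference
            total += occ[i]
    return total
```

For example, one electron in a well-separated orbital, `n_fod([-1.0, 0.5], 1)`, gives a value indistinguishable from zero, whereas an exactly degenerate frontier pair, `n_fod([-1.0, -0.3, -0.3], 2)`, splits one electron evenly and gives approximately 1.0. Bisection is used for the Fermi level because the total occupation is monotonic in mu, which makes the search robust without derivatives.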