“…While the top-n metric is widely used for reaction prediction, its relevance has been questioned [22], as many molecules can be built from more than one set of reactants, i.e., there are several "true" answers for a given product. Less common metrics in this context include fractional accuracy [35], balanced accuracy [26,55], weighted precision [54], and ROC curves [38,43]. Metrics have also been used to assess the quality of ranking.…”
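
To illustrate the concern raised about the top-n metric, the following is a minimal sketch, not taken from the cited works. It assumes each prediction is a ranked list of reactant-set labels and that each product may have more than one accepted answer. The function name `top_n_accuracy` and the toy labels (`"A+B"`, `"C+D"`, and so on) are hypothetical placeholders, not real chemistry or a library API.

```python
from typing import Sequence, Set


def top_n_accuracy(
    ranked: Sequence[Sequence[str]],
    accepted: Sequence[Set[str]],
    n: int,
) -> float:
    """Fraction of products whose top-n predictions contain an accepted answer."""
    if len(ranked) != len(accepted):
        raise ValueError("ranked and accepted must have the same length")
    if not ranked:
        raise ValueError("at least one example is required")
    hits = sum(
        1
        for preds, ok in zip(ranked, accepted)
        if any(p in ok for p in preds[:n])
    )
    return hits / len(ranked)


# Hypothetical toy data: labels stand in for reactant sets.
ranked = [["A+B", "C+D", "E+F"], ["G+H", "I+J", "K+L"]]

# Scoring against a single recorded answer per product.
single_truth = [{"C+D"}, {"K+L"}]
# Scoring when the first product has two valid reactant sets (assumed).
multi_truth = [{"C+D", "A+B"}, {"K+L"}]

top1_single = top_n_accuracy(ranked, single_truth, 1)
top1_multi = top_n_accuracy(ranked, multi_truth, 1)
top3_single = top_n_accuracy(ranked, single_truth, 3)
```

In this toy case the top-1 score changes depending on whether only one recorded answer or every valid reactant set is accepted, which is the ambiguity the passage describes.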