“…However, with the introduction of datasets with a larger vocabulary of objects and predicates [6,23], visual phrase approaches have been facing severe difficulties as most relations have very few training examples. Compositional methods [9,11,17,27,30,33,42], which allow sharing knowledge across triplets, have scaled better but do not cope well with unseen relations. To increase the expressiveness of the generic compositional detectors, recent works have developed models of statistical dependencies between the subject, object and predicate, using, for example, graphical models [7,24], language distillation [45], or semantic context [48].…”