“…Therefore the second-best path is more likely to be one of the paths that share as many labels as possible with the best path, e.g., B-ROLE, I-ROLE, E-ROLE, O, S-ORG, rather than the actual target label sequence at level 4, i.e., B-PER, I-PER, I-PER, I-PER, E-PER, which does not overlap with the best path at all. In addition, Shibuya and Hovy (2020) reuse the same potential function at all higher levels. This means that, for instance, at level 3 and time step 1, their model encourages the dot product of the hidden state and the label embedding, $h_1^\top v_{\text{B-ROLE}}$, to be larger than $h_1^\top v_{\text{B-PER}}$, while at level 4, the remaining influence of the best path conversely forces $h_1^\top v_{\text{B-PER}}$ to be larger than $h_1^\top v_{\text{B-ROLE}}$.…”
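
The conflict described in the passage can be illustrated with a minimal sketch. It assumes that the same hidden state and the same label embeddings feed the potential function at both levels; the vectors, the `label_emb` table, and the `potential` helper are hypothetical toy values and names, not taken from either paper. The sketch only shows that the two orderings cannot hold at once under a shared potential function, and it leaves out how the best path is handled during decoding.

```python
# Toy illustration only: all numbers and names below are assumptions.

def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

h1 = [0.5, -1.0, 2.0]  # hidden state at time step 1 (assumed values)

# Label embeddings shared by every level (assumed values).
label_emb = {
    "B-ROLE": [1.0, 0.0, 0.5],
    "B-PER": [0.2, 0.3, 0.4],
}

def potential(h, label):
    """Potential reused at all levels: dot product of h and the label embedding."""
    return dot(h, label_emb[label])

score_role = potential(h1, "B-ROLE")
score_per = potential(h1, "B-PER")

# Level 3 pushes B-ROLE above B-PER; level 4 pushes the opposite ordering.
level3_ok = score_role > score_per
level4_ok = score_per > score_role

# With one shared score per label, both strict inequalities cannot hold together.
both_satisfied = level3_ok and level4_ok
print(level3_ok, level4_ok, both_satisfied)
```

Whatever values are chosen for `h1` and `label_emb`, `both_satisfied` stays false, because the two levels compare the same pair of scores in opposite directions.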