“…Let (c_2, x_2) denote an MMLM instance in a different language from (c_1, x_1). Because the vocabulary, the position embeddings, and the special tokens are shared across languages, it is common to find anchor points (Pires et al., 2019; Dufter and Schütze, 2020) where x_1 = x_2 (e.g., shared subwords, punctuation, and digits) or where I(x_1; x_2) is positive (i.e., the representations are associated or isomorphic). Through the bridging effect of {x_1, x_2}, the MMLM obtains a v-structure dependency "c_1 → {x_1, x_2} ← c_2", which leads to a negative co-information (i.e., interaction information) I(c_1; c_2; {x_1, x_2}) (Tsujishita, 1995).…”
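The sign claim above can be checked numerically on a toy collider. A minimal sketch, not from the paper: under the common convention I(X;Y;Z) = H(X)+H(Y)+H(Z) − H(X,Y) − H(X,Z) − H(Y,Z) + H(X,Y,Z) (sign conventions for co-information vary in the literature), a v-structure where Z = X XOR Y for two independent fair bits yields a co-information of −1 bit, illustrating how a shared "anchor" variable induces negative interaction information between its parents.

```python
import itertools
from math import log2

# Toy collider X -> Z <- Y: X, Y are independent fair bits, Z = X XOR Y.
# This stands in for the v-structure c_1 -> {x_1, x_2} <- c_2 in the text.
joint = {}
for x, y in itertools.product([0, 1], repeat=2):
    joint[(x, y, x ^ y)] = 0.25

def H(idx):
    """Entropy (bits) of the marginal over the coordinates in `idx`."""
    marg = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in idx)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marg.values() if p > 0)

# Co-information via inclusion-exclusion over entropies (one common convention):
co_info = (H((0,)) + H((1,)) + H((2,))
           - H((0, 1)) - H((0, 2)) - H((1, 2))
           + H((0, 1, 2)))
print(co_info)  # -1.0: negative, as expected for a v-structure
```

Here observing Z makes the otherwise independent X and Y perfectly dependent (I(X;Y) = 0 but I(X;Y|Z) = 1 bit), which is exactly the "explaining-away" mechanism that drives the co-information negative.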