“…The third-order model we suggest is similar to the grandsibling model proposed by Sangati et al. (2009) and Hayashi et al. (2011). It defines the probability of generating a dependent $D = \langle \mathit{dist}, d, w, c, t \rangle$ as the product of the distance-based probability and the probabilities of generating each of its components ($d$, $t$, $w$, $c$, denoting dependency relation, POS tag, word and capitalisation feature, respectively).…”
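The factorisation described in the excerpt can be illustrated with a small sketch. The excerpt does not say what each factor is conditioned on or in which order the components are generated. The code below therefore assumes a chain-rule order ($\mathit{dist}$, then $d$, $t$, $w$, $c$), with each component conditioned on a generic context `ctx` plus the components generated before it. The names `Dependent`, `dependent_log_prob` and `ctx`, the lookup-table representation and the toy probabilities are all hypothetical and chosen only for illustration. The product is computed in log space to avoid underflow when many dependents are scored.

```python
import math
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical conditional probability table:
# (outcome, conditioning tuple) -> probability.
Table = Dict[Tuple[str, Tuple[str, ...]], float]


@dataclass(frozen=True)
class Dependent:
    """A dependent D = <dist, d, w, c, t> (field encodings are assumptions)."""
    dist: str  # distance feature, e.g. a bucketed distance to the head
    d: str     # dependency relation
    w: str     # word
    c: str     # capitalisation feature
    t: str     # POS tag


def dependent_log_prob(dep: Dependent, ctx: Tuple[str, ...],
                       p_dist: Table, p_d: Table, p_t: Table,
                       p_w: Table, p_c: Table) -> float:
    """Log of P(dist) * P(d) * P(t) * P(w) * P(c).

    Assumed conditioning: every factor sees ctx; d, t, w, c additionally
    see the components generated before them (order d -> t -> w -> c).
    Unseen events get probability 0, i.e. a log probability of -inf.
    """
    factors = [
        p_dist.get((dep.dist, ctx), 0.0),                     # distance-based probability
        p_d.get((dep.d, ctx), 0.0),                           # dependency relation
        p_t.get((dep.t, ctx + (dep.d,)), 0.0),                # POS tag
        p_w.get((dep.w, ctx + (dep.d, dep.t)), 0.0),          # word
        p_c.get((dep.c, ctx + (dep.d, dep.t, dep.w)), 0.0),   # capitalisation
    ]
    if any(f <= 0.0 for f in factors):
        return float("-inf")
    return sum(math.log(f) for f in factors)


# Toy usage: ctx stands in for whatever head/grandparent/sibling
# information the model conditions on (hypothetical here).
ctx = ("saw", "VBD")
dep = Dependent(dist="1", d="obj", w="dog", c="lower", t="NN")
p_dist = {("1", ctx): 0.5}
p_d = {("obj", ctx): 0.4}
p_t = {("NN", ctx + ("obj",)): 0.5}
p_w = {("dog", ctx + ("obj", "NN")): 0.1}
p_c = {("lower", ctx + ("obj", "NN", "dog")): 1.0}

lp = dependent_log_prob(dep, ctx, p_dist, p_d, p_t, p_w, p_c)
print(round(math.exp(lp), 6))  # 0.01, i.e. 0.5 * 0.4 * 0.5 * 0.1 * 1.0
```

In a real model the tables would be estimated from treebank counts, typically with smoothing or back-off, so that an unseen word does not zero out the whole product. The choice of conditioning context is exactly where a grandsibling-style model differs from lower-order ones.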