Introduction

Interest in language resources and computational models for the study of similar languages, varieties and dialects has grown substantially in the last few years. The first edition of the Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects (VarDial) confirms the interest in the topic.

Within the NLP community, the impact of language variation on the development of language resources and NLP applications has been explored in recent years through experiments in several directions. Examples include the automatic classification or identification of closely related languages; corpus-driven studies focusing on lexical variation between varieties, such as those by Piersman et al. (2010) and Ljubešić and Fišer (2013); and the adaptation of language models in the context of machine translation, as in Nakov and Tiedemann (2012).

Together with the VarDial workshop we organized the Discriminating between Similar Languages (DSL) shared task. Discriminating between similar languages and language varieties is one of the bottlenecks of state-of-the-art language identification, and it has been the topic of a number of papers published in recent years. The DSL shared task provided a dataset for evaluating systems' performance in discriminating between 13 different languages in 6 language groups.

The 18 papers that appear in this volume deal with different NLP tasks and applications such as parsing, morphological analysis, part-of-speech tagging, language identification and speech recognition. The VarDial workshop received 18 submissions, of which 12 are published in this volume. The DSL shared task received 22 registrations and 8 final submissions. Five system description papers plus the DSL shared task report appear in this volume.

We take this opportunity to thank the VarDial program committee, who thoroughly reviewed all papers; the DSL shared task participants for valuable feedback and discussions; and the COLING organizers for their support, especially Jennifer Foster, who replied promptly to all our inquiries.
Abstract

When the PRC was founded on mainland China and the KMT retreated to Taiwan in 1949, the relation between mainland China and Taiwan became a classic instance of the Cold War. Neither travel, visits, nor correspondence was allowed between the two populations until 1987, when the governments on both sides started to allow a small number of people from Taiwan with relatives in China to return for visits through a third location. Although the thaw eventually led to the frequent exchanges, direct travel links, and close commercial ties that exist between Taiwan and mainland China today, 38 years of total isolation allowed language use to develop into different varieties, which have become a popular topic of mainly lexical studies (e.g., Xu, 1995; Zeng, 1995; Wang & Li, 1996). Grammatical differences between these two variants, however, have not been well studied beyond anecdotal observation, partly because of the near identity of their grammatical systems. This paper focuses on light verb var...