Light verbs pose a challenge in linguistics because of their syntactic and semantic versatility and because their distribution differs from that of regular verbs, which have higher semantic content and stricter selectional restrictions. Because of their light grammatical content, earlier natural language processing studies typically put light verbs in a stop word list and ignored them. Recently, however, the classification and identification of light verbs and light verb constructions have become a focus of study in computational linguistics, especially in the context of multi-word expressions, information retrieval, disambiguation, and parsing. Past linguistic and computational studies on light verbs have had very different foci. Linguistic studies tend to focus on the status of light verbs and their various selectional constraints. NLP studies, by contrast, have focused on light verbs in the context of either a multi-word expression (MWE) or a construction to be identified, classified, or translated, trying to overcome the apparent poverty of semantic content of light verbs. There has been nearly no work attempting to bridge these two lines of research. This paper takes up this challenge by proposing a corpus-based study that classifies light verbs and captures the syntactic-semantic differences among them. In this study, we first incorporate results from past linguistic studies to create annotated light verb corpora with syntactic-semantic features. We next adopt a statistical method for automatic identification of light verbs based on these annotated corpora. Our results show that a language-resource-based methodology that optimally incorporates linguistic information can resolve challenges posed by light verbs in NLP.
Although Mandarin Chinese is shared by Chinese communities such as Mainland China, Taiwan, Hong Kong, and Singapore, linguistic differences are frequently found among regional uses, ranging from pronunciation, orthography, vocabulary, and grammar to discourse. Along with the increasingly recognized notion of "World Chineses" in recent years, the study of regional variation has also become more linguistically, socially, and culturally significant. Such a study facilitates more efficient communication among speakers of different varieties, reflects the social and cultural differences of the Chinese-speaking communities from a linguistic perspective, and contributes to the theoretical discussion of language variation and change. With specific examples of the linguistic features exhibited in Mainland China, Taiwan, Hong Kong, and Singapore Mandarin Chinese, this chapter gives an overview of the current studies, methodologies, and motivations of variation.
When the PRC was founded on mainland China and the KMT retreated to Taiwan in 1949, the relation between mainland China and Taiwan became a classic instance of the Cold War. Travel, visits, and correspondence between the two populations were all prohibited until 1987, when the governments on both sides started to allow a small number of people in Taiwan with relatives in mainland China to return to visit through a third location. Although the thaw eventually led to frequent exchanges, direct travel links, and close commercial ties between Taiwan and mainland China today, 38 years of total isolation from each other allowed language use to develop into different varieties, which have become a popular topic for mainly lexical studies (e.g., Xu, 1995; Zeng, 1995; Wang & Li, 1996). Grammatical differences between these two variants, however, have not been well studied beyond anecdotal observation, partly because of the near identity of their grammatical systems. This paper focuses on light verb variations in the Mainland and Taiwan variants and finds that the light verbs of these two variants indeed show distinct distributional tendencies. Light verbs are chosen for two reasons. First, they are semantically bleached and hence more susceptible to change and variation. Second, the classification of light verbs is a challenging topic in NLP. We hope our study will contribute to the study of light verbs in Chinese in general. The data adopted for this study were a comparable corpus extracted from the Chinese Gigaword Corpus and manually annotated with contextual features that may contribute to light verb variations. A multivariate analysis was conducted to show that for each light verb there is at least one context where the two variants show differences in tendencies (usually the presence or absence of a tendency rather than contrasting tendencies) and can be differentiated. In addition, we carried out a K-Means clustering analysis of the variations, and the results are consistent with the multivariate analysis, i.e., the light verbs in Mainland and Taiwan indeed show variations, and the variations can be successfully differentiated.
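The clustering step mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not specify the feature set or the implementation, so everything here is an assumption: the three binary contextual features, the toy token vectors, and the deterministic farthest-point initialisation are hypothetical and chosen only to keep the example small and reproducible.

```python
# Toy K-Means sketch (standard library only). The features and data below are
# hypothetical; they are NOT the annotation scheme or the data of the study.

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20):
    """Lloyd's algorithm with deterministic farthest-point initialisation."""
    # Start from the first point, then repeatedly add the point that lies
    # farthest from all centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each token goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments are stable, so stop early
            break
        labels = new_labels
    return labels

# Hypothetical annotated tokens of one light verb: (variety, feature vector).
# Assumed feature order: [takes_event_noun, takes_vo_compound, in_formal_context]
tokens = [
    ("Mainland", [1, 1, 0]),
    ("Mainland", [1, 1, 0]),
    ("Mainland", [1, 0, 0]),
    ("Taiwan",   [0, 0, 1]),
    ("Taiwan",   [0, 1, 1]),
    ("Taiwan",   [0, 0, 1]),
]

labels = kmeans([features for _, features in tokens], k=2)
for (variety, _), label in zip(tokens, labels):
    print(variety, "-> cluster", label)
```

In this toy data the two clusters line up with the two varieties, which is the kind of agreement with a separate multivariate analysis that the abstract reports. Real annotated data would be noisier and would need a principled choice of features and of k.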
Despite the increasing interest in emotion and sentiment analysis in Chinese text, the field lacks reliable, normative ratings of the emotional content and valence of Chinese emotion words. This paper reports the first large-scale survey of average language users' judgment of perceived emotion type (e.g., ANGER, HAPPINESS), emotional intensity, and valence (e.g., POSITIVE, NEGATIVE) of Chinese emotion words. The results of the survey reveal significant differences from previously proposed Chinese emotion lexicons, which mostly relied on a few researchers' judgment or automatic annotation. Furthermore, the current study also explores the issue of lexical variation across different Chinese varieties with a comparison of emotion word perception by Chinese speakers from three different areas (Mainland China, Hong Kong, and Singapore). The emotion lexicons constructed in the current study will serve as an important reference for future research on emotion and language, including (but not limited to) topics related to sentiment detection and analysis, perception of affective language, and cross-regional lexical and semantic variation in Chinese.