Wiktionary, a satellite of the Wikipedia initiative, can be seen as a potential resource for Natural Language Processing. However, it must be processed before it can be used efficiently as an NLP resource. After describing the aspects of Wiktionary relevant to our purposes, we focus on its structural properties. We then describe how we extracted synonymy networks from this resource. We provide an in-depth study of these synonymy networks and compare them to those extracted from traditional resources. Finally, we describe two methods for semiautomatically improving this network by adding missing relations: (i) using a kind of semantic proximity measure; (ii) using the translation relations of Wiktionary itself. Note: The experiments in this paper are based on Wiktionary dumps downloaded in 2008. Differences may be observed with the current versions available online.
The morphological status of affixes in Chinese has long been a matter of debate. How one might apply the conventional criteria of free/bound and content/function features to distinguish word-forming affixes from bound roots in Chinese is still far from clear. Issues involving polysemy and diachronic dynamics further blur the boundaries. In this paper, we propose three quantitative features in a computational model of affixoid behavior in Mandarin Chinese. The results show that, except in a very few cases, there are no clear criteria that can be used to identify an affix's status in an isolating language like Chinese. A diachronic check using contextualized embeddings with the WordNet Sense Inventory also demonstrates the possible role of the polysemy of lexical roots across diachronic settings.
In this paper we present an application fostering the integration and interoperability of computational lexicons, focusing on the particular case of mutual linking and cross-lingual enrichment of two wordnets, the ItalWordNet and Sinica BOW lexicons. This is intended as a case study investigating the needs and requirements of semi-automatic integration and interoperability of lexical resources, with a view to developing a prototype web application to support the GlobalWordNet Grid initiative.
This paper investigates the most frequent lexical bundle (LB) ka li kong (to-you-say) (KLK), in an 18.5-hour Taiwanese Southern Min conversation corpus. The analysis focuses on the discourse-pragmatic functions of KLK, the role it plays in the speaker’s management of information in talk-in-interaction, and the collocations that are employed. The results show that the speaker utilizes KLK to imply epistemic authority regarding the veracity of the predication. Meanwhile, it expresses the speaker’s stance or functions as a discourse organizer to initiate a narrative that is newsworthy. Prosodically, it is always processed as a holistic chunk with great phonological reduction. Along with the low transitivity of the verb kong demonstrated by the type of object it takes, we argue that KLK is developing into a discourse marker. Collocation of KLK with the marker toh further triggers the grammaticalization of the four-word bundle toh ka li kong (TKLK) to encode an extreme stance.