In this paper, we describe the construction of the 14-million-word Nepali National Corpus (NNC). This corpus includes both spoken and written data, the latter incorporating a Nepali match for FLOB and a broader collection of text. Additional resources within the NNC include parallel data (English–Nepali and Nepali–English) and a speech corpus. The NNC is encoded as Unicode text and marked up in CES-compatible XML. The whole corpus is also annotated with part-of-speech tags. We describe the process of devising a tagset and retraining tagger software for the Nepali language, for which there were no existing corpus resources. Finally, we explore some present and future applications of the corpus, including lexicography, NLP, and grammatical research.
Language is used for communication, and communication facilitates social activities. To capture this, linguistic investigation has to be carried out within a wider context. Examination of linguistic communication in a wider context shows that it is multimodal. To study naturalistic multimodal communication using a corpus, the corpus should combine recordings, documentation, and transcriptions of multimodal communication from different social activities in naturalistic settings, preserving unedited conversation. This paper presents a brief account of the principles, methodology, current status, and preliminary findings of an incrementally growing, multimodal, activity-based spoken language corpus of Nepali.
Nepal has 123 languages across six families of spoken languages and a sign language. It has a federal administrative structure with three levels of government. There is no majority language at the national level. Nepali is the only majority language at the province level, with a majority in 4 of the 7 provinces, and there are 21 majority languages at the local level. The distribution of languages by mother-tongue speakers varies considerably across the levels (national, province, and local) as well as among units of the same level, that is, among the provinces and among the local levels. Under the current constitution, a province may, through province law, designate one or more majority language(s) spoken as a mother tongue in that province as additional official language(s). This paper examines the language data at the different levels and concludes that, in terms of cost-effectiveness and inclusiveness, the local level rather than the province is the appropriate unit for the use of additional official languages. It therefore suggests treating the local level as the unit of implementation and including languages with more than 25% mother-tongue speakers at the local level as additional official languages.
This paper studies multimodality in Own Communication Management (OCM), focusing on how linguistic communication involves gestures in order to manage communication. OCM is a basic function in face-to-face communication and concerns how a speaker, on the basis of feedback, needs to be able to plan his or her contributions and to modify earlier content or expressions. OCM thus has two major functions, namely "choice" and "change", both of which are realized with OCM-related expressions and operations. This paper reports on studies of the expressions and operations in both OCM functions and their distribution patterns. It also reports on the interaction between OCM expressions, and between OCM operations and other communicative functions (Interactive Communication Management (ICM) and the main message (MM)). Among the main findings are that about 66% of all OCM expressions involve gestures, and that the choice and change functions of OCM are distributed at about 90% to 10%. OCM expressions have multiple functions and interact with other communicative functions, including ICM and the main message, resulting in a complex system.