Word formation by derivation is highly productive in Assamese, and a significant number of Assamese words owe their origin to derivation. The analysis in this paper covers the derivational processes associated with lexical word categories, together with the numerous bound morphemes used in derivation in the language. This analysis sheds light on several important aspects of Assamese morphology: the role of class-maintaining and class-changing morphemes, the derivation of words from synonyms, the productivity of derivational morphemes, morphophonemic changes in the root caused by the affixation of derivational morphemes, the presence of allomorphs of various bound morphemes, and the ability of a morpheme to derive words from different word categories. The significance of this paper lies in the fact that these word-formation processes could help in formulating morphological rules for building computational morphological tools such as stemmers, spell checkers, and taggers.
This article presents a Part-of-Speech (POS) tagger for Assamese based on a Hidden Markov Model (HMM). Over the years, many language processing tasks have been addressed for Western and South Asian languages, but very little such work has been done for Assamese. With this in mind, a POS tagger for Assamese using a stochastic approach is being developed. Assamese is a free-word-order, highly agglutinative and morphologically rich language, so a POS tagger with good accuracy will help in the development of other NLP tasks for Assamese. This work uses an annotated corpus of 271,890 words tagged with the BIS tagset, which consists of 38 tag labels. The model is trained on 256,690 words, and the remaining 15,200 words are used for testing. The system obtains an accuracy of 89.21%, which is compared with that of other existing stochastic models.
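The abstract does not describe how the trained HMM assigns tags to new text. A standard choice for HMM taggers is Viterbi decoding, which finds the most probable tag sequence for a sentence. The sketch below shows that step for a first-order (bigram) HMM, working in log space to avoid underflow on long sentences. It assumes a bigram model and a fixed emission floor `unk_p` for unseen words, and the paper may differ on both. The two tags, the placeholder tokens `w1`..`w3` and all probabilities are toy values. They are not the 38-label BIS tagset or statistics from the paper's corpus.

```python
import math
from typing import Dict, List, Tuple


def viterbi(
    words: List[str],
    tags: List[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
    unk_p: float = 1e-6,  # assumed emission floor for words unseen in training
) -> Tuple[List[str], float]:
    """Return the most probable tag sequence for `words` and its log-probability
    under a first-order (bigram) HMM. All given probabilities must be > 0."""
    if not words:
        return [], 0.0

    def log_emit(tag: str, word: str) -> float:
        return math.log(emit_p[tag].get(word, unk_p))

    # best[i][t]: highest log-probability of any tag path for words[:i+1] ending in t
    # back[i][t]: the previous tag on that best path
    best: List[Dict[str, float]] = [
        {t: math.log(start_p[t]) + log_emit(t, words[0]) for t in tags}
    ]
    back: List[Dict[str, str]] = [{}]

    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            # Best predecessor tag p for current tag t
            prev, score = max(
                ((p, best[i - 1][p] + math.log(trans_p[p][t])) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + log_emit(t, words[i])
            back[i][t] = prev

    # Pick the best final tag, then follow back-pointers to recover the path.
    last = max(tags, key=lambda t: best[-1][t])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return path, best[-1][last]


# Toy model (hypothetical): two tags (N = noun, V = verb), placeholder tokens.
tags = ["N", "V"]
start_p = {"N": 0.7, "V": 0.3}
trans_p = {"N": {"N": 0.4, "V": 0.6}, "V": {"N": 0.8, "V": 0.2}}
emit_p = {
    "N": {"w1": 0.5, "w2": 0.4, "w3": 0.1},
    "V": {"w1": 0.1, "w2": 0.2, "w3": 0.7},
}

path, logp = viterbi(["w1", "w2", "w3"], tags, start_p, trans_p, emit_p)
print(path, round(math.exp(logp), 5))  # ['N', 'N', 'V'] 0.02352
```

In a real tagger, `start_p`, `trans_p` and `emit_p` would be estimated from the annotated training corpus. Smoothing would then replace the crude `unk_p` floor, which matters for a morphologically rich language where many test word forms never appear in training.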
One of the goals of researchers working on an endangered language is to help the indigenous group revive and maintain a language that is at risk of disappearing. Sange Phiang, a native Bugun, told me on my first field trip to the Bugun area in West Kameng district of Arunachal Pradesh that “our language will disappear very soon”. Sange, a middle school teacher, fears that in 25 years there will be no Bugun speakers. This fear is not unfounded.