Bipul Syam Purkayastha scite author profile

Bipul Syam Purkayastha

5 Publications

32 Citation Statements Received

19 Citation Statements Given

Affiliations

Assam University

Publications

Part of Speech Tagging in Manipuri: A Rule based Approach

Singha, Purkayastha, Singha

2012

IJCA

The process of assigning a morpho-syntactic category to each morpheme, including punctuation marks, in a given text document according to its context is called Part of Speech (POS) tagging. In this paper we present a rule-based Part of Speech tagger for Manipuri that applies a set of hand-written linguistic rules of the Manipuri language. However, classifying the lexical categories of Manipuri, an agglutinating Tibeto-Burman language of Northeast India, is very difficult, so this tagger uses an affix-stripping technique to segment the affixes from the root. As only a limited POS-tagged corpus exists for Manipuri, the tagged output of this tagger will be very helpful for analyzing Manipuri parts of speech with a range of statistical models.
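
The affix-stripping idea can be illustrated with a minimal Python sketch. It assumes a hypothetical root lexicon (`LEXICON`) and a hypothetical suffix list (`SUFFIXES`); both are placeholders, not a vetted inventory of Manipuri roots or affixes, and the final lexicon lookup stands in for the paper's hand-written rules rather than reproducing them.

```python
from typing import Dict, List, Tuple

# Hypothetical root lexicon (root -> POS tag); placeholder entries only.
LEXICON: Dict[str, str] = {"yum": "NN", "chat": "VB"}

# Hypothetical suffixes; placeholders, not a real Manipuri affix inventory.
SUFFIXES: List[str] = ["gi", "da", "li", "i"]


def strip_affixes(word: str) -> Tuple[str, List[str]]:
    """Strip suffixes (longest first) until the remainder is a known root."""
    root, stripped = word, []
    while root not in LEXICON:
        for suffix in sorted(SUFFIXES, key=len, reverse=True):
            # Require at least one character of root to remain.
            if root.endswith(suffix) and len(root) > len(suffix):
                stripped.insert(0, suffix)  # keep suffixes in left-to-right order
                root = root[: -len(suffix)]
                break
        else:
            break  # no suffix matches: stop with an unknown root
    return root, stripped


def tag(word: str) -> str:
    """Tag a word by looking up its stripped root; 'UNK' if the root is unknown."""
    root, _ = strip_affixes(word)
    return LEXICON.get(root, "UNK")


print(strip_affixes("yumgida"))  # ('yum', ['gi', 'da'])
print(tag("chatli"), tag("xyz"))  # VB UNK
```

Greedy longest-match stripping is a simple heuristic that can over-strip roots which happen to end in an affix-like string; hand-written linguistic rules are one way to constrain it.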

Knowledge Based Approaches To Nepali Word Sense Disambiguation

Roy, Sarkar, Purkayastha

2014

IJNLC

A word may have multiple senses, and the challenge is to determine which sense is appropriate in a given context. Word sense disambiguation (WSD) resolves this ambiguity by identifying which sense of a word is appropriate in a given context. WSD is of critical importance in areas such as machine translation, information retrieval, and speech processing. In this paper we present approaches to word sense disambiguation in Nepali using the Nepali WordNet: an overlap-based approach, and a conceptual distance and semantic graph-based approach, both of which fall under knowledge-based approaches. Conceptual distance and semantic graph distance are used as measures to score our WSD algorithm.
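
The overlap-based idea is classically realized by the simplified Lesk algorithm, sketched below in Python as an illustration, not as the authors' implementation. The sense labels and glosses are toy English examples standing in for Nepali WordNet entries, and tokenization is plain lowercase whitespace splitting.

```python
from typing import Dict, Optional, Set


def bag_of_words(text: str) -> Set[str]:
    """Lowercase whitespace tokenization; a real system would also drop stop words."""
    return set(text.lower().split())


def simplified_lesk(context: str, senses: Dict[str, str]) -> Optional[str]:
    """Pick the sense whose gloss shares the most words with the context."""
    context_words = bag_of_words(context)
    best_sense: Optional[str] = None
    best_overlap = -1
    for sense, gloss in senses.items():
        overlap = len(context_words & bag_of_words(gloss))
        if overlap > best_overlap:  # ties keep the earlier sense
            best_sense, best_overlap = sense, overlap
    return best_sense


# Toy sense inventory (hypothetical; not taken from Nepali WordNet).
senses = {
    "bank/finance": "an institution that accepts deposits of money",
    "bank/river": "sloping land beside a body of water such as a river",
}
print(simplified_lesk("he sat on the bank of the river watching the water", senses))
# bank/river
```

Here the river gloss shares three words with the context ("of", "water", "river") against one for the finance gloss, so the river sense wins.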

A Fuzzy Based Approach for Empirical Analysis of Unstructured Data

Goswami, Purkayastha

2020

J. Comput. Theor. Nanosci.

Computational intelligence and soft computing offer many promising technologies, such as text mining. Document classification using soft computing techniques like fuzzy logic helps to find more practical solutions because of the ambiguity and uncertainty present in text data. Uncertainty and information are part and parcel of any industrial or engineering problem to be solved: information refers to the facts required to solve the problem, and uncertainty refers to the non-random lack of certainty (‘non-random uncertainty’), ambiguity, and haziness in the system. It is therefore important to consider the nature of the uncertainty involved in a problem. The father of fuzzy logic, Lotfi Zadeh (1965), suggested that decision-making using set membership is the key when uncertainty must be handled. Fuzzy clustering helps to identify patterns that are difficult to discover with crisp clustering. Natural languages contain non-random uncertainty, and fuzzy logic may be used to deal with such uncertainty, or with different degrees of truth (partial truth). This work focuses on fuzzy logic-based approaches for identifying coherent patterns. Empirical analyses are conducted to evaluate the effect of the proposed methodology.
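
The abstract does not name a specific fuzzy clustering algorithm, so the Python sketch below assumes fuzzy c-means as a representative example. It shows the two update rules of one iteration: each point's membership in cluster $i$ is $u_i = 1 / \sum_j (d_i/d_j)^{2/(m-1)}$, where $d_i$ is its distance to center $i$ and $m > 1$ is the fuzzifier, and each center is the $u^m$-weighted mean of all points. Unlike crisp clustering, a point can belong partly to several clusters.

```python
import math
from typing import List, Sequence


def memberships(point: Sequence[float], centers: List[Sequence[float]], m: float = 2.0) -> List[float]:
    """Fuzzy c-means membership of one point in each cluster (values sum to 1)."""
    dists = [math.dist(point, c) for c in centers]
    for i, d in enumerate(dists):
        if d == 0.0:  # point sits exactly on a center: full membership there
            return [1.0 if j == i else 0.0 for j in range(len(centers))]
    power = 2.0 / (m - 1.0)  # m > 1 is the fuzzifier
    return [1.0 / sum((d_i / d_j) ** power for d_j in dists) for d_i in dists]


def update_centers(points: List[Sequence[float]], u: List[List[float]], m: float = 2.0) -> List[List[float]]:
    """Recompute each center as the membership**m weighted mean of all points.

    u[p][k] is the membership of point p in cluster k.
    """
    n_clusters, dim = len(u[0]), len(points[0])
    centers = []
    for k in range(n_clusters):
        weights = [row[k] ** m for row in u]
        total = sum(weights)
        centers.append([sum(w * p[d] for w, p in zip(weights, points)) / total for d in range(dim)])
    return centers


print(memberships((1.0, 0.0), [(0.0, 0.0), (3.0, 0.0)]))  # [0.8, 0.2]
```

A full run alternates `memberships` (for every point) and `update_centers` until the centers stop moving; applying this to text would additionally require a vector representation of documents, which is outside this sketch.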

A Comparative Analysis of Particle Swarm Optimization and K-Means Algorithm for Text Clustering Using Nepali Wordnet

Sarkar, Roy, Purkayastha

2014

IJNLC

The volume of digitized text documents on the web has been increasing rapidly. Because of this huge collection of data, documents need to be grouped (clustered) for speedy information retrieval. Document clustering groups documents such that the documents within each group are similar to each other and dissimilar to the documents of other groups. The quality of the clustering result depends greatly on the text representation and the clustering algorithm. This paper presents a comparative analysis of three algorithms, namely K-means, Particle Swarm Optimization (PSO), and a hybrid PSO+K-means algorithm, for clustering text documents using WordNet. The common way of representing a text document is as a bag of terms, but this representation is often unsatisfactory because it does not exploit semantics. In this paper, texts are represented in terms of the synsets corresponding to each word, so the bag-of-terms representation is enriched with synonyms from WordNet. The K-means, PSO, and hybrid PSO+K-means algorithms are applied to cluster text in the Nepali language. Experimental evaluation is performed using intra-cluster similarity and inter-cluster similarity.
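
The synset-enriched representation and the two evaluation measures can be sketched in Python under stated assumptions. The word-to-synset map `SYNSET` is a hypothetical stand-in for Nepali WordNet lookups, and the metric definitions are assumed, since the abstract does not give them: intra-cluster similarity is taken as the mean cosine similarity of each document to its own cluster centroid (higher is better), and inter-cluster similarity as the mean cosine similarity between pairs of centroids (lower is better). The clustering algorithms themselves are not shown.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List

# Hypothetical word -> synset-id map standing in for WordNet lookups.
SYNSET: Dict[str, str] = {
    "car": "S_vehicle", "automobile": "S_vehicle",
    "road": "S_road", "street": "S_road",
}


def to_vector(doc: str) -> Counter:
    """Bag of terms in which synonyms collapse onto a shared synset id."""
    return Counter(SYNSET.get(w, w) for w in doc.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors: List[Counter]) -> Counter:
    total: Counter = Counter()
    for v in vectors:
        total.update(v)  # adds term counts
    return Counter({t: c / len(vectors) for t, c in total.items()})


def intra_similarity(clusters: List[List[Counter]]) -> float:
    """Mean cosine similarity of each document to its own cluster centroid."""
    scores = [cosine(v, centroid(c)) for c in clusters for v in c]
    return sum(scores) / len(scores)


def inter_similarity(clusters: List[List[Counter]]) -> float:
    """Mean cosine similarity between pairs of cluster centroids."""
    pairs = list(combinations([centroid(c) for c in clusters], 2))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0


clusters = [
    [to_vector("car road"), to_vector("automobile street")],
    [to_vector("apple banana"), to_vector("banana apple")],
]
print(intra_similarity(clusters), inter_similarity(clusters))
```

Without the synset map, "car road" and "automobile street" share no terms and their cosine similarity would be 0; with it, both documents map to the same vector, which is the effect the enriched representation aims for.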

Construction of English-Bodo Parallel Text Corpus for Statistical Machine Translation

Islam, Paul, Purkayastha, et al.

2018

IJNLC

A corpus is a large collection of homogeneous and authentic written texts (or speech) of a particular natural language in machine-readable form. The scope of corpora is endless in Computational Linguistics and Natural Language Processing (NLP). A parallel corpus is a very useful resource for most NLP applications, especially for Statistical Machine Translation (SMT). SMT is the most popular approach to Machine Translation (MT) today, and it can produce high-quality translation results given huge amounts of aligned parallel text in both the source and target languages. Although Bodo is a recognized natural language of India and a co-official language of Assam, the amount of machine-readable information in Bodo is still very low. Therefore, to expand the computerized information available for the language, an English to Bodo SMT system has been developed. This paper mainly focuses on building English-Bodo parallel text corpora to implement the English to Bodo SMT system using the Phrase-Based SMT approach. We have designed an E-BPTC (English-Bodo Parallel Text Corpus) creator tool and have constructed English-Bodo parallel text corpora for the General and Newspaper domains. Finally, the quality of the constructed parallel text corpora has been tested using two evaluation techniques in the SMT system.

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Part of the Research Solutions Family.