JFlex is a lexical analyzer generator: it takes as input a specification containing a set of regular expressions and corresponding actions, and it creates a program (a lexer) that reads input, matches the input against the regular expressions, and runs the matching action. This paper shows how a simple programming language and its compiler can be developed using JFlex in NetBeans, supporting assignment statements, if-then-else, while-do, type checking, and execution. The data types included are int, real, char, Boolean, and String. The key concept in this work is the execution of the grammar rules in the CUP file; the parse tree records the sequence of rules the parser applies to recognize the input. The tool used for developing the lexical analyzer was JFlex, whose generated lexers are based on deterministic finite automata (DFAs). To demonstrate the implementation and working of operators, a simple calculator was designed that supports addition and multiplication. Further, key compiler concepts such as lexical analysis, semantic analysis, and parse trees are discussed; the paper will help readers understand the syntax of, and a way to develop, a simple language. In the language developed, expressions and statements are evaluated recursively. Type checking and error checking are also performed: the two operands of an operator are checked for compatibility with that operator, and an error is reported when an incompatible expression is found.
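To illustrate the idea, the following is a minimal hand-written Java sketch of what a JFlex/CUP-generated lexer and parser pair does for the calculator grammar described above, with multiplication binding tighter than addition and expressions evaluated recursively. The class and method names are illustrative only and are not taken from the paper's actual implementation; a real JFlex specification would define the token rules (e.g. `[0-9]+` for integers) in a `.flex` file and the grammar in a `.cup` file instead.

```java
// Hypothetical sketch: recursive evaluation of a calculator grammar
// with addition and multiplication, mimicking what generated
// JFlex (lexer) and CUP (parser) components would do together.
public class Calc {
    private final String src;
    private int pos = 0;

    Calc(String src) { this.src = src.replaceAll("\\s+", ""); }

    // expr -> term ('+' term)*  : addition, lower precedence
    int expr() {
        int v = term();
        while (pos < src.length() && src.charAt(pos) == '+') { pos++; v += term(); }
        return v;
    }

    // term -> factor ('*' factor)*  : multiplication binds tighter
    int term() {
        int v = factor();
        while (pos < src.length() && src.charAt(pos) == '*') { pos++; v *= factor(); }
        return v;
    }

    // factor -> INTEGER  : the token a lexer rule like [0-9]+ would match
    int factor() {
        int start = pos;
        while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
        return Integer.parseInt(src.substring(start, pos));
    }

    public static void main(String[] args) {
        System.out.println(new Calc("2 + 3 * 4").expr()); // prints 14
    }
}
```

Structuring `expr` and `term` as separate recursive methods mirrors how a CUP grammar encodes operator precedence through separate nonterminals, so `2 + 3 * 4` evaluates to 14 rather than 20.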
Large pre-trained transformer models using self-supervised learning have achieved state-of-the-art performance on various NLP tasks. However, for a low-resource language like Nepali, pre-training monolingual models remains a problem due to the lack of training data and of well-designed, balanced benchmark datasets. Furthermore, although several multilingual pre-trained models such as mBERT and XLM-RoBERTa have been released, their performance on Nepali remains unknown. We compared Nepali monolingual pre-trained transformer models with multilingual models, using a Nepali text classification dataset as the downstream task with varying numbers of classes and data sizes, and taking machine learning (ML) and deep learning (DL) algorithms as baselines. The under-representation of Nepali in mBERT resulted in overall poor performance, but XLM-RoBERTa, which has a larger vocabulary, produced state-of-the-art performance roughly comparable to that of Nepali DistilBERT and DeBERTa, all of which outperformed the baseline algorithms. Among the baselines, Bi-LSTM and SVM also performed very well in a variety of settings. Moreover, to assess cross-language knowledge transfer for cases where monolingual models are not available, we also evaluated HindiRoBERTa, a monolingual Indian-language model, on the Nepali text dataset. This research contributes to the Nepali NLP community a news classification dataset with 20 classes and over 200,000 articles, together with a performance evaluation of various pre-trained monolingual Nepali transformers against multilingual transformers and DL and ML algorithms.