2022
DOI: 10.1017/s1351324922000080

Enhancing deep neural networks with morphological information

Abstract: Deep learning approaches are superior in natural language processing due to their ability to extract informative features and patterns from languages. The two most successful neural architectures are LSTM and transformers, used in large pretrained language models such as BERT. While cross-lingual approaches are on the rise, most current natural language processing techniques are designed and applied to English, and less-resourced languages are lagging behind. In morphologically rich languages, information is c…

Cited by 10 publications (9 citation statements). References 54 publications.
“…Morphological feature embeddings: Adding morphological features explicitly as input to NLP tasks has mixed effects, depending on the task and the quality of the features. Klemen et al. (2022) show across several languages that the results on (monolingual) dependency parsing and named entity recognition improve on LSTM-based models when UD feature embeddings are added as input, while the performance on comment filtering is not affected. Manually annotated features yield better results than automatically added features.…”
Section: Related Work
confidence: 95%
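
As a concrete illustration of the approach described in the statement above, the following is a minimal sketch, assuming a PyTorch setup, of how UD morphological feature embeddings can be concatenated with word embeddings before a bidirectional LSTM used for a token-level task such as named entity recognition. It is not the authors' implementation; the class name, dimensions, and vocabulary sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class WordPlusMorphTagger(nn.Module):
    """Bi-LSTM tagger whose input is a word embedding concatenated with
    summed embeddings of the token's UD morphological features."""

    def __init__(self, vocab_size, n_feat_values, word_dim=100, feat_dim=30,
                 hidden_dim=128, n_labels=9):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, word_dim)
        # One shared table over UD feature=value pairs (e.g. "Case=Nom", "Number=Plur");
        # each token's variable-length feature set is summed into a single vector.
        self.feat_emb = nn.EmbeddingBag(n_feat_values, feat_dim, mode="sum")
        self.lstm = nn.LSTM(word_dim + feat_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, n_labels)

    def forward(self, word_ids, feat_ids, feat_offsets):
        # word_ids: (batch, seq_len); feat_ids/feat_offsets: flat EmbeddingBag input,
        # one bag per token (batch * seq_len bags in total).
        batch, seq_len = word_ids.shape
        words = self.word_emb(word_ids)                # (B, T, word_dim)
        feats = self.feat_emb(feat_ids, feat_offsets)  # (B*T, feat_dim)
        feats = feats.view(batch, seq_len, -1)         # (B, T, feat_dim)
        hidden, _ = self.lstm(torch.cat([words, feats], dim=-1))
        return self.classifier(hidden)                 # per-token label scores

# Toy usage: one two-token sentence, a vocabulary of 50 feature values.
model = WordPlusMorphTagger(vocab_size=1000, n_feat_values=50)
word_ids = torch.tensor([[4, 7]])
feat_ids = torch.tensor([0, 2, 1])    # token 1 has features {0, 2}, token 2 has {1}
feat_offsets = torch.tensor([0, 2])   # start index of each token's feature bag
scores = model(word_ids, feat_ids, feat_offsets)      # shape (1, 2, 9)
```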
“…Mining n-grams is the automatic extraction of frequent phrases (Del, Tättar, and Fishel 2018), such as multi-word terms and special phrases, from a corpus. First, we POS tag, parse (syntactic dependencies) (Klemen, Krsnik, and Robnik-Šikonja 2022), lemmatise and tokenise the whole corpus, and then extract bigrams, trigrams and tetragrams, hereinafter referred to as n-grams (verbs, nouns, adverbs, adjectives, participles and prepositions), to subsequently take them as input into Word2Vec, in particular into the Skip-gram algorithm, which generates vectorised words of high dimensionality (Camacho-Collados and Pilehvar 2018) with more meaning (see Figure 9). The threshold for the n-grams will be high, so that high-quality legal LSP words and phrases (especially those with short- and long-distance dependencies) are extracted.…”
Section: Pre-training of the Corpus and N-grams Mining
confidence: 99%
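
The pipeline outlined in the preceding statement, phrase mining over a lemmatised corpus followed by Skip-gram Word2Vec, could look roughly like the following sketch, assuming gensim is used. The toy corpus, thresholds, and vector dimensionality are illustrative assumptions, and POS filtering and dependency parsing are omitted.

```python
from gensim.models import Word2Vec
from gensim.models.phrases import Phrases, Phraser

# Assume `sentences` is the corpus after tokenisation and lemmatisation,
# one list of lemmas per sentence (POS filtering not shown).
sentences = [
    ["court", "of", "justice", "of", "the", "european", "union"],
    ["court", "of", "justice", "rule", "on", "the", "case"],
    # ...
]

# First pass merges frequent bigrams into single tokens; applying a second
# phrase detector on the bigrammed corpus then yields trigrams and tetragrams.
bigrams = Phraser(Phrases(sentences, min_count=1, threshold=1.0))
tetragrams = Phraser(Phrases(bigrams[sentences], min_count=1, threshold=1.0))
phrased_corpus = [tetragrams[bigrams[s]] for s in sentences]

# Skip-gram (sg=1) Word2Vec over the phrase-augmented corpus.
model = Word2Vec(phrased_corpus, vector_size=300, sg=1, window=5, min_count=1)
```

In practice the phrase threshold would be set much higher than in this toy example, as the statement notes, so that only high-quality multi-word terms survive the mining step.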
“…The effectiveness of Deep Learning in natural language processing (NLP) was studied in the work of Klemen M., Krsnik L., Robnik-Šikonja M. [30]. In their research, the authors argue that the use of Deep Learning in natural language processing (NLP) is more productive than other methods.…”
Section: Neural Network Modeling as a Tool for Analyzing Language Units
confidence: 99%