Search citation statements
Paper Sections
Citation Types
Year Published
Publication Types
Relationship
Authors
Journals
Modelling the distributional semantics of such a morphologically rich language as Arabic needs to take into account its introflexive, fusional, and inflectional nature attributes that make up its combinatorial sequences and substitutional paradigms. To evaluate such word distributional models, the benchmarks that have been used thus far in Arabic have mimicked those in English. This paper reports on a benchmark that we designed to reflect linguistic patterns in both Contemporary Arabic and Classical Arabic, the first being a cover term for written and spoken Modern Standard Arabic, while the second for pre-modern Arabic. The analogy items we included in this benchmark are chosen in a transparent manner such that they would capture the major features of nouns and verbs; derivational and inflectional morphology; high-, middle-, and low-frequency patterns and lexical items; and morphosemantic, morphosyntactic, and semantic dimensions of the language. All categories included in this benchmark are carefully selected to ensure proper representation of the language. The benchmark consists of 45 roots of the trilateral, all-consonantal, and semivowel-inclusive types; six morphosemantic patterns (’af‘ala; ifta‘ala; infa‘ala; istaf‘ala; tafa‘‘ala; and tafā‘ala); five derivations (the verbal noun, active participle, and the contrasts in Masculine-Feminine; Feminine-Singular-Plural; Masculine-Singular-Plural); and morphosyntactic transformations (perfect and imperfect verbs conjugated for all pronouns); and lexical semantics (synonyms, antonyms, and hyponyms of nouns, verbs, and adjectives), as well as capital cities and currencies. All categories include an equal proportion of high-, medium-, and low-frequency items. For the purpose of validating the proposed benchmark, we developed a set of embedding models from different textual sources. Then, we tested them intrinsically using the proposed benchmark and extrinsically using two natural language processing tasks: Arabic Named Entity Recognition and Text Classification. The evaluation leads to the conclusion that the proposed benchmark is truly reflective of this morphologically rich language and discriminatory of word embeddings.
Modelling the distributional semantics of such a morphologically rich language as Arabic needs to take into account its introflexive, fusional, and inflectional nature attributes that make up its combinatorial sequences and substitutional paradigms. To evaluate such word distributional models, the benchmarks that have been used thus far in Arabic have mimicked those in English. This paper reports on a benchmark that we designed to reflect linguistic patterns in both Contemporary Arabic and Classical Arabic, the first being a cover term for written and spoken Modern Standard Arabic, while the second for pre-modern Arabic. The analogy items we included in this benchmark are chosen in a transparent manner such that they would capture the major features of nouns and verbs; derivational and inflectional morphology; high-, middle-, and low-frequency patterns and lexical items; and morphosemantic, morphosyntactic, and semantic dimensions of the language. All categories included in this benchmark are carefully selected to ensure proper representation of the language. The benchmark consists of 45 roots of the trilateral, all-consonantal, and semivowel-inclusive types; six morphosemantic patterns (’af‘ala; ifta‘ala; infa‘ala; istaf‘ala; tafa‘‘ala; and tafā‘ala); five derivations (the verbal noun, active participle, and the contrasts in Masculine-Feminine; Feminine-Singular-Plural; Masculine-Singular-Plural); and morphosyntactic transformations (perfect and imperfect verbs conjugated for all pronouns); and lexical semantics (synonyms, antonyms, and hyponyms of nouns, verbs, and adjectives), as well as capital cities and currencies. All categories include an equal proportion of high-, medium-, and low-frequency items. For the purpose of validating the proposed benchmark, we developed a set of embedding models from different textual sources. Then, we tested them intrinsically using the proposed benchmark and extrinsically using two natural language processing tasks: Arabic Named Entity Recognition and Text Classification. The evaluation leads to the conclusion that the proposed benchmark is truly reflective of this morphologically rich language and discriminatory of word embeddings.
Poetry is the jewel in the crown of our country’s classical culture and has been praised and studied by countless people for thousands of years. In recent years, with the rapid development of computer technology and the leap-forward improvement of hardware computing power, natural language processing (NLP) technology has achieved remarkable results in practice. We applied NLP to the text analysis of classical poetry, proposed a set of methods to automatically recognize the artistic conception in classical poetry, and established the classical poetry artistic conception dataset for experimentation through the crawler method. In the experiment, we studied the application of different machine learning algorithms in text classification, combined such algorithms with different document vectorization methods, compared their performance on the topic classification problem of poetry, and concluded that there are some better accuracy rates under the classical machine learning framework. Comparing the effects of word-based vectors and word-based vectors, we concluded that the ancient poetry word vectors constructed based on characters have a higher accuracy rate. We also further introduced deep learning methods into the research, analyzed the pros and cons of various neural networks, and studied the neural network architectures that have good results in the practice of NLP, such as TextCNN and BiLSTM models. We also introduced mature NLP pre-training models such as BERT to classify the artistic conception of classical poetry. In addition, we also constructed an emotional dictionary matching method based on word vectors for sentiment analysis. The experimental results have shown that the method proposed in this paper has a good effect of automatic recognition of classical poetry mood, which can be used to recommend similar poems and select poems with emotion as the theme through the poetry mood.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.