The article presents a classification of the ancillary texts in Old East Slavonic handwritten copies of the Parimejnik (Old Testament lectionary) from the 12th–14th centuries. It aims to establish parameters for the analytical markup of the Parimejnik subcorpus of the "Manuscript" historical corpus, to enable searching by fragment values, and to display corresponding fragments alongside the existing aligned visualization of manuscripts, which is based on temporal characteristics (dates, days of the week, and services) and on paroemias and their components (headings and biblical verses). From the text-critical point of view, the authors describe the components of ancillary texts in various sections of the manuscripts (Nativity-Epiphany, Triodia, Synaxarion), their specificity, and their mutual correlation. The ancillary texts were found to vary between copies, which complicates comparison and alignment in the parallel corpus. From the applied linguistics point of view, the authors propose a classification of the ancillary text components according to several criteria (function, method of execution, the presence of a biblical or hymnographic source, and the place in the sequence of rites). They also propose a system of identifiers for these components and parameters for splitting them into elementary components, so that links can be established in the parallel corpus. This will make it possible to search for and visualize fragments of the same type in the corpus.
The work presents an experience of applying statistical methods to discovering thematically valuable words in three Old Russian (Old East Slavic) copies of the Ephrem the […]
Citation: Baranov, V.A. (2018). Statistical analysis of the Slavonic Paraenesis by Ephrem the Syrian (on three electronic copies of the 13th–14th centuries from the Manuscript corpus).
The work presents the results of a quantitative and statistical comparative analysis of the most frequent word forms and word combinations in the Old Russian Panteleymon Gospel (RNB, Sof. 1). It aims to reveal how close the Panteleymon Gospel is to the other gospels and to medieval Slavonic texts of other genres represented in the sub-corpora of the historical corpus "Manuscript: Slavic Written Heritage". The work was carried out with the corpus's dedicated statistics and n-gram modules. Comparing the lists of one-, two- and three-component linguistic units automatically extracted from the manuscript with the corresponding lists from several sub-corpora shows that the linguistic components of the manuscripts have quantitative and statistical characteristics that can be recognized as important. The data of three experiments are summarized. The first experiment showed that the frequency lists differ least between the Panteleymon Gospel and the sub-corpus of complete aprakoses, and most between the Panteleymon Gospel and the sub-corpus of short aprakoses. This makes it possible to recognize the composition of the lists, and the order and relative frequency of the forms in them, as important characteristics of a manuscript or sub-corpus. Applying the Weirdness measure helped to extract from the Panteleymon Gospel the word forms presumed to be significant, namely those with the highest weight against sub-corpora of different genres (вамъ, имъ, азъ, емоу, рече, аще). It was established that the volume and composition of the contrast sub-corpus do not influence the result, and that using the collections of complete and short aprakoses as contrast sub-corpora helped to refine the list of such forms (яко, къ, бо, о(т), имъ, есть, аще).
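As an illustration of the Weirdness measure mentioned above: it is commonly defined as the ratio of a form's relative frequency in the analyzed text to its relative frequency in a contrast corpus. The following is a minimal sketch under that definition, not the corpus module itself. The function name `weirdness`, the token-list input, and the add-one smoothing for forms absent from the contrast corpus are assumptions made for the example; the abstract does not say how the "Manuscript" module handles such forms.

```python
from collections import Counter


def weirdness(target_tokens, contrast_tokens, smoothing=1.0):
    """Score each word form of the target text against a contrast corpus.

    Score = relative frequency in the target divided by relative frequency
    in the contrast corpus. The smoothing term (an assumption of this sketch)
    keeps the ratio finite for forms the contrast corpus never contains.
    """
    target = Counter(target_tokens)
    contrast = Counter(contrast_tokens)
    n_target = sum(target.values())
    n_contrast = sum(contrast.values())

    scores = {}
    for form, freq in target.items():
        rel_target = freq / n_target
        rel_contrast = (contrast[form] + smoothing) / (n_contrast + smoothing)
        scores[form] = rel_target / rel_contrast
    return scores
```

Sorting the returned dictionary by score in descending order gives a ranked list of candidate forms; values well above 1 mark forms that are over-represented in the analyzed text relative to the contrast corpus.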
The investigation of two- and three-component combinations extracted with the T-score statistical measure gave the following results: a list was compiled of fixed combinations, that is, composition formulas of invariable wording (ев[ан](г)[елие] ѡ(т) ма[т](ѳ)[ея], etc.) inherent to all gospels; entire grammatical structures (ѧже далъ ѥси, etc.) were listed, as well as stable semantic complexes and their parts ([да] любите дроугъ дроуга, etc.). Statistically important sequences whose statistical weight in the Panteleymon Gospel is considerably higher than in the contrast sub-corpora (нѣсте ли чьли, имать животъ вѣчьныи, etc.) were also identified.
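The T-score used for the combinations above is, in its usual corpus-linguistic form, t = (O - E) / sqrt(O), where O is the observed frequency of a combination and E is the frequency expected if its components occurred independently. The sketch below applies that form to two-component combinations only. The function name `bigram_t_scores` and the `min_count` threshold are assumptions for illustration, and the abstract does not describe how the n-gram module computes the expected frequency for three-component combinations.

```python
import math
from collections import Counter


def bigram_t_scores(tokens, min_count=2):
    """Return {(w1, w2): t-score} for adjacent word pairs in a token list.

    Expected frequency assumes independence: E = f(w1) * f(w2) / N,
    where N is the number of tokens. Pairs seen fewer than min_count
    times are skipped (a hypothetical cut-off, not from the source).
    """
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))

    scores = {}
    for (w1, w2), observed in bigrams.items():
        if observed < min_count:
            continue
        expected = unigrams[w1] * unigrams[w2] / n
        scores[(w1, w2)] = (observed - expected) / math.sqrt(observed)
    return scores
```

Computing these scores separately for the analyzed manuscript and for each contrast sub-corpus, and then comparing them, is one way to surface sequences whose weight is markedly higher in a single text.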
The article gives grounds for marking up excerpts that possess valuable codicological or textological characteristics in the machine-readable transcriptions of medieval Slavic manuscripts that serve as textual material for historical corpora. Analyzing the composition of four manuscripts of the Slavonic Parimejnik (12th–14th cc.) and modeling their structure with generally accepted tools of linguistic analysis made it possible to solve the following tasks: identifying the analytical units, elaborating the text format, and algorithmizing the search process on the basis of the characteristics of natural language units. The suggested format for describing lectionaries includes data on the section (sub-section) of the liturgical year, the number of the lectionary within this section, its textual composition in relation to the texts of the Bible, and its topic. The format for the dates, days and times at which lectionaries are read throughout the year includes data on the section (sub-section), the date in the fixed calendar or the week and day within the section (sub-section) of the non-fixed calendar, the time of the church service, and the event to which the service is devoted. The markup of the Parimejnik texts included indicating the boundaries of the excerpts and determining their relations with the dictionary unit, with the aim of building a parallel corpus of four Slavic manuscripts of the Parimejnik in which corresponding excerpts are aligned at the level of lectionaries and Bible verses within the corpus "Manuscript" (manuscripts.ru).
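The two description formats listed above can be pictured as a pair of records plus a lookup that aligns the same reading across manuscripts. This is only a sketch of the idea: every class name, field name and value format here (`ReadingTime`, `Lectionary`, `find_parallel`, dates as strings, and so on) is hypothetical and does not reproduce the actual markup scheme of the "Manuscript" corpus.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ReadingTime:
    """When a lectionary is read (hypothetical field names)."""
    section: str                      # section or sub-section of the year
    fixed_date: Optional[str] = None  # date in the fixed calendar, if any
    week: Optional[int] = None        # week within a non-fixed section
    day: Optional[str] = None         # day within that week
    service: Optional[str] = None     # time of the church service
    event: Optional[str] = None       # event the service is devoted to


@dataclass(frozen=True)
class Lectionary:
    """Description of one lectionary (hypothetical field names)."""
    section: str                 # section or sub-section of the liturgical year
    number: int                  # number of the lectionary within the section
    bible_refs: Tuple[str, ...]  # composition in relation to the Bible text
    topic: str
    reading_time: ReadingTime


def find_parallel(corpus, section, number):
    """Return {manuscript_id: Lectionary} for one reading across manuscripts.

    `corpus` maps a manuscript identifier to its list of Lectionary records.
    Manuscripts that lack the reading are simply absent from the result.
    """
    hits = {}
    for manuscript_id, readings in corpus.items():
        for reading in readings:
            if reading.section == section and reading.number == number:
                hits[manuscript_id] = reading
                break
    return hits
```

Keying the alignment on the pair (section, number) rather than on position in the manuscript is a design choice of this sketch: it lets copies that omit or reorder readings still be compared reading by reading.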