In this paper, proximity full-text searches in large text arrays are considered. A search query consists of several words. The search result is a list of documents containing these words. In a modern search system, documents in which the queried words occur near each other are more relevant than documents in which they do not. To support such searches, the index must store a record for each occurrence of each word in each indexed document. In this case, the query search time is proportional to the number of occurrences of the queried words in the indexed documents. Consequently, it is common for search systems to evaluate queries that contain frequently occurring words much more slowly than queries that contain less frequently occurring, ordinary words. For each word in the text, we use additional indexes to store information about nearby words at distances from the given word of less than or equal to MaxDistance, which is a parameter. This parameter can take a value of 5, 7, or even more. Three-component key indexes can be created for faster query execution. Previously, we presented the results of experiments showing that when queries contain very frequently occurring words, the average query execution time with three-component key indexes is 94.7 times less than that required when using ordinary inverted indexes. In the current work, we describe a new algorithm for building three-component key indexes and demonstrate its correctness. We present the results of experiments on building such an index for different values of MaxDistance.

In this paper, we continue our research [1]. In the development of modern methods of full-text search, documents that contain queried words near each other are considered more important and relevant [1][2][3][4]. The importance of taking proximity information into account in the calculation of relevance increases for larger text collections [3].
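To make the notion of proximity concrete, the following minimal sketch checks whether two words occur within MaxDistance positions of each other in one tokenized document. It is a brute-force illustration only, not the three-component key index. The function names `positions` and `within_max_distance` and the whitespace tokenization are assumptions made for this example.

```python
from itertools import product


def positions(tokens, word):
    """Return the 0-based positions at which `word` occurs in `tokens`."""
    return [i for i, token in enumerate(tokens) if token == word]


def within_max_distance(tokens, first, second, max_distance):
    """Return True if some occurrence of `first` lies within `max_distance`
    positions of some occurrence of `second` (hypothetical helper)."""
    return any(
        abs(p - q) <= max_distance
        for p, q in product(positions(tokens, first), positions(tokens, second))
    )


# Illustrative document; whitespace tokenization is an assumption.
tokens = "who are you and who am i".split()
near = within_max_distance(tokens, "you", "am", 5)
far = within_max_distance(tokens, "are", "i", 2)
```

The check compares every pair of occurrences, so its cost grows with the product of the two occurrence counts. This is the kind of cost that becomes problematic for frequently occurring words.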
At the same time, the search time must stay within reasonable bounds. However, for large text collections, the probability of performance problems related to the search time increases.

Inverted indexes are used to implement full-text search [5][6][7][8]. To take into account the distance between words in the text, the index must store information about every occurrence of every word in every indexed text. Words occur in texts with different frequencies.

A typical word frequency distribution in texts [9] (Zipf's law) is presented in Fig. 1. The horizontal axis represents words, ordered from the most frequently occurring on the left to the least frequently occurring on the right. The vertical axis shows the total number of occurrences of each word in the texts.
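As an illustration of storing every occurrence of every word, the following minimal sketch builds a positional inverted index in which each posting is a pair of a document identifier and a word position. It then orders the words by their number of occurrences, as on the horizontal axis of a rank-frequency plot. The function names, the lowercase whitespace tokenization, and the sample documents are assumptions for this example and are not taken from the system described in the paper.

```python
from collections import defaultdict


def build_positional_index(documents):
    """Map each word to a sorted list of postings (doc_id, position)."""
    index = defaultdict(list)
    for doc_id, text in documents.items():
        for position, word in enumerate(text.lower().split()):
            index[word].append((doc_id, position))
    for postings in index.values():
        # Tuples compare by doc_id first, then by position.
        postings.sort()
    return dict(index)


def words_by_frequency(index):
    """Return words from most to least frequent (ties broken alphabetically)."""
    return sorted(index, key=lambda word: (-len(index[word]), word))


# Illustrative documents keyed by a hypothetical document identifier.
documents = {2: "to be or not to be", 1: "time to be"}
index = build_positional_index(documents)
ranking = words_by_frequency(index)
```

The length of a posting list equals the number of occurrences of the word, so a query over frequently occurring words has to read the longest lists in the index.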
1) A.ID < B.ID or 2) A.ID = B.ID and A.P < B.P.

Among the performance improvement methods, the following can be considered: 1) Early-termination methods [13,14] are based on a special sorting of the postings in the index, in order of decreasing relevance of the posting. At some poin...