Relative Frequency as a Determinant of Phonetic Change

Kent, Roland G.; Zipf, George Kingsley

doi:10.2307/408772

Cited by 7 publications

(10 citation statements)

References 0 publications

“…It will also be important to investigate how the proposed indexing structure can be used by modern ranking algorithms. The author assumes that based on Zipf's law [6], our test text collection is sufficient and acceptable for evaluating search performance. Nevertheless, to investigate ranking algorithms' behavior we plan to use collections, such as TREC GOV and GOV2, which are intended to analyze search quality.…”

Section: Discussion

mentioning

confidence: 99%

“…We assume that in typical texts, words are distributed similarly, in accordance with Zipf's law [6]. Therefore, the results obtained with our text collection will be relevant to other collections.…”

Section: Search Experiments Environment

mentioning

confidence: 98%

“…(to, be, or): (0, 1, 2), (0, 5, 6), (4, 1, 2), and (4,5,6). Only for the first component of the key is the intermediate posting list ordered in increasing order.…”

Section: Intermediate Posting List Data Ordering

mentioning

confidence: 99%

“…Words appear in texts at different frequencies. The typical word frequency distribution is described by Zipf's law [6]. An example of words' occurrence distribution is shown in Fig.…”

Section: Introduction

mentioning

confidence: 99%
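The rank-frequency relationship that these statements attribute to Zipf's law can be illustrated with a minimal sketch. It counts word occurrences in a toy text and prints them next to the idealized form f(r) = f(1) / r. The sample text, the function names `rank_frequencies` and `zipf_estimate`, and the exponent of 1 are assumptions made for illustration; none of them come from the cited papers.

```python
from collections import Counter


def rank_frequencies(text):
    """Return (word, count) pairs sorted by decreasing count."""
    words = text.lower().split()
    return Counter(words).most_common()


def zipf_estimate(top_count, rank, exponent=1.0):
    """Idealized Zipf frequency at a given rank (exponent of 1 is assumed)."""
    return top_count / rank ** exponent


# Toy text, far too small to show a real Zipf curve; illustration only.
sample = "to be or not to be that is the question to be"
ranked = rank_frequencies(sample)
top = ranked[0][1]
for rank, (word, count) in enumerate(ranked, start=1):
    # Columns: rank, word, observed count, idealized Zipf count.
    print(rank, word, count, round(zipf_estimate(top, rank), 2))
```

On a realistic collection the observed counts would be expected to follow the idealized column only approximately, which is why the citing paper treats the law as an assumption about its test collection rather than a guarantee.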

Proximity Full-Text Search by Means of Additional Indexes with Multi-component Keys: In Pursuit of Optimal Performance

Veretennikov

2019

Communications in Computer and Information Science

Full-text search engines are important tools for information retrieval. In a proximity full-text search, a document is relevant if it contains query terms near each other, especially if the query terms are frequently occurring words. For each word in a text, we use additional indexes to store information about nearby words that are at distances from the given word of less than or equal to the MaxDistance parameter. We showed that additional indexes with three-component keys can be used to improve the average query execution time by up to 94.7 times if the queries consist of frequently occurring words. In this paper, we present a new search algorithm with even more performance gains. We consider several strategies for selecting multi-component key indexes for a specific query and compare these strategies with the optimal strategy. We also present the results of search experiments, which show that three-component key indexes enable much faster searches in comparison with two-component key indexes.
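The idea of an additional index that records nearby words within MaxDistance can be sketched as follows. This is a simplified, hypothetical two-component variant keyed by (word, neighbor); the abstract describes three-component keys, and the authors' actual data layout is not given here. The function name `build_pair_index`, the whitespace tokenization, and the sample documents are all assumptions.

```python
from collections import defaultdict


def build_pair_index(docs, max_distance):
    """Map (word, neighbor) to a list of (doc_id, word_pos, neighbor_pos).

    Hypothetical two-component simplification, not the cited algorithm.
    """
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        tokens = text.lower().split()
        for i, word in enumerate(tokens):
            # Window of positions whose distance from i is <= max_distance.
            lo = max(0, i - max_distance)
            hi = min(len(tokens), i + max_distance + 1)
            for j in range(lo, hi):
                if j != i:
                    index[(word, tokens[j])].append((doc_id, i, j))
    return index


docs = ["to be or not to be", "who are you"]
index = build_pair_index(docs, max_distance=2)
print(index[("to", "be")])  # occurrences of "to" with "be" at distance <= 2
```

A query for two frequent words near each other can then read one short list instead of intersecting two long posting lists, at the cost of a larger index; that trade-off is the motivation the abstract gives for multi-component keys.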

“…We showed that three component indexes can be created for relatively large values of MaxDistance (i.e., 5,7,9).…”

mentioning

confidence: 99%

An efficient algorithm for three-component key index construction

Veretennikov

2019

Vestn. Udmurt. Univ. Mat. Mekh. Komp’yut. Nauki

In this paper, proximity full-text searches in large text arrays are considered. A search query consists of several words. The search result is a list of documents containing these words. In a modern search system, documents that contain search query words that are near each other are more relevant than documents that do not share this trait. To solve this task, for each word in each indexed document, we need to store a record in the index. In this case, the query search time is proportional to the number of occurrences of the queried words in the indexed documents. Consequently, it is common for search systems to evaluate queries that contain frequently occurring words much more slowly than queries that contain less frequently occurring, ordinary words. For each word in the text, we use additional indexes to store information about nearby words at distances from the given word of less than or equal to MaxDistance, which is a parameter. This parameter can take a value of 5, 7, or even more. Three-component key indexes can be created for faster query execution. Previously, we presented the results of experiments showing that when queries contain very frequently occurring words, the average time of the query execution with three-component key indexes is 94.7 times less than that required when using ordinary inverted indexes. In the current work, we describe a new three-component key index building algorithm and demonstrate the correctness of the algorithm. We present the results of experiments on creating such an index, depending on the value of MaxDistance.

In this paper, we continue our research [1]. In the development of modern methods of full-text search, documents that contain queried words near each other are considered more important and relevant [1][2][3][4]. The importance of taking proximity information into account in the calculation of relevance increases for larger text collections [3]. At the same time, we need to guarantee that the search time is limited by reasonable boundaries. However, for large text collections, the probability of performance problems related to the search time increases.

Inverted indexes are used for the implementation of the full-text search [5][6][7][8]. To take into account the distance between words in the text, we need to store in the index information about every occurrence of every word of every indexed text. Words occur in texts with different frequencies.

A typical word frequency distribution in texts [9] (Zipf's law) is presented in Fig. 1. The horizontal axis represents words, from frequently occurring words on the left to rarely occurring words on the right. On the vertical axis, we plot the total number of occurrences in the texts of each word.

1) A.ID < B.ID or 2) A.ID = B.ID and A.P < B.P.

Among the performance improvement methods, the following methods can be considered: 1) Early-termination methods [13,14] are based on a special sorting of the postings in the index, in order of decreasing relevance of the posting. At some poin...
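The ordering condition quoted in the excerpt above (A.ID < B.ID, or A.ID = B.ID and A.P < B.P) can be sketched on a plain inverted index. The excerpt does not define the fields, so this sketch assumes that ID is a document identifier and P is a word position within that document. The names `Posting`, `precedes`, and `build_inverted_index`, and the sample documents, are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Posting(NamedTuple):
    ID: int  # assumed: document identifier
    P: int   # assumed: word position inside the document


def precedes(a, b):
    """True if posting a comes before posting b under the quoted condition."""
    return a.ID < b.ID or (a.ID == b.ID and a.P < b.P)


def build_inverted_index(docs):
    """Map each word to its postings, sorted by (ID, P)."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for pos, word in enumerate(text.lower().split()):
            index[word].append(Posting(doc_id, pos))
    for postings in index.values():
        # Tuple comparison is lexicographic, which matches precedes().
        postings.sort()
    return index


docs = ["to be or not to be", "let it be"]
index = build_inverted_index(docs)
print(index["be"])
```

Every occurrence of every word gets its own posting here, so the list for a frequent word grows with the collection; this is the cost that the excerpt links to slow queries on frequently occurring words.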

Mathematical Analysis With Applications

2020

Springer Proceedings in Mathematics & Statistics

Full-text search engines are important tools for information retrieval. In a proximity full-text search, a document is relevant if it contains query terms near each other, especially if the query terms are frequently occurring words. For each word in the text, we use additional indexes to store information about nearby words at distances from the given word of less than or equal to MaxDistance, which is a parameter. A search algorithm for the case when the query consists of frequently used words is discussed. In addition, we present results of experiments with different values of MaxDistance to evaluate how the search speed depends on the value of MaxDistance. These results show that the average query execution time with our indexes is 94.7 to 45.9 times less (depending on the value of MaxDistance) than that with standard inverted files when queries that contain frequently occurring words are evaluated.
