Given a set P of n coloured points on the real line, we study the problem of answering range α-majority (or "heavy hitter") queries on P. More specifically, for a query range Q, we want to return each colour that is assigned to more than an α-fraction of the points contained in Q. We present a new data structure for answering range α-majority queries on a dynamic set of points, where α ∈ (0, 1). Our data structure uses O(n) space, and supports queries in O((lg n)/α) time and updates in O((lg n)/α) amortized time. If the coordinates of the points are integers, then the query time can be improved to O(lg n/(α lg lg n)). For constant values of α, this improved query time matches an existing lower bound that holds for any data structure with polylogarithmic update time. We also generalize our data structure to handle sets of points in d dimensions, for d ≥ 2, as well as dynamic arrays in which each entry is a colour.
Introduction
Many problems in computational geometry deal with point sets that have information encoded as colours assigned to the points. In this paper, we design dynamic data structures for the range α-majority problem, in which we want to report colours that appear frequently within an axis-aligned query rectangle. This problem is useful in database applications in which we would like to know typical attributes of the data points in a query range [23,24]. For the one-dimensional case, where the points represent time stamps, this problem has data mining applications for network traffic logs, similar to those of coloured range counting (cf. [17]).

Formally, we are given a set P of n points, where each point p ∈ P is assigned a colour c from a set C of colours. We denote the colour of p by col(p) = c. We are also given a fixed parameter α ∈ (0, 1) that defines the threshold for determining whether a colour is considered frequent. Our goal is to design a dynamic range α-majority data structure that supports the following operations:
- Query(Q): We are given an axis-aligned hyperrectangle Q as a query. Let P(Q) be the set {p | p ∈ P ∩ Q}, and P(Q, c) be the set {p | p ∈ P(Q), col(p) = c}. The answer to the query Q is the set of colours C⋆ such that for each colour c ∈ C⋆, |P(Q, c)| > α|P(Q)|, and for all c ∉ C⋆, |P(Q, c)| ≤ α|P(Q)|. We refer to a colour c ∈ C⋆ as an α-majority for Q, and to this type of query as an α-majority query. When α = 1/2, the problem is to identify the majority colour in Q, if such a colour exists.
- Insert(p, c): Insert a point p with colour c into P.
- Delete(p): Remove the point p from P.
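To make the semantics of the three operations concrete, the following is a minimal brute-force sketch for the one-dimensional case. It is not the data structure of this paper: it scans every point on each query, so a query takes O(n) time, and it serves only as a reference for what a correct answer looks like. The class name `NaiveRangeMajority`, its method names, and the representation of a query range as a closed interval [lo, hi] are assumptions made for illustration, as is the assumption that point coordinates are distinct.

```python
from collections import Counter


class NaiveRangeMajority:
    """Brute-force 1-D range alpha-majority (illustrative baseline only)."""

    def __init__(self, alpha):
        # alpha is fixed at construction time, as in the problem statement.
        if not 0 < alpha < 1:
            raise ValueError("alpha must lie in (0, 1)")
        self.alpha = alpha
        self.colour = {}  # coordinate -> colour; assumes distinct coordinates

    def insert(self, p, c):
        """Insert point p with colour c."""
        self.colour[p] = c

    def delete(self, p):
        """Remove point p if it is present."""
        self.colour.pop(p, None)

    def query(self, lo, hi):
        """Return every colour c with |P(Q, c)| > alpha * |P(Q)| for Q = [lo, hi]."""
        counts = Counter(c for p, c in self.colour.items() if lo <= p <= hi)
        total = sum(counts.values())  # |P(Q)|
        return {c for c, k in counts.items() if k > self.alpha * total}


if __name__ == "__main__":
    ds = NaiveRangeMajority(alpha=0.5)
    for p, c in [(1, "red"), (2, "red"), (3, "blue"), (4, "red"), (5, "blue")]:
        ds.insert(p, c)
    print(ds.query(1, 4))  # red holds 3 of the 4 points in [1, 4]
    print(ds.query(2, 3))  # one red, one blue: neither exceeds half
```

Because the inequality is strict, a colour covering exactly an α-fraction of the points in Q is not reported, and an empty range yields an empty answer. At most ⌊1/α⌋ colours can satisfy the condition for any single query.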
Previous Work
Static and Dynamic Range α-Majority: In all of the following results, unless mentioned otherwise, the threshold α ∈ (0, 1) is fixed at construction time, rather than specified for each query individually.