2016 IEEE 40th Annual Computer Software and Applications Conference (COMPSAC) 2016
DOI: 10.1109/compsac.2016.73

Big Holes in Big Data: A Monte Carlo Algorithm for Detecting Large Hyper-Rectangles in High Dimensional Data

Abstract: We present the first algorithm for finding holes in high dimensional data that runs in polynomial time with respect to the number of dimensions. Previous algorithms are exponential. Finding large empty rectangles or boxes in a set of points in 2D and 3D space has been well studied. Efficient algorithms exist to identify the empty regions in these low-dimensional spaces. Unfortunately, such efficiency is lacking in higher dimensions where the problem has been shown to be NP-complete when the dimensions a…

Citations: cited by 15 publications (17 citation statements)
References: 22 publications (29 reference statements)
“…Analysis with contours might help understanding whether observed smaller H_1 features are just sampling noise or indicate actual puncturing of the underlying topology of the dynamics. Quantifying importance of holes is also interesting in relational databases where they indicate missing data values or non-allowed attribute combinations [10]. Figure 5 is a plot of the averages (point-wise means) of H_1 stable ranks with respect to the standard contour and distance and shift contours of Figure 2 for 200 simulations of the point processes.…”
Section: Iterated Function System (IFS)
Citation type: mentioning
confidence: 99%
“…Using the same number of trials, RS generally yields better results than GS or more complicated hyperparameter optimization methods. Especially in higher-dimensional spaces, the computation resources required by RS methods are significantly lower than for GS [31]. RS works best under the assumption that not all hyperparameters are equally important [11].…”
Section: Random Search
Citation type: mentioning
confidence: 99%
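
One way to see why random search (RS) can beat grid search (GS) on an equal trial budget, when only some hyperparameters matter, is to count how many distinct values each method tries per axis. The sketch below is illustrative only: the function names, the unit-cube search space, and the budget of 81 trials are assumptions made for this example and do not come from the citing paper.

```python
import itertools
import random


def grid_trials(n_per_axis, dims):
    """Grid search: every combination of n_per_axis evenly spaced values per axis."""
    axis = [(i + 0.5) / n_per_axis for i in range(n_per_axis)]
    return list(itertools.product(axis, repeat=dims))


def random_trials(n_trials, dims, seed=0):
    """Random search: n_trials points drawn uniformly from the unit cube."""
    rng = random.Random(seed)
    return [tuple(rng.random() for _ in range(dims)) for _ in range(n_trials)]


def distinct_values_per_axis(trials):
    """Count how many different values were tried along each axis."""
    dims = len(trials[0])
    return [len({t[k] for t in trials}) for k in range(dims)]


if __name__ == "__main__":
    gs = grid_trials(n_per_axis=3, dims=4)      # 3**4 = 81 trials
    rs = random_trials(n_trials=len(gs), dims=4)
    print("grid  :", distinct_values_per_axis(gs))
    print("random:", distinct_values_per_axis(rs))
```

With the same budget, the grid only ever probes three values of any single hyperparameter, while random sampling probes a fresh value on almost every trial. If one hyperparameter dominates, RS therefore explores it far more densely.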
“…To illustrate the efficiency of RS in high-dimensional spaces, we refer to the following real-world application. Using RS, we have introduced in [31] the first polynomial (in the size of the input and the number of dimensions) algorithm for finding maximal empty hyper-rectangles (holes) in data. All previous (deterministic) algorithms are exponential.…”
Section: Random Search
Citation type: mentioning
confidence: 99%
“…Our approach to finding large, axis-aligned, EHRs is taken from Lemley, et al [8]. The method starts from a randomly chosen point and expands from there.…”
Section: Approach
Citation type: mentioning
confidence: 99%
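
The idea quoted above, starting from a randomly chosen point and expanding, can be sketched as a simple Monte Carlo heuristic. This is not the published algorithm: the function names, the rule for choosing which face a blocking point fixes, and the trial count are all assumptions made for illustration. The box conceptually grows outward from the seed; each data point it would swallow pins one face, in the dimension where that point lies farthest from the seed.

```python
import random
from math import prod


def grow_empty_box(points, lower, upper, rng):
    """Return (lo, hi) of an axis-aligned box with no data point strictly inside.

    Hypothetical heuristic: visit points nearest-first (Chebyshev distance
    from a random seed) and let each point still inside the box pin one face.
    """
    d = len(lower)
    seed = [rng.uniform(lower[k], upper[k]) for k in range(d)]
    lo, hi = list(lower), list(upper)
    nearest_first = sorted(
        points, key=lambda p: max(abs(p[k] - seed[k]) for k in range(d))
    )
    for p in nearest_first:
        if all(lo[k] < p[k] < hi[k] for k in range(d)):
            # Cut along the axis where p is farthest from the seed,
            # so the seed stays inside and little volume is lost.
            k = max(range(d), key=lambda j: abs(p[j] - seed[j]))
            if p[k] > seed[k]:
                hi[k] = p[k]
            else:
                lo[k] = p[k]
    return lo, hi


def volume(lo, hi):
    """Volume of the box with opposite corners lo and hi."""
    return prod(h - l for l, h in zip(lo, hi))


def largest_empty_box(points, lower, upper, trials=200, seed=0):
    """Repeat the random expansion and keep the largest box found."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        box = grow_empty_box(points, lower, upper, rng)
        if best is None or volume(*box) > volume(*best):
            best = box
    return best
```

Each trial costs time polynomial in the number of points and dimensions (a sort plus one pass over the points), which matches the spirit of the polynomial-time claim in the abstract. The boxes returned are empty but not guaranteed to be maximal.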