Big Holes in Big Data: A Monte Carlo Algorithm for Detecting Large Hyper-Rectangles in High Dimensional Data

Lemley, Joseph; Jagodzinski, Filip; Andonie, Răzvan

doi:10.1109/compsac.2016.73

Cited by 15 publications (17 citation statements)

References 22 publications (29 reference statements)

“…Analysis with contours might help understanding whether observed smaller H_1 features are just sampling noise or indicate actual puncturing of the underlying topology of the dynamics. Quantifying importance of holes is also interesting in relational databases where they indicate missing data values or non-allowed attribute combinations [10]. Figure 5 is a plot of the averages (point-wise means) of H_1 stable ranks with respect to the standard contour and distance and shift contours of Figure 2 for 200 simulations of the point processes.…”

Section: Iterated Function System (IFS), mentioning

confidence: 99%

Metrics and Stabilization in One Parameter Persistence

Chachólski

Riihimäki

2020

SIAM J. Appl. Algebra Geometry

We propose the use of persistent homology in a supervised way. We believe homological persistence is fundamentally not about decomposition theorems but a central role is played by a choice of metrics. Choosing a pseudometric between persistent vector spaces leads to a model. Fitting this model is what we believe supervised homological persistence is. We develop theory behind constructing such models and we give evidence of the usefulness of this approach in concrete data analysis tasks.

“…Using the same number of trials, RS generally yields better results than GS or more complicated hyperparameter optimization methods. Especially in higher-dimensional spaces, the computation resources required by RS methods are significantly lower than for GS [31]. RS works best under the assumption that not all hyperparameters are equally important [11].…”

Section: Random Search, mentioning

confidence: 99%

“…To illustrate the efficiency of RS in high-dimensional spaces, we refer to the following real-world application. Using RS, we have introduced in [31] the first polynomial (in the size of the input and the number of dimensions) algorithm for finding maximal empty hyper-rectangles (holes) in data. All previous (deterministic) algorithms are exponential.…”

Section: Random Search, mentioning

confidence: 99%

Hyperparameter optimization in learning systems

Andonie

2019

J Membr Comput

While the training parameters of machine learning models are adapted during the training phase, the values of the hyperparameters (or meta-parameters) have to be specified before the learning phase. The goal is to find a set of hyperparameter values which gives us the best model for our data in a reasonable amount of time. We present an integrated view of methods used in hyperparameter optimization of learning systems, with an emphasis on computational complexity aspects. Our thesis is that we should solve a hyperparameter optimization problem using a combination of techniques for: optimization, search space and training time reduction. Case studies from real-world applications illustrate the practical aspects. We create the framework for a future separation between parameters and hyperparameters in adaptive P systems.
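The citation statements above claim that, for the same number of trials, random search (RS) tends to do better than grid search (GS) when not all hyperparameters are equally important. The following is a minimal illustrative sketch of that intuition, not code from the cited paper. The `objective` function, its optimum at 0.37, and the budget of 9 trials are all hypothetical choices made for the example.

```python
import itertools
import random


def objective(x, y):
    # Hypothetical validation score: only x matters, y has no effect.
    return -(x - 0.37) ** 2


def grid_search(n_per_axis):
    # Evenly spaced values on [0, 1] for each of the two hyperparameters.
    axis = [i / (n_per_axis - 1) for i in range(n_per_axis)]
    return list(itertools.product(axis, axis))


def random_search(n_trials, rng):
    # Independent uniform samples on the unit square.
    return [(rng.random(), rng.random()) for _ in range(n_trials)]


def best_score(trials):
    return max(objective(x, y) for x, y in trials)


def distinct_x(trials):
    # Number of different values tried for the hyperparameter that matters.
    return len({x for x, _ in trials})


grid_trials = grid_search(3)                        # 3 x 3 = 9 trials
random_trials = random_search(9, random.Random(0))  # same budget of 9 trials

print("grid:   distinct x =", distinct_x(grid_trials), "best =", best_score(grid_trials))
print("random: distinct x =", distinct_x(random_trials), "best =", best_score(random_trials))
```

With the same budget, the grid probes only three distinct values of the hyperparameter that matters, while random sampling probes nine. Whether RS actually finds the better score in a given run depends on the seed and the objective.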

“…Our approach to finding large, axis-aligned EHRs is taken from Lemley et al. [8]. The method starts from a randomly chosen point and expands from there.…”

Section: Approach, mentioning

confidence: 99%

“Boxing Clever”: Practical Techniques for Gaining Insights into Training Data and Monitoring Distribution Shift

Ashmore

Hill

2018

Lecture Notes in Computer Science

Training data has a significant influence on the behaviour of an artificial intelligence algorithm developed using machine learning techniques. Consequently, any argument that the trained algorithm is, in some way, fit for purpose ought to include consideration of data as an entity in its own right. We describe some simple techniques that can provide domain experts and algorithm developers with insights into training data and which can be implemented without specialist computer hardware. Specifically, we consider sampling density, test case generation and monitoring for distribution shift. The techniques are illustrated using example data sets from the University of California, Irvine, Machine Learning repository.
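The citation statement for this paper describes the approach to finding empty hyper-rectangles (EHRs) as starting from a randomly chosen point and expanding from there. The sketch below is one simple way such a random-start expansion could look. It is an assumption-laden illustration, not the algorithm of Lemley et al.: the function names, the unit-cube domain, the one-dimension-at-a-time greedy growth, and the trial count are all hypothetical.

```python
import random


def expand_from(seed, points, order):
    """Grow an axis-aligned box around `seed` inside the unit cube,
    one dimension at a time, in the given dimension order."""
    d = len(seed)
    lo = list(seed)
    hi = list(seed)
    for k in order:
        new_lo, new_hi = 0.0, 1.0
        for p in points:
            # p can block growth along k only if it lies within the
            # current box in every other dimension.
            if all(lo[j] <= p[j] <= hi[j] for j in range(d) if j != k):
                if p[k] <= seed[k]:
                    new_lo = max(new_lo, p[k])
                if p[k] >= seed[k]:
                    new_hi = min(new_hi, p[k])
        lo[k], hi[k] = new_lo, new_hi
    return lo, hi


def volume(lo, hi):
    v = 1.0
    for a, b in zip(lo, hi):
        v *= b - a
    return v


def strictly_inside(p, lo, hi):
    return all(a < x < b for x, a, b in zip(p, lo, hi))


def largest_empty_box(points, trials=200, rng=None):
    """Monte Carlo search: repeat the random-start expansion and keep
    the largest box found."""
    rng = rng or random.Random(0)
    d = len(points[0])
    best = None
    for _ in range(trials):
        seed = [rng.random() for _ in range(d)]
        order = list(range(d))
        rng.shuffle(order)
        lo, hi = expand_from(seed, points, order)
        if best is None or volume(lo, hi) > volume(*best):
            best = (lo, hi)
    return best


data_rng = random.Random(1)
data = [[data_rng.random() for _ in range(3)] for _ in range(50)]
box_lo, box_hi = largest_empty_box(data)
print("lower corner:", box_lo)
print("upper corner:", box_hi)
print("volume:", volume(box_lo, box_hi))
```

Because the box starts as a single point, the dimensions expanded first usually reach the full range and the last one is cut off by the nearest data points, so this sketch tends to return slab-shaped boxes. Each returned box contains no data point in its interior, but nothing here guarantees it is the largest such box.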
