Proceedings of the 2017 ACM International Conference on Management of Data
DOI: 10.1145/3035918.3035928

MacroBase

Abstract: As data volumes continue to rise, manual inspection is becoming increasingly untenable. In response, we present MacroBase, a data analytics engine that prioritizes end-user attention in high-volume fast data streams. MacroBase enables efficient, accurate, and modular analyses that highlight and aggregate important and unusual behavior, acting as a search engine for fast data. MacroBase is able to deliver order-of-magnitude speedups over alternatives by optimizing the combination of explanation and classificati…
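The abstract describes pairing classification (flagging unusual points) with explanation (aggregating the attributes that characterize them). The sketch below illustrates that two-stage idea under stated assumptions, and is not MacroBase's implementation: it assumes a median-absolute-deviation (MAD) robust score for classification and a risk-ratio score over single attribute values for explanation. The function names `mad_outliers` and `risk_ratio_explanations`, the thresholds, and the sample data are all hypothetical, and the code works on a finite batch rather than a stream.

```python
from collections import Counter
from statistics import median


def mad_outliers(values, threshold=3.0):
    """Hypothetical classifier: flag values whose robust z-score exceeds threshold.

    The robust z-score divides the distance from the median by the median
    absolute deviation (MAD), scaled by 1.4826 to approximate a standard
    deviation under normality.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread at all: flag only values that differ from the median.
        return [v != med for v in values]
    return [abs(v - med) / (1.4826 * mad) > threshold for v in values]


def risk_ratio_explanations(attrs, is_outlier, min_support=0.1, min_ratio=3.0):
    """Hypothetical explainer: attribute values over-represented among outliers.

    risk ratio = P(outlier | has attr) / P(outlier | lacks attr)
    Only attribute values present in at least min_support of the outliers
    are considered; results are sorted by descending risk ratio.
    """
    n_out = sum(is_outlier)
    n_in = len(is_outlier) - n_out
    if n_out == 0:
        return []
    out_counts = Counter(a for a, o in zip(attrs, is_outlier) if o)
    in_counts = Counter(a for a, o in zip(attrs, is_outlier) if not o)
    results = []
    for attr, a_out in out_counts.items():
        if a_out / n_out < min_support:
            continue  # too rare among outliers to be a useful explanation
        a_in = in_counts.get(attr, 0)
        rest_out, rest_in = n_out - a_out, n_in - a_in
        rest_total = rest_out + rest_in
        if rest_total == 0:
            continue  # attribute present in every row: nothing to contrast with
        p_with = a_out / (a_out + a_in)
        p_without = rest_out / rest_total
        ratio = float("inf") if p_without == 0 else p_with / p_without
        if ratio >= min_ratio:
            results.append((attr, ratio))
    return sorted(results, key=lambda r: -r[1])


# Hypothetical example: request latencies tagged with a device identifier.
latencies = [10, 11, 9, 10, 12, 11, 95, 10, 98, 11]
devices = ["a", "b", "a", "c", "b", "a", "x", "c", "x", "b"]
flags = mad_outliers(latencies)
explanations = risk_ratio_explanations(devices, flags)
print(explanations)  # prints [('x', inf)]
```

Here both slow requests come from device `x` and no fast request does, so `x` gets an unbounded risk ratio. The design keeps the two stages separate, so either scoring function could be swapped without touching the other, in line with the modularity the abstract emphasizes.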

Cited by 69 publications (11 citation statements)
References 75 publications

“…Sometimes, when Data X-Ray uses the instances generated by MLDebugger, it does better, at least in recall.[2] This is expected for the case where the root causes are conjunctions of property-comparator-value triples since Data X-Ray was designed to find relevant conjunctions. That is, Data X-Ray produces a conjunction of property-value combinations that lead to bad scenarios.…” ([2] https://github.com/raonilourenco/MLDebugger)
Section: Results (mentioning)
confidence: 99%
“…For purposes of reproducibility and community use, we will make our code and experiments available.[2]…”
Section: Ii1 Estimator Random Forest (mentioning)
confidence: 99%
“…Consequently, analyzing a big data set all at once may require more than the available resources in order to meet specific application requirements [8], [9]. Random sampling is a common strategy to alleviate these challenges [10], e.g., in approximate and incremental computing [8], [11]-[15]. However, drawing random samples from big data is itself an expensive operation [16] especially with the shared-nothing architectures in the mainstream distributed computing frameworks for big data analysis.…”
Section: Introduction (mentioning)
confidence: 99%