Proceedings of the 2013 International Symposium on Memory Management
DOI: 10.1145/2491894.2466485
A bloat-aware design for big data applications

Cited by 39 publications (29 citation statements). References 33 publications.
“…The raw dataset is a text file with each line containing one data point. Hence the first UDF is a map function, which extracts the data points and stores them into a set of DenseVector objects (lines 12-16). An additional LabeledPoint object is created for each data point to package its feature vector and label value together.…”
Section: Motivating Example
confidence: 99%
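The per-record object creation described in this quote can be illustrated with a minimal plain-Java sketch. `DenseVector` and `LabeledPoint` below are simplified stand-ins for the Spark MLlib classes of the same names, and `parsePoint` is a hypothetical helper, not code from the cited paper:

```java
import java.util.Arrays;

// Simplified stand-ins for the MLlib classes named in the quote.
class DenseVector {
    final double[] values;              // one object + one array per data point
    DenseVector(double[] values) { this.values = values; }
}

class LabeledPoint {
    final double label;                 // packages label and feature vector together
    final DenseVector features;
    LabeledPoint(double label, DenseVector features) {
        this.label = label;
        this.features = features;
    }
}

public class ParseSketch {
    // Hypothetical map-style UDF: one text line -> one LabeledPoint.
    // Each call allocates a String[], a double[], a DenseVector, and a
    // LabeledPoint -- the per-record object churn the quote describes.
    static LabeledPoint parsePoint(String line) {
        String[] parts = line.split(",");
        double label = Double.parseDouble(parts[0]);
        double[] features = new double[parts.length - 1];
        for (int i = 1; i < parts.length; i++) {
            features[i - 1] = Double.parseDouble(parts[i]);
        }
        return new LabeledPoint(label, new DenseVector(features));
    }

    public static void main(String[] args) {
        LabeledPoint p = parsePoint("1.0,0.5,2.5");
        System.out.println(p.label + " " + Arrays.toString(p.features.values));
    }
}
```

With millions of input lines, this style multiplies the allocation count by a small constant per record, which is exactly the bloat pattern the surveyed paper targets.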
See 1 more Smart Citation
“…The raw dataset is a text file with each line containing one data point. Hence the first UDF is a map function, which extracts the data points and store them into a set of DenseVector objects and (lines [12][13][14][15][16]). An additional LabeledPoint object is created for each data point to package its feature vector and label value together.…”
Section: Motivating Examplementioning
confidence: 99%
“…Based on this order, the size-type of each UDT is determined by its field that has the highest variability (lines 12-20). Furthermore, each field's final size-type is determined by the type with the highest variability in its type-set.…”
Section: Local Classification Analysis
confidence: 99%
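The "highest variability wins" rule in this quote can be sketched as a simple maximum over an ordered set of size-types. The three level names and their ordering below are illustrative assumptions, not the cited analysis's actual lattice:

```java
import java.util.List;

public class SizeTypes {
    // Hypothetical variability ordering, lowest to highest; the enum's
    // declaration order encodes increasing variability.
    enum SizeType { STATIC, CHANGEABLE, DYNAMIC }

    // A UDT's size-type is that of its most variable field; likewise, a
    // field's final size-type is the most variable member of its type-set.
    static SizeType classify(List<SizeType> fieldTypes) {
        SizeType result = SizeType.STATIC;
        for (SizeType t : fieldTypes) {
            if (t.ordinal() > result.ordinal()) {
                result = t;                 // keep the most variable so far
            }
        }
        return result;
    }
}
```

Under this sketch, a UDT with one fixed-size field and one dynamically sized field would be classified at the dynamic level.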
“…This means that data are accessed at the binary level rather than as objects. This approach was used because creating and collecting language objects is often a cause of performance bottlenecks.…”
Section: Preliminaries—Apache AsterixDB
confidence: 99%
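The binary-level access strategy in this quote can be sketched as follows: records live in a flat `ByteBuffer` and fields are read by offset, so no per-record Java objects are materialized. The record layout and helper names here are illustrative, not AsterixDB's actual storage format:

```java
import java.nio.ByteBuffer;

public class BinaryRecords {
    // Hypothetical fixed-size record: an int id followed by a double value.
    static final int RECORD_SIZE = Integer.BYTES + Double.BYTES;

    // Write n records at fixed offsets in one buffer instead of
    // allocating one heap object per record.
    static ByteBuffer build(int n) {
        ByteBuffer buf = ByteBuffer.allocate(n * RECORD_SIZE);
        for (int i = 0; i < n; i++) {
            buf.putInt(i * RECORD_SIZE, i);                          // id field
            buf.putDouble(i * RECORD_SIZE + Integer.BYTES, i * 0.5); // value field
        }
        return buf;
    }

    // Field access goes straight to the bytes; nothing is deserialized
    // into an object, so the GC never sees these records.
    static double valueAt(ByteBuffer buf, int record) {
        return buf.getDouble(record * RECORD_SIZE + Integer.BYTES);
    }
}
```

The trade-off is the one the quote implies: field reads cost an offset computation, but the heap holds one buffer instead of millions of small objects for the collector to trace.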
“…Comprehensive studies across many contemporary Big Data systems [18] confirm that these overheads lead to significantly reduced scalability (e.g., applications crash with OutOfMemoryError although the size of the processed dataset is much smaller than the heap size) as well as exceedingly high memory management costs (e.g., GC time accounts for up to 50% of the overall execution time). Despite the many optimizations [6, 7, 16, 19, 21, 23-25, 33, 38, 41, 45, 48, 49, 52, 54-57, 60, 61, 72, 73, 76] from various research communities, the poor performance inherent in managed runtimes remains a serious problem that can devalue these domain-specific optimization techniques.…”
Section: Motivation
confidence: 99%