Proceedings of the 23rd International Conference on Parallel Architectures and Compilation 2014
DOI: 10.1145/2628071.2628108

Versatile and scalable parallel histogram construction

Abstract: Histograms are used in various fields to quickly profile the distribution of a large amount of data. However, it is challenging to efficiently utilize abundant parallel resources in modern processors for histogram construction. To make matters worse, the most efficient implementation varies depending on input parameters (e.g., input distribution, number of bins, and data type) or architecture parameters (e.g., cache capacity and SIMD width). This paper presents versatile histogram methods that achieve competiti…

Cited by 7 publications (5 citation statements)
References 27 publications

“…It is crucial to design an efficient binning approach on GPUs. Such an approach usually needs carefully designed histogram and scan kernels, such as those proposed in previous research [25,47]. We adopt a simple and efficient "histogram-scanbin" strategy generating the bins across GPU memory hierarchy.…”
Section: Methodology 31 Adaptive GPU Segsort Mechanism
Citation type: mentioning
confidence: 99%
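
The quoted statement names a histogram, scan, and bin sequence without spelling it out. As a rough illustration only, the following sequential Python sketch shows the general pattern: count items per bin, take an exclusive prefix sum to get each bin's start offset, then scatter items into their bins. The function name `histogram_scan_bin` and the assumption that keys are integers in `[0, num_bins)` are hypothetical. This is not the GPU kernel design of the citing paper or of the cited work.

```python
from itertools import accumulate


def histogram_scan_bin(keys, num_bins):
    """Group item indices by key using a histogram, an exclusive scan, and a scatter.

    Assumes every key is an integer in [0, num_bins) (illustrative assumption).
    """
    # Step 1 (histogram): count how many items fall into each bin.
    counts = [0] * num_bins
    for k in keys:
        counts[k] += 1

    # Step 2 (scan): an exclusive prefix sum gives each bin's start offset.
    offsets = [0] + list(accumulate(counts))[:-1]

    # Step 3 (bin): scatter each item index to the next free slot of its bin.
    cursor = list(offsets)
    binned = [None] * len(keys)
    for i, k in enumerate(keys):
        binned[cursor[k]] = i
        cursor[k] += 1

    return counts, offsets, binned


if __name__ == "__main__":
    counts, offsets, binned = histogram_scan_bin([2, 0, 2, 1, 0], num_bins=3)
    print(counts, offsets, binned)
```

On a GPU each of the three steps would typically be its own parallel kernel; the sketch only shows the data flow between them.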
“…We show end-to-end speedups, ranging from 1.4× on CC and 2× for BFS, to 2.7× on PR, and analyze Milk's performance on synthesized random graphs with variable working set size and temporal and spatial locality. Figure 10 shows speedups when vertices range from V=2^21 to V=2^25, with an average degree of 16 in uniform degree distribution (i.e., Uniform 21..25). We show two BFS variants: BFSd from Graph500 [21] (in Figure 5), as well as BFSp from GAPBS (Figure 8a).…”
Section: Speedup
Citation type: mentioning
confidence: 99%
“…In Figure 11, we compare performance while varying vertex degrees to show sufficient spatial locality can be exploited even at a low average degree. We use the Count-Degree (Histogram) kernel of Graph500, shown in Figure 6, which is used to construct a graph data structure from a list of edges in graph applications, and is critical in other applications [25]. Figure 11 shows speedups on Histogram in uniform distributions with average degrees from 1 to 64.…”
Section: Speedup
Citation type: mentioning
confidence: 99%
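
The Count-Degree kernel mentioned above is, in essence, a histogram over vertex IDs. A minimal sketch of that idea follows, assuming an undirected edge list in which each edge contributes to the degree of both endpoints; the function name `count_degrees` and that counting convention are illustrative assumptions, not the Graph500 reference code.

```python
def count_degrees(edges, num_vertices):
    """Histogram of vertex degrees from an edge list.

    Assumes vertex IDs are integers in [0, num_vertices) and that each
    edge (src, dst) counts once for each endpoint (illustrative assumption).
    """
    degree = [0] * num_vertices
    for src, dst in edges:
        # Each increment is a data-dependent write to an effectively random bin,
        # which is what makes this kernel sensitive to memory locality.
        degree[src] += 1
        degree[dst] += 1
    return degree


if __name__ == "__main__":
    print(count_degrees([(0, 1), (1, 2), (2, 3), (1, 3)], num_vertices=4))
```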
“…For instance, Jung et al. [32] propose parallel histogram implementations using both atomic operations and privatization. These codes process a set of input values, and produce a histogram with a given number of bins.…”
Section: Software Techniques
Citation type: mentioning
confidence: 99%
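
To make the two strategies named in this statement concrete, here is a small Python sketch contrasting a shared histogram with synchronized updates against a privatized one. It is an illustration under stated assumptions, not the cited implementation: a `threading.Lock` stands in for hardware atomic increments, the function names are hypothetical, values are assumed to be integers in `[0, num_bins)`, and CPython threads will not show a real speedup here.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def _split(values, num_workers):
    # Strided split so that every worker gets a similar share of the input.
    return [values[i::num_workers] for i in range(num_workers)]


def histogram_shared(values, num_bins, num_workers=4):
    """All workers update one shared histogram; a lock stands in for atomics."""
    hist = [0] * num_bins
    lock = threading.Lock()

    def work(chunk):
        for v in chunk:
            with lock:  # every update is serialized on the shared array
                hist[v] += 1

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        list(pool.map(work, _split(values, num_workers)))
    return hist


def histogram_privatized(values, num_bins, num_workers=4):
    """Each worker fills a private histogram; the copies are merged afterwards."""

    def work(chunk):
        local = [0] * num_bins  # private copy, no synchronization needed
        for v in chunk:
            local[v] += 1
        return local

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(work, _split(values, num_workers)))

    # Reduction phase: sum the private copies bin by bin.
    return [sum(column) for column in zip(*partials)]


if __name__ == "__main__":
    data = [0, 1, 1, 3, 3, 3, 2, 1] * 50
    print(histogram_shared(data, 4), histogram_privatized(data, 4))
```

The trade-off the sketch exposes is the usual one: the shared version pays for contention on every update, while the privatized version pays for extra memory and a reduction step whose cost grows with the number of bins and workers.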
“…For example, consider a loop that processes a set of input values (e.g., image pixels) and produces a histogram of these values with a given number of bins. In this case, the reduction variable is the whole histogram array, and the reduction phase can dominate execution time [32], as shown in Fig. 2.…”
Section: Separate Update- and Read-only Phases
Citation type: mentioning
confidence: 99%