2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
DOI: 10.1109/ipdpsw.2016.41
A Comparison of High-Level Programming Choices for Incomplete Sparse Factorization Across Different Architectures

Cited by 3 publications (9 citation statements)
References 29 publications
“…Incomplete levels. We will only consider ILU(k) with k = 0, though Javelin supports other levels as implemented by other work [5], [6], [15] and commonly used in iterative solvers. As k increases, additional fill-in is allowed into the sparsity pattern.…”
Section: Methods (mentioning)
Confidence: 99%
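The statement above describes the defining property of level-based incomplete factorization: with level k = 0, elimination updates only entries already present in the sparsity pattern of A, so no fill-in is ever created. As a minimal illustrative sketch (not Javelin's implementation, which is parallel and works on compressed sparse storage; a dense array stands in for the sparse matrix here), ILU(0) can be written as:

```python
import numpy as np

def ilu0(A):
    """ILU(0): incomplete LU restricted to the sparsity pattern of A.

    A is a dense ndarray standing in for a sparse matrix; zero entries
    mark positions outside the pattern, where no fill-in is allowed.
    Assumes a zero-free diagonal. Returns L (unit lower triangle, below
    the diagonal) and U packed into one array.
    """
    n = A.shape[0]
    LU = A.astype(float).copy()
    pattern = A != 0                      # fixed pattern: no fill-in
    for i in range(1, n):
        for k in range(i):
            if not pattern[i, k]:
                continue
            LU[i, k] /= LU[k, k]          # multiplier l_ik
            for j in range(k + 1, n):
                if pattern[i, j]:          # update existing entries only
                    LU[i, j] -= LU[i, k] * LU[k, j]
    return LU
```

For k > 0 one would first enlarge the pattern with level-of-fill bookkeeping and then run the same elimination loop over the augmented pattern.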
“…Therefore, comparing performance as k varies would not provide a deep understanding of the scalability of Javelin without knowing where and how much fill-in was produced. It is therefore common to compare scalability primarily with ILU(0) [3], [5] over a large test suite of matrices with different sparsity pattern and row density in order to estimate how well the implementation scales with fill-in as we do in this paper.…”
Section: Methods (mentioning)
Confidence: 99%
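The fill-in bookkeeping this statement alludes to is easy to reproduce. As a rough illustration (an assumption on my part, not the authors' benchmark harness), SciPy's spilu, which implements threshold-based ILUTP rather than level-based ILU(k), reports how much fill a factorization produced via the nonzero counts of its factors:

```python
import scipy.sparse as sp
from scipy.sparse.linalg import spilu

# Hypothetical test matrix: random sparse plus a shifted diagonal to keep
# it nonsingular; a real study would sweep a large suite of matrices with
# varying sparsity patterns and row densities.
A = sp.random(200, 200, density=0.05, format="csc", random_state=0)
A = A + 4.0 * sp.identity(200, format="csc")

# Note: spilu is threshold ILU (ILUTP), not ILU(k); only the fill-in
# measurement idea carries over.
ilu = spilu(A, drop_tol=1e-4, fill_factor=10.0)
fill_ratio = (ilu.L.nnz + ilu.U.nnz) / A.nnz
print(f"fill-in ratio nnz(L+U)/nnz(A) = {fill_ratio:.2f}")
```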
“…Parallel programs should be optimised to extract maximum performance from hardware on architecture case by case [38], which is far from trivial according to Booth et al [39]. There exist different and combined manners to explore parallelism such as Data Parallelism and Task Parallelism [40].…”
Section: Introduction (mentioning)
Confidence: 99%
“…Parallel programs should be optimized to extract maximum performance from hardware on architecture case by case, which is far from trivial according to (Booth et al, 2016). There exist different and combined manners to explore parallelism such as data parallelism and task parallelism (Gordon et al, 2006).…”
Section: Introduction (mentioning)
Confidence: 99%
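Both citing papers contrast data parallelism with task parallelism. A minimal Python sketch of the distinction (illustrative only; the function names are hypothetical and this is not taken from Gordon et al.):

```python
from concurrent.futures import ProcessPoolExecutor

def square(x):
    """Data parallelism: the same operation applied to many data items."""
    return x * x

def build_report(data):
    """Task parallelism: one of several distinct, independent tasks."""
    return f"report({len(data)} items)"

def summarize(data):
    return sum(data) / len(data)

if __name__ == "__main__":
    data = list(range(16))
    with ProcessPoolExecutor() as pool:
        # Data parallel: one function mapped over partitions of the data.
        squares = list(pool.map(square, data))
        # Task parallel: different functions submitted concurrently.
        report = pool.submit(build_report, data)
        mean = pool.submit(summarize, data)
        print(squares, report.result(), mean.result())
```

In the incomplete-factorization setting, elimination dependencies between rows mean a pure data-parallel mapping is rarely enough, which is presumably why the surveyed work weighs task-based formulations against it.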