I.B. Meyerov (И.Б. Мееров): scite author profile

This paper addresses the problem of reordering the rows and columns of a symmetric positive definite sparse matrix so as to reduce the number of nonzero elements (fill-in) in its Cholesky factor. The problem is NP-complete, so it is solved with heuristic algorithms based on graph-theoretic methods. A parallel ordering algorithm for shared-memory computing systems is proposed. It builds on a modification of the multilevel nested dissection method that the authors previously implemented in the open-source MORSy library. The main idea of the parallelization is to organize a queue of tasks that can be solved independently and to process that queue in parallel. Unlike widely used counterparts that rely on MPI for parallelism on both distributed and shared memory, the proposed algorithm uses the task parallelism of the OpenMP 3.0 standard and relies on the dynamic load balancing implemented in the OpenMP runtime. Numerical experiments were performed on symmetric positive definite matrices from the University of Florida Sparse Matrix Collection. The results show that the proposed implementation is competitive with the widely used ParMETIS library on shared-memory systems: the parallel version of MORSy yields comparable or better orderings than ParMETIS in terms of Cholesky factor fill-in on all but one test matrix, and it runs faster than ParMETIS in most cases. The parallel version of MORSy is open source and publicly available from the Supercomputing Center of Lobachevsky State University of Nizhni Novgorod.
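To illustrate why the ordering affects Cholesky fill-in, here is a minimal Python sketch that counts fill edges by symbolic elimination on the graph of a matrix. It is not code from MORSy or ParMETIS; the function name `fill_in` and the star-graph ("arrow matrix") example are assumptions chosen only for illustration.

```python
def fill_in(adjacency, order):
    """Count fill edges created when vertices are eliminated in `order`.

    `adjacency` maps each vertex to the set of its neighbours in the
    graph of a symmetric sparse matrix (off-diagonal nonzeros only).
    """
    adj = {v: set(nbrs) for v, nbrs in adjacency.items()}  # work on a copy
    eliminated = set()
    fill = 0
    for v in order:
        # Neighbours of v that are still present in the graph.
        nbrs = [u for u in adj[v] if u not in eliminated]
        # Eliminating v connects all of its remaining neighbours pairwise;
        # every edge that did not exist before is a fill-in element.
        for i, a in enumerate(nbrs):
            for b in nbrs[i + 1:]:
                if b not in adj[a]:
                    adj[a].add(b)
                    adj[b].add(a)
                    fill += 1
        eliminated.add(v)
    return fill


# Star graph on 6 vertices: vertex 0 is connected to all others.
n = 6
star = {0: set(range(1, n))}
for i in range(1, n):
    star[i] = {0}

hub_first = fill_in(star, list(range(n)))            # eliminate the hub first
hub_last = fill_in(star, list(range(1, n)) + [0])    # eliminate the hub last
print(hub_first, hub_last)
```

Eliminating the hub first turns its five neighbours into a clique, while eliminating it last creates no fill at all. Ordering heuristics such as nested dissection aim for orderings of the second kind on much larger graphs.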

Three-dimensional particle-in-cell plasma simulation on Intel Xeon Phi: performance optimization and case study

Meyerov, Bastrakov, Surmin, et al. 2015

This paper considers the efficient use of computing systems equipped with Intel Xeon Phi coprocessors for laser-plasma simulation. The features of the Xeon Phi architecture that affect the performance of Particle-in-Cell plasma simulation are analyzed. The PICADOR parallel plasma simulation code, previously optimized for Xeon Phi, is described. Its performance on Xeon Phi is compared with its performance on CPU for three computationally intensive plasma simulation problems. The ratio of computation time on Xeon Phi to that on CPU is discussed for the main stages of the Particle-in-Cell method. It is shown that, depending on the features of the physical problem, Xeon Phi can either outperform or lag behind the CPU.

I.B. Meyerov

New object detection features in the OpenCV library

A parallel multilevel nested dissection algorithm for shared-memory computing systems

Three-dimensional particle-in-cell plasma simulation on Intel Xeon Phi: performance optimization and case study
