Genome-Wide Association Studies (GWASs), analyses that try to find a link between a given phenotype (such as a disease) and genetic markers, have been growing in popularity in the recent years. Relations between phenotypes and genotypes are not easy to identify, as most of the phenotypes are a product of the interaction between multiple genes, a phenomenon known as epistasis. Many authors have resorted to different approaches and hardware architectures in order to mitigate the exponential time complexity of the problem. However, these studies make some compromises in order to keep a reasonable execution time, such as limiting the number of genetic markers involved in the interaction, or discarding some of these markers in an initial filtering stage. This work presents MPI3SNP, a tool that implements a three-way exhaustive search for cluster architectures with the aim of mitigating the exponential growth of the run-time. Modern cluster solutions usually incorporate GPUs. Thus, MPI3SNP includes implementations for both multi-CPU and multi-GPU clusters. To contextualize the performance achieved, MPI3SNP is able to analyze an input of 6300 genetic markers and 3200 samples in less than 6 min using 768 CPU cores or 4 min using 8 NVIDIA K80 GPUs. The source code is available at https://github.com/chponte/mpi3snp .
The growing complexity in computer system hierarchies due to the increase in the number of cores per processor, levels of cache (some of them shared) and the number of processors per node, as well as the high-speed interconnects, demands the use of new optimization techniques and libraries that take advantage of their features.In this paper Servet, a suite of benchmarks focused on detecting a set of parameters with high influence in the overall performance of multicore systems, is presented. These benchmarks are able to detect the cache hierarchy, including their size and which caches are shared by each core, bandwidths and bottlenecks in memory accesses, as well as communication latencies among cores. These parameters can be used by autotuned codes to increase their performance in multicore clusters. Experimental results using different representative systems show that Servet provides very accurate estimates of the parameters of the machine architecture.
To efficiently scale dense linear algebra problems to future exascale systems, communication cost must be avoided or overlapped. Communication-avoiding 2.5D algorithms improve scalability by reducing inter-processor data transfer volume at the cost of extra memory usage. Communication overlap attempts to hide messaging latency by pipelining messages and overlapping with computational work. We study the interaction and compatibility of these two techniques for two matrix multiplication algorithms (Cannon and SUMMA), triangular solve, and Cholesky factorization. For each algorithm, we construct a detailed performance model which considers both critical path dependencies and idle time. We give novel implementations of 2.5D algorithms with overlap for each of these problems. Our software employs UPC, a partitioned global address space (PGAS) language that provides fast one-sided communication. We show communication avoidance and overlap provide a cumulative benefit as core counts scale, including results using over 24K cores of a Cray XE6 system.
Source code in C ++ and MPI running on Linux systems as well as a reference manual are available at http://msaprobs.sourceforge.net CONTACT: jgonzalezd@udc.esSupplementary information: Supplementary data are available at Bioinformatics online.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.