Milovan Duric scite author profile

Milovan Duric

4Publications

9Citation Statements Received

26Citation Statements Given

How they've been cited

How they cite others

Affiliations

FC Barcelona, Barcelona Supercomputing Center

Publications

Order By: Most citations

Evaluation of vectorization potential of Graph500 on Intel's Xeon Phi

Stanić

Palomar

Ratkovic

et al. 2014

View full text Add to dashboard Cite

Graph500 is a data intensive application for high performance computing and it is an increasingly important workload because graphs are a core part of most analytic applications. So far there is no work that examines if Graph500 is suitable for vectorization mostly due a lack of vector memory instructions for irregular memory accesses. The Xeon Phi is a massively parallel processor recently released by Intel with new features such as a wide 512-bit vector unit and vector scatter/gather instructions. Thus, the Xeon Phi allows for more efficient parallelization of Graph500 that is combined with vectorization. In this paper we vectorize Graph500 and analyze the impact of vectorization and prefetching on the Xeon Phi. We also show that the combination of parallelization, vectorization and prefetching yields a speedup of 27% over a parallel version with prefetching that does not leverage the vector capabilities of the Xeon Phi.The research leading to these results has received funding from the\ud European Research Council under the European Unions 7th FP (FP/2007-\ud 2013) / ERC GA n. 321253. It has been partially funded by the Spanish\ud Government (TIN2012-34557)Peer ReviewedPostprint (published version

show abstract

Joint Circuit-System Design Space Exploration of Multiplier Unit Structure for Energy-Efficient Vector Processors

Ratkovic

Palomar

Stanić

et al. 2015

View full text Add to dashboard Cite

VALib and SimpleVector

Stanić

Palomar

Ratkovic

et al. 2014

View full text Add to dashboard Cite

Vector architectures have been traditionally applied to the supercomputing domain with many successful incarnations. The energy efficiency and high performance of vector processors, as well as their applicability in other emerging domains, encourage pursuing further research on vector architectures. However, there is a lack of appropriate tools to perform this research. This paper presents two tools for measuring and analyzing an application's suitability for vector microarchitectures. The first tool is VALib, a library that enables hand-crafted vectorization of applications and its main purpose is to collect data for detailed instruction level characterization and to generate input traces for the second tool. The second tool is SimpleVector, a fast tracedriven simulator that is used to estimate the execution time of a vectorized application on a candidate vector microarchitecture. The potential of the tools is demonstrated using six applications from emerging application domains such as speech and face recognition, video encoding, bioinformatics, machine learning and graph search. The results indicate that 63.2% to 91.1% of these contemporary applications are vectorizable. Then, over multiple use cases, we demonstrate that the tools can facilitate rapid evaluation of various vector architecture designs.

show abstract

Dynamic-vector execution on a general purpose EDGE chip multiprocessor

Duric

Palomar

Smith

et al. 2014

View full text Add to dashboard Cite

This paper proposes a cost-effective technique that morphs the available cores of a low power chip multiprocessor (CMP) into an accelerator for data parallel (DLP) workloads. Instead of adding a special-purpose vector architecture as an accelerator, our technique leverages the resources of each CMP core to mimic the functionality of a vector processor. The morphing provides dynamic vector execution (DVX) on a general purpose CMP, by adding minimal hardware for vector control. DVX enhances the vector execution by dynamically configuring the allocation of compute and memory resources to match particular workload requirements. As an energy efficient substrate, we utilize modest dual issue cores based on an Explicit Data Graph Execution (EDGE) architecture. The results show that a DVX enabled 4-core EDGE CMP improves the energy-delay product over 14x, at the cost of only 1.1% of additional area. We compare DVX against a CMP that adds a dedicated DLP accelerator based on a conventional high performance vector design. The vector accelerator increases the area footprint over 74%, which greatly affects the cost of the modest processor. DVX avoids the additional costs and yet gains over 86% of the speedup obtained with the dedicated accelerator.This work has been partially funded by the Spanish Government (TIN2012-34557), the European Research Council under the European Unions 7th FP (FP/2007-2013) / ERC GA n. 321253. and Microsoft ResearchPeer ReviewedPostprint (published version

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Milovan Duric

Evaluation of vectorization potential of Graph500 on Intel's Xeon Phi

Joint Circuit-System Design Space Exploration of Multiplier Unit Structure for Energy-Efficient Vector Processors

VALib and SimpleVector

Dynamic-vector execution on a general purpose EDGE chip multiprocessor

Contact Info

Product

Resources

About