In this paper, we review the state of the art in parallel computational model research and introduce the various models developed over the past decades. Based on the features of their target architectures, especially memory organization, we classify these parallel computational models into three generations, and we discuss the models and their characteristics within this three-generation classification. We believe that, with the ever-increasing speed gap between CPUs and memory systems, incorporating a non-uniform memory hierarchy into computational models will become unavoidable. With the emergence of multi-core CPUs, the parallelism hierarchy of current computing platforms is becoming increasingly complicated, and describing this hierarchy in future computational models is becoming increasingly important. A semi-automatic toolkit that can extract model parameters and their values on real computers would reduce the complexity of model analysis, allowing more complicated models with more parameters to be adopted. Hierarchical memory and hierarchical parallelism will be two essential features to consider in future model design and research.

Keywords: parallel computational models, hierarchical memory, hierarchical parallelism, three generations, memory model

Background

The simplified, abstract description of a computer is called a "computational model". Computer architects, algorithm designers, and program developers can use such a model as a basis to assess their work, including the suitability of a computer architecture for various applications, the computational complexity of an algorithm, and the potential performance of a program on various computers. A good computational model simplifies the complicated work of the architect, algorithm designer, and program developer while mapping that work effectively onto real computers. Such a computational model is therefore sometimes called a "bridging model" [1].
The bridging model between sequential computers and algorithm designers/program developers is the Von Neumann architecture together with the RAM (Random Access Machine) model [2]. For parallel computers and parallel programs, however, no commonly recognized bridging model exists, and no model maps a user's parallel program onto parallel computers as smoothly as the Von Neumann and RAM models do for sequential machines. This situation is largely due to the immaturity of parallel computer design: there are many different parallel architectures that change rapidly from year to year, and the demand for performance is greater [3], so a clean, simplified description is almost impossible. However, parallel computer design is converging toward a common architecture model (such as the cluster), and the communication layer of parallel computing is no longer strongly dependent on the interconnect network (we have the standard MPI interface); hence we have the BSP and LogP models [1,7]. Based on the historical development of parallel computational models, we think they can be classified ...
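As a concrete illustration of how such a bridging model supports algorithm analysis, the following minimal sketch evaluates the standard BSP cost formula, in which a program is a sequence of supersteps and each superstep costs w + g·h + l (the parameter names follow the usual BSP convention; the numeric values in the example are made up for illustration):

```python
# Standard BSP cost model: each superstep costs w + g*h + l, where
#   w = maximum local computation performed by any processor,
#   h = maximum number of words sent or received by any processor,
#   g = per-word communication cost (gap),
#   l = barrier synchronization latency.

def superstep_cost(w, h, g, l):
    """Cost of a single BSP superstep."""
    return w + g * h + l

def bsp_cost(supersteps, g, l):
    """Total cost of a BSP program given a list of (w, h) pairs."""
    return sum(superstep_cost(w, h, g, l) for w, h in supersteps)

# Example: two supersteps on a machine with g = 4 cycles/word, l = 100.
total = bsp_cost([(1000, 50), (500, 20)], g=4, l=100)
print(total)  # (1000 + 200 + 100) + (500 + 80 + 100) = 1980
```

The value of such a model for the "bridging" role discussed above is that g and l can be measured once per machine, after which algorithm costs are predicted without rerunning experiments.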
Abstract. Shared memory systems are an important platform for high-performance computing. In traditional parallel programming, the Message Passing Interface (MPI) is widely used, but current MPI implementations do not take full advantage of shared memory for communication: a double-copy method copies data to and from a system buffer for message passing. In this paper, we propose a novel design and implementation of the communication protocol for MPI on shared memory systems. The double-copy method is replaced by a single-copy method, so messages are transferred without the system buffer. We compare the new communication protocol with that of MPICH, an implementation of MPI. Our performance measurements indicate that the new communication protocol outperforms MPICH with lower latency: the new protocol performs up to about 15 times faster than MPICH for point-to-point communication, and up to about 300 times faster for collective communication.
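The double-copy versus single-copy distinction can be sketched as follows. This is an illustrative toy model, not the paper's actual protocol: real implementations operate on shared-memory segments with synchronization, which is omitted here, and the function names are assumptions for this sketch.

```python
# Toy contrast between the two message-transfer schemes described above.
# Buffers are modeled as bytearrays; each [:] assignment is one data copy.

def send_double_copy(src, system_buffer, dst):
    """Traditional scheme: message passes through a system buffer."""
    system_buffer[:] = src   # copy 1: sender's buffer -> system buffer
    dst[:] = system_buffer   # copy 2: system buffer -> receiver's buffer
    return 2                 # number of copies performed

def send_single_copy(src, dst):
    """Proposed scheme: sender writes directly into the receiver's buffer."""
    dst[:] = src             # the only copy
    return 1

msg = bytearray(b"hello")
sysbuf = bytearray(len(msg))
recv = bytearray(len(msg))

copies_double = send_double_copy(msg, sysbuf, recv)
copies_single = send_single_copy(msg, recv)
print(copies_double, copies_single, recv.decode())  # 2 1 hello
```

Halving the number of copies roughly halves the memory traffic per message, which is where the latency advantage of the single-copy protocol comes from.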
Advances in technology have made image capture and storage very convenient, resulting in an explosion in the amount of visual information and making it difficult to find useful information within such tremendous volumes of data. Content-Based Visual Information Retrieval (CBVIR) is emerging as one of the best solutions to this problem. Unfortunately, CBVIR is a very compute-intensive task. With the boom of multi-core processors, however, CBVIR can be accelerated by exploiting multi-core processing capability. In this paper, we propose a parallel implementation of a server-oriented CBVIR system and apply several serial and parallel optimization techniques to improve its performance on an 8-core and a 16-core system. Experimental results show that the optimized implementation achieves very fast retrieval on both multi-core systems. We also compare the performance of the application on the two systems and explain the performance difference between them. Furthermore, we conduct detailed scalability and memory performance analyses to identify possible bottlenecks in the application. Based on these experimental results and performance analyses, we gain many insights into developing efficient applications on future multi-core architectures.
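A hypothetical sketch of the data-parallel core such a retrieval system might use (the names, structure, and distance metric are assumptions for illustration, not the paper's code): the database of feature vectors is split into chunks, each chunk is scored against the query concurrently, and the partial rankings are merged. A production CBVIR server would use processes or native threads pinned to cores; Python threads keep the sketch self-contained.

```python
from concurrent.futures import ThreadPoolExecutor

def euclidean_sq(query, vec):
    # Squared Euclidean distance between two feature vectors.
    return sum((q - v) ** 2 for q, v in zip(query, vec))

def score_chunk(query, chunk):
    # Score one slice of the database; chunk holds (index, vector) pairs.
    return [(euclidean_sq(query, vec), idx) for idx, vec in chunk]

def retrieve(query, database, top_k=3, workers=4):
    # Round-robin split so chunks have balanced sizes.
    indexed = list(enumerate(database))
    chunks = [indexed[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda c: score_chunk(query, c), chunks)
    # Merge the per-chunk rankings and keep the top_k nearest indices.
    merged = sorted(s for part in parts for s in part)
    return [idx for _, idx in merged[:top_k]]

db = [[0.0, 0.0], [1.0, 1.0], [0.1, 0.0], [5.0, 5.0]]
print(retrieve([0.0, 0.0], db, top_k=2))  # [0, 2]
```

The scalability bottlenecks the paper analyzes would show up here as the cost of the sequential merge step and as memory-bandwidth pressure when all workers stream the feature database simultaneously.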