Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming 2008
DOI: 10.1145/1345206.1345262

Semantics-based distributed I/O for mpiBLAST

Abstract: BLAST is a widely used software toolkit for genomic sequence search. mpiBLAST is a freely available, open-source parallelization of BLAST that uses database segmentation to allow different worker processes to search (in parallel) unique segments of the database. After searching, the workers write their output to a filesystem. While mpiBLAST has been shown to achieve high performance in clusters with fast local filesystems, its I/O processing remains a concern for scalability, especially in systems having limit…
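
The database segmentation described in the abstract can be illustrated with a minimal sketch. This is not mpiBLAST's implementation: the names `segment_database`, `search_segment`, and `parallel_search` are hypothetical, threads stand in for MPI worker processes, and an exact substring match stands in for a real BLAST search.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy database of (name, sequence) records.
DATABASE = [
    ("seq1", "ACGTACGT"),
    ("seq2", "TTGACCA"),
    ("seq3", "GGACGTT"),
    ("seq4", "CCCCCC"),
]


def segment_database(records, num_segments):
    """Split the records into disjoint segments, one per worker."""
    return [records[i::num_segments] for i in range(num_segments)]


def search_segment(segment, query):
    """Toy stand-in for a BLAST search: report exact substring hits."""
    return [(name, seq.find(query)) for name, seq in segment if query in seq]


def parallel_search(records, query, num_workers=2):
    """Search every segment concurrently, then merge the per-worker hits."""
    segments = segment_database(records, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_hits = list(pool.map(lambda s: search_segment(s, query), segments))
    # Merging is the step whose I/O cost the abstract flags as a concern.
    return sorted(hit for hits in partial_hits for hit in hits)


if __name__ == "__main__":
    print(parallel_search(DATABASE, "ACGT"))
```

Each worker sees only its own segment, so the search itself needs no coordination; only the final merge and output step touches shared state.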

Cited by 6 publications (5 citation statements)
References 3 publications

“…The most time-consuming step in all orthology prediction algorithms is the generation of the Blast all-versus-all searches for each new update. In spite of the efforts aimed at developing faster parallelized Blast methods [40, 41], the Blast all-versus-all computational requirements grow quadratically with the addition of new proteomes. Therefore, one of our future goals will be to develop an incremental update process, minimizing the number of distance calculations required between the thousands of sequences present in the previous version of the database.…”
Section: Discussion (mentioning)
confidence: 99%
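
The quadratic growth mentioned in this statement can be made concrete with a short arithmetic sketch. It assumes that every proteome is searched against every proteome (itself included) and that cost is counted per ordered proteome pair; the function names are hypothetical and do not come from the citing paper.

```python
def all_vs_all_pairs(num_proteomes):
    """Ordered proteome pairs in a full all-versus-all search."""
    return num_proteomes * num_proteomes


def incremental_pairs(num_existing, num_added):
    """Pairs that involve at least one newly added proteome.

    Equals (n + k)^2 - n^2 = 2nk + k^2, so pairs among the n existing
    proteomes are not recomputed.
    """
    return all_vs_all_pairs(num_existing + num_added) - all_vs_all_pairs(num_existing)


if __name__ == "__main__":
    print(all_vs_all_pairs(100))      # full search over 100 proteomes
    print(incremental_pairs(100, 1))  # extra work when one proteome is added
```

Under these assumptions, an incremental update costs a number of pairs linear in the existing database size per added proteome, rather than repeating the full quadratic search.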
“…However, CloudBLAST approach is disconnected from the concept of scientific workflows and it is not concerned about capturing and analyzing provenance data. G-BLAST [30] framework implements a solution based on grid services for supporting the submission of mpiBLAST [31] jobs to cluster systems. In spite of dealing with BLAST parallel execution, G-BLAST adopts a different fragmentation schema from that used in our work: it chooses to fragment the database rather than input file.…”
Section: Related Work (mentioning)
confidence: 99%
“…In our previous work [7,8], we provided a detailed description of ParaMEDIC. Here we present a brief summary of that work.…”
Section: Overview of ParaMEDIC-Enhanced mpiBLAST (mentioning)
confidence: 99%
“…To resolve such issues on a global scale, we proposed a new, non‐traditional approach for distributed I/O known as ParaMEDIC (Parallel Metadata Environment for Distributed I/O and Computing) [6–8]. ParaMEDIC uses application‐specific semantic information to process the data generated by treating it as a collection of high‐level abstract objects, rather than as a generic byte‐stream.…”
Section: Introduction (mentioning)
confidence: 99%