Karan Bhatia scite author profile

The exploitation of idle cycles on pervasive desktop PC systems offers the opportunity to increase the available computing power by orders of magnitude (10×-1000×). However, for desktop PC distributed computing to be widely accepted within the enterprise, the systems must achieve high levels of efficiency, robustness, security, scalability, manageability, unobtrusiveness, and openness/ease of application integration. We describe the Entropia distributed computing system as a case study, detailing its internal architecture and philosophy in attacking these key problems. Key aspects of the Entropia system include the use of: (1) binary sandboxing technology for security and unobtrusiveness, (2) a layered architecture for efficiency, robustness, scalability and manageability, and (3) an open integration model to allow applications from many sources to be incorporated. Typical applications for the Entropia System include molecular docking, sequence analysis, chemical structure modeling, and risk management. The applications come from a diverse set of domains including virtual screening for drug discovery, genomics for drug targeting, material property prediction, and portfolio management. In all cases, these applications scale to many thousands of nodes and have no dependences between tasks. We present representative performance results from several applications that illustrate the high performance, linear scaling, and overall capability presented by the Entropia system.


BioPig: a Hadoop-based analytic toolkit for large-scale sequence data

Nordberg

Bhatia

Wang

et al. 2013

100


We built BioPig on Apache's Hadoop MapReduce system and the Pig data flow language. Compared with traditional serial and MPI-based algorithms, BioPig has three major advantages: first, BioPig's programmability greatly reduces development time for parallel bioinformatics applications; second, testing BioPig with up to 500 Gb of sequences demonstrates that it scales automatically with the size of the data; and finally, BioPig can be ported without modification to many Hadoop infrastructures, as tested with the Magellan system at the National Energy Research Scientific Computing Center and the Amazon Elastic Compute Cloud. In summary, BioPig represents a novel programming framework with the potential to greatly accelerate data-intensive bioinformatics analysis.


Opal: Simple Web Services Wrappers for Scientific Applications

Krishnan

Stearn

Bhatia

et al. 2006


An end-to-end Web services-based infrastructure for biomedical applications

Krishnan

Baldridge

Greenberg

et al. 2005




Entropia: architecture and performance of an enterprise desktop grid system
