Performance evaluation of a MongoDB and hadoop platform for scientific data analysis

Scientific facilities such as the Advanced Light Source (ALS) and Joint Genome Institute and projects such as the Materials Project have an increasing need to capture, store, and analyze dynamic semi-structured data and metadata. A similar growth of semi-structured data within large Internet service providers has led to the creation of NoSQL data stores for scalable indexing and MapReduce for scalable parallel analysis. MapReduce and NoSQL stores have been applied to scientific data. Hadoop, the most popular open source implementation of MapReduce, has been evaluated, utilized, and modified to address the needs of different scientific analysis problems. ALS and the Materials Project are using MongoDB, a document-oriented NoSQL store. However, there is a limited understanding of the performance trade-offs of using these two technologies together. In this paper, we evaluate the performance, scalability, and fault tolerance of using MongoDB with Hadoop, with the goal of identifying the right software environment for scientific data analysis.


Exploiting Lustre File Joining for Effective Collective IO

Vetter, Canon et al. 2007

Lustre is a parallel file system that presents high aggregated IO bandwidth by striping file extents across many storage devices. However, our experiments indicate that excessively wide striping can cause performance degradation. Lustre supports an innovative file joining feature that joins files in place. To mitigate striping overhead and benefit collective IO, we propose two techniques: split writing and hierarchical striping. In split writing, a file is created as separate subfiles, each of which is striped to only a few storage devices. The subfiles are joined into a single file at file close time. Hierarchical striping builds on top of split writing and orchestrates the span of subfiles in a hierarchical manner to avoid overlapping and achieve the appropriate coverage of storage devices. Together, these techniques can avoid the overhead associated with large stripe width, while still being able to combine bandwidth available from many storage devices. We have prototyped these techniques in the ROMIO implementation of MPI-IO. Experimental results indicate that split writing and hierarchical striping can significantly improve the performance of Lustre collective IO in terms of both data transfer and management operations. On a Lustre file system configured with 46 object storage targets, our implementation improves collective write performance of a 16-process job by as much as 220%.

R. Shane Canon
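As a rough illustration of the split-writing layout described in the Lustre abstract, the sketch below assigns each subfile a small, non-overlapping group of object storage targets. The function name `assign_targets`, the contiguous wrap-around layout, and the stripe width of 2 are assumptions made for illustration only; the hierarchical striping scheme prototyped in ROMIO is not reproduced here. The 46 targets and 16 processes are the figures quoted in the abstract.

```python
def assign_targets(num_subfiles: int, num_targets: int, stripe_width: int) -> list[list[int]]:
    """Map each subfile to `stripe_width` consecutive storage targets.

    Hypothetical layout: subfile i starts where subfile i-1 ended and wraps
    around the target list, so subfiles overlap only when
    num_subfiles * stripe_width exceeds num_targets.
    """
    if stripe_width < 1 or stripe_width > num_targets:
        raise ValueError("stripe_width must be between 1 and num_targets")
    return [
        [(i * stripe_width + k) % num_targets for k in range(stripe_width)]
        for i in range(num_subfiles)
    ]


# One subfile per process, each striped over only 2 of the 46 targets.
layout = assign_targets(num_subfiles=16, num_targets=46, stripe_width=2)
used = {t for group in layout for t in group}
print(layout[0], layout[15], len(used))
```

In this toy layout each subfile touches only two targets, yet the 16 subfiles together spread across 32 distinct targets, which is the general trade-off the abstract describes: narrow stripes per subfile, wide coverage overall.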


3H(p→,γ)4

Canon, Nelson, Sabourov et al. 2002, Phys. Rev. C

Evaluating Hadoop for Data-Intensive Scientific Operations

Fadika, Govindaraju, Canon et al. 2012


R. S. Canon

Gamma-Ray Production in a Storage Ring Free-Electron Laser

Performance evaluation of a MongoDB and hadoop platform for scientific data analysis
