Abstract: Big Data are becoming a new technology focus both in science and in industry. This paper discusses the challenges that Big Data impose on the modern and future Scientific Data Infrastructure (SDI). The paper discusses the nature and definition of Big Data, which include such features as Volume, Velocity, Variety, Value and Veracity. The paper refers to different scientific communities to define requirements on data management, access control and security. The paper introduces the Scientific Data Lifecycle Management (SDLM) model, which includes all the major stages and reflects the specifics of data management in modern e-Science. The paper proposes the SDI generic architecture model, which provides a basis for building interoperable data-centric or project-centric SDI using modern technologies and best practices. The paper explains how the proposed SDLM and SDI models can be naturally implemented using a modern cloud-based infrastructure services provisioning model, and suggests the major infrastructure components for a Big Data Infrastructure.
We present the results of the "Cosmogrid" cosmological N-body simulation suites based on the concordance ΛCDM model. The Cosmogrid simulation was performed in a 30 Mpc box with $2048^3$ particles. The mass of each particle is $1.28 \times 10^5\,M_\odot$, which is sufficient to resolve ultra-faint dwarfs. We found that the halo mass function shows good agreement with the Sheth & Tormen fitting function down to $\sim 10^7\,M_\odot$. We have analyzed the spherically averaged density profiles of the three most massive halos, which are of galaxy-group size and contain at least 170 million particles. The slopes of these density profiles become shallower than −1 at the innermost radius. We also find a clear correlation of halo concentration with mass. The mass dependence of the concentration parameter cannot be expressed by a single power law; however, a simple model based on the Press-Schechter theory proposed by Navarro et al. gives reasonable agreement with this dependence. The spin parameter does not show a correlation with the halo mass. The probability distribution functions for both concentration and spin are well fitted by the log-normal distribution for halos with masses larger than $\sim 10^8\,M_\odot$. The subhalo abundance depends on the halo mass. Galaxy-sized halos have 50% more subhalos than $\sim 10^{11}\,M_\odot$ halos.
The Distributed ASCI Supercomputer (DAS) is a homogeneous wide-area distributed system consisting of four cluster computers at different locations. DAS has been used for research on communication software, parallel languages and programming systems, schedulers, parallel applications, and distributed applications. The paper gives a preview of the most interesting research results obtained so far in the DAS project.
Abstract: This paper discusses the challenges that Big Data Science imposes on the modern and future Scientific Data Infrastructure (SDI). The paper refers to different scientific communities to define requirements on data management, access control and security. The paper introduces the Scientific Data Lifecycle Management (SDLM) model, which includes all the major stages and reflects the specifics of data management in modern e-Science. The paper proposes the SDI generic architecture model, which provides a basis for building interoperable data-centric or project-centric SDI using modern technologies and best practices. The paper explains how the proposed SDLM and SDI models can be naturally implemented using a modern cloud-based infrastructure services provisioning model.