Summary
Parallel distributed frameworks such as Hadoop are used extensively to meet the demands of high-performance big data processing. In heterogeneous environments, however, Hadoop clusters perform poorly. The main reason is that data blocks are allocated equally to all nodes regardless of differences in the capability of individual nodes, which reduces data locality. A new data-placement scheme that improves data locality is therefore required for Hadoop in heterogeneous environments. This article proposes a data-placement scheme that preserves the same degree of data locality in heterogeneous environments as standard Hadoop while replicating only a small amount of data. In the proposed scheme, only the blocks with the highest probability of being accessed remotely are selected and replicated. Experimental results indicate that the proposed scheme incurs only a 20% disk space overhead and achieves virtually the same data locality ratio as standard Hadoop, which uses a replication factor of three and therefore incurs a 200% disk space overhead.
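The selection step and the overhead figures above can be illustrated with a minimal sketch. It is not the article's implementation: the function names `disk_overhead` and `select_blocks_to_replicate`, the per-block probabilities, and the assumption that all blocks are the same size are hypothetical, and the summary does not say how the remote-access probability is estimated.

```python
from typing import Dict, List


def disk_overhead(replication_factor: float) -> float:
    """Extra disk space relative to a single copy of the data (1.0 == 100%)."""
    return replication_factor - 1.0


def select_blocks_to_replicate(remote_prob: Dict[str, float],
                               budget: float) -> List[str]:
    """Pick the blocks most likely to be read remotely.

    remote_prob maps a block id to an estimated probability of remote access
    (hypothetical input; the estimation method is not given in the summary).
    budget is the allowed extra disk space as a fraction of the data set,
    assuming equal-size blocks, so 0.20 allows one extra copy for 20% of blocks.
    """
    max_copies = int(len(remote_prob) * budget)
    # Highest probability first; ties broken by block id for determinism.
    ranked = sorted(remote_prob, key=lambda b: (-remote_prob[b], b))
    return ranked[:max_copies]


if __name__ == "__main__":
    # Made-up probabilities for ten blocks, purely for illustration.
    probs = {f"blk_{i}": i / 10 for i in range(10)}
    print(select_blocks_to_replicate(probs, budget=0.20))
    print(disk_overhead(3))  # replication factor three -> 2.0, i.e. 200%
```

Under these assumptions, a 20% budget over ten equal-size blocks allows two extra copies, while a replication factor of three stores two extra copies of every block.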
In this paper, we propose a scalable RDF triple store for massive-scale RDF data that efficiently processes SPARQL queries with many join operations. The graph characteristics of the RDF data model hinder scalable and efficient indexing and querying over RDF triples. To address this problem, our query processing uses a pruning algorithm based on Bitstructure and summarized information to minimize data reading. Our approach guarantees scalability and flexibility even for massive-scale RDF data by storing RDF triples in a distributed fashion, providing a modifiable structure, and optimizing the memory footprint. The experiments show that our system performs better on queries with many join operations while using less memory.