A graph is a fundamental data structure that captures relationships between data entities. In practice, graphs are widely used for modeling complex data in application domains such as social networks, protein networks, transportation networks, bibliographic networks, knowledge bases, and many more. Graphs with millions or even billions of nodes and edges have become common, and graph analytics is an important big data discovery technique. Therefore, with the increasing abundance of large graphs, designing scalable systems for processing and analyzing them has become one of the most timely problems facing the big data research community. Scalable processing of big graphs is a challenging task due to their size and the inherently irregular structure of graph computations. Thus, in recent years we have witnessed unprecedented interest in building big graph processing systems that attempt to tackle these challenges. In this article, we provide a comprehensive survey of the state of the art in large scale graph processing platforms. In addition, we present an extensive experimental study of five popular systems in this domain, namely GraphChi, Apache Giraph, GPS, GraphLab, and GraphX. In particular, we report and analyze the performance characteristics of these systems using five common graph processing algorithms and seven large graph datasets. Finally, we identify a set of open research challenges and discuss some promising directions for future research in the domain of large scale graph processing.
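Several of the surveyed systems (Apache Giraph, GPS) follow the synchronous vertex-centric ("think like a vertex") model, in which each vertex repeatedly sends messages to its neighbors and combines the messages it receives. The sketch below illustrates that model with one superstep of PageRank; the function names and data layout are illustrative and do not reproduce the API of any surveyed system.

```python
# Minimal sketch of one synchronous superstep in a vertex-centric model,
# as popularized by Pregel and used by systems such as Apache Giraph.
# Names and structure are illustrative, not any system's actual API.

def pagerank_superstep(graph, ranks, damping=0.85):
    """Every vertex sends rank/out_degree to its neighbors,
    then each vertex combines its incoming messages."""
    messages = {v: 0.0 for v in graph}
    for v, neighbors in graph.items():
        if neighbors:
            share = ranks[v] / len(neighbors)
            for u in neighbors:
                messages[u] += share
    n = len(graph)
    return {v: (1 - damping) / n + damping * messages[v] for v in graph}

# Toy directed graph as adjacency lists; run a fixed number of supersteps.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = {v: 1.0 / len(graph) for v in graph}
for _ in range(20):
    ranks = pagerank_superstep(graph, ranks)
```

In the real systems this per-superstep logic runs in parallel across partitions of the graph, with a barrier between supersteps; GraphLab additionally supports asynchronous execution.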
With Data Science continuing to emerge as a powerful differentiator across industries, organisations are now focused on transforming their data into actionable insights. This task is challenging because, in today's knowledge-, service-, and cloud-based economy, businesses accumulate massive amounts of raw data from a variety of sources. Data Lakes were introduced as storage repositories that organize this raw data in its native format (from relational to NoSQL databases) until it is needed. The rationale behind a Data Lake is to store raw data and let the data analyst decide how to cook/curate it later. In this paper, we present the notion of a Knowledge Lake, i.e. a contextualized Data Lake. The Knowledge Lake provides the foundation for big data analytics by automatically curating the raw data in the Data Lake and preparing it for deriving insights. We present CoreKG, an open source Data and Knowledge Lake service, which offers researchers and developers a single REST API to organize, curate, index, and query their data and metadata in the Lake and over time. CoreKG manages multiple database technologies (from relational to NoSQL) and offers built-in support for data curation, security, and provenance.
The Tarom 1:100,000 sheet is located within the Cenozoic Tarom-Hashtjin volcano-plutonic belt, NW Iran. Reconstruction of the tectonic and structural setting of hydrothermal deposits is fundamental to predictive models of different ore deposits. Since fractal/multifractal modelling is an effective instrument for separating geological and mineralized zones from background, the Concentration-Distance to Major Fault (C-DMF) fractal model and the distribution of Cu anomalies were used to classify Cu mineralizations according to their distance to major faults. Application of the C-DMF model in the Tarom 1:100,000 sheet reveals that the main copper mineralizations correlate strongly with distance to major faults in the area. Known copper mineralizations with Cu values higher than 2.2% lie less than 10 km from major faults, showing a positive correlation between Cu mineralization and tectonic events. Moreover, extreme and high Cu anomalies based on stream sediment and lithogeochemical data were identified by the Number-Size (N-S) fractal model. These anomalies also lie less than 10 km from major faults and validate the results derived via the C-DMF fractal model. C-DMF fractal modelling can be utilized for the reconnaissance and prospecting of magmatic and hydrothermal deposits.
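The Number-Size fractal model relates a concentration threshold ρ to the cumulative number of samples at or above it, N(≥ρ) = F·ρ^(-D), so straight-line segments on a log-log plot separate background from anomalous populations. The sketch below fits a single power law to synthetic concentration data; the data, seed, and fitted range are assumptions for illustration and are not taken from the Tarom sheet.

```python
# Hedged sketch of the Number-Size (N-S) fractal model: N(>=rho) = F * rho^(-D).
# Synthetic Cu concentrations are used; real studies fit separate line segments
# per population and read thresholds off the breakpoints.
import numpy as np

rng = np.random.default_rng(0)
# Synthetic Cu concentrations (%): Pareto-distributed to mimic a power law.
cu = (rng.pareto(a=1.5, size=1000) + 1) * 0.05

values = np.sort(cu)
# N(>=rho): number of samples with concentration at or above each value.
n_cum = np.arange(len(values), 0, -1)

# Fit log N = log F - D log rho by least squares; the slope estimates -D.
slope, intercept = np.polyfit(np.log10(values), np.log10(n_cum), 1)
fractal_dim = -slope
```

In practice the log-log curve is piecewise linear, and the concentration values at the breakpoints between fitted segments are taken as the thresholds separating background, high, and extreme anomaly classes.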