The explosion of the web and the abundance of linked data demand effective and efficient methods for storage, management, and querying. More specifically, the ever-increasing size and number of RDF data collections raise the need for efficient query answering and dictate the use of distributed data management systems for effectively partitioning and querying them. In this direction, Apache Spark is one of the most active big-data approaches, with more and more systems adopting it for efficient, distributed data management. The purpose of this paper is to provide an overview of existing works dealing with efficient query answering in the area of RDF data using Apache Spark. We discuss the characteristics and key dimensions of such systems, describe novel ideas in the area and their corresponding drawbacks, and provide directions for future work.
Abstract. Significant efforts have recently been dedicated to the development of architectures for storing and querying RDF data in distributed environments. Several approaches focus on data partitioning and are able to answer queries efficiently using a small number of computational nodes. However, such approaches provide static data partitions. Given the increasingly continuous and rapid flow of data, there is now a clear need to deal with streaming data. In this work, we propose a framework for incremental data partitioning that exploits machine learning techniques. Specifically, we present a method to learn the structure of a partitioned database, and we employ two machine learning algorithms, namely Logistic Regression and Random Forest, to classify new streaming data.
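To make the idea of classifying incoming triples into existing partitions concrete, the following is a minimal sketch and not the authors' method. The abstract does not specify the features or the training setup, so everything here is an assumption: the toy triples, the one-hot predicate feature, and the helper names `features`, `train`, and `assign` are all hypothetical. It uses a small hand-written one-vs-rest logistic regression to stay dependency-free; Random Forest, the second algorithm named above, is not shown.

```python
import math

# Hypothetical toy data: triples that a static partitioner has already
# assigned to partition 0 or 1. Not taken from the paper.
PARTITIONED = [
    (("ex:alice", "foaf:knows", "ex:bob"), 0),
    (("ex:bob", "foaf:name", '"Bob"'), 0),
    (("ex:carol", "foaf:knows", "ex:alice"), 0),
    (("ex:paper1", "dc:title", '"RDF on Spark"'), 1),
    (("ex:paper1", "dc:creator", "ex:alice"), 1),
    (("ex:paper2", "dc:title", '"Streams"'), 1),
]


def features(triple, vocab):
    """One-hot encode the predicate plus a bias term (an assumed feature choice)."""
    _, predicate, _ = triple
    vec = [0.0] * (len(vocab) + 1)
    vec[-1] = 1.0  # bias term
    if predicate in vocab:
        vec[vocab[predicate]] = 1.0
    return vec


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(weights, x):
    """Probability that x belongs to the partition modelled by weights."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))


def train_binary(X, y, epochs=500, lr=0.5):
    """Fit one binary logistic regression with plain stochastic gradient ascent."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for x, target in zip(X, y):
            error = target - score(w, x)
            for i, xi in enumerate(x):
                w[i] += lr * error * xi
    return w


def train(partitioned):
    """Learn one binary model per partition (one-vs-rest) from the existing layout."""
    vocab = {}
    for (_, predicate, _), _ in partitioned:
        vocab.setdefault(predicate, len(vocab))
    X = [features(triple, vocab) for triple, _ in partitioned]
    models = {}
    for label in sorted({part for _, part in partitioned}):
        y = [1.0 if part == label else 0.0 for _, part in partitioned]
        models[label] = train_binary(X, y)
    return vocab, models


def assign(triple, vocab, models):
    """Send a newly arrived triple to the partition with the highest score."""
    x = features(triple, vocab)
    return max(models, key=lambda label: score(models[label], x))
```

In this sketch, a streamed triple would be routed by calling `train(PARTITIONED)` once and then `assign(new_triple, vocab, models)` for each arrival. A real system would need richer features than the predicate alone and a policy for triples whose predicate was never seen during training.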