Documents on the contemporary Web are mostly based on HTML formats, so it is rather difficult for automated agents to retrieve the structured information hidden in them. The concept of Linked Data, based primarily on RDF triples, seems to address this drawback successfully. However, existing solutions from relational databases or XML technologies cannot be adopted directly, because RDF triples are modelled as graph data rather than relational or tree data. Despite the research effort of recent years, several questions in the area of Linked Data indexing and querying remain open, not least because the amount of globally available Linked Data increases significantly each year. This paper introduces the advantages and disadvantages of the state-of-the-art solutions and discusses several issues related to our ongoing research effort: the proposal of an efficient querying framework over Linked Data. In particular, our goal is to focus on large amounts of distributed and highly dynamic data.
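To illustrate why RDF triples are treated as graph data rather than relational or tree data, the following minimal sketch indexes a handful of triples as a labelled directed graph and answers a simple path query by traversal. It is not taken from the paper: the sample triples, the `ex:` identifiers, and the helper names `build_graph` and `follow` are assumptions made purely for illustration.

```python
from collections import defaultdict

# Hypothetical sample data; the identifiers are illustrative only.
triples = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:knows", "ex:carol"),
    ("ex:alice", "foaf:name", '"Alice"'),
]


def build_graph(triples):
    """Index triples as a labelled directed graph: subject -> [(predicate, object)]."""
    graph = defaultdict(list)
    for s, p, o in triples:
        graph[s].append((p, o))
    return graph


def follow(graph, start, predicates):
    """Return all nodes reached from `start` by following `predicates` in order."""
    frontier = {start}
    for pred in predicates:
        frontier = {
            o
            for node in frontier
            for p, o in graph.get(node, [])  # .get avoids creating empty entries
            if p == pred
        }
    return frontier


graph = build_graph(triples)
# Two-hop path query: whom do the people Alice knows know?
friends_of_friends = follow(graph, "ex:alice", ["foaf:knows", "foaf:knows"])
```

In a relational store the same two-hop query would typically be a self-join on a single triples table, which is one reason indexing techniques from that setting may not carry over directly.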
In recent years, a new type of NoSQL database, called graph databases (GDBs), has gained significant popularity due to the increasing need to process and store data in the form of a graph. The objective of this paper is to investigate the possibilities and limitations of GDBs and to conduct an experimental comparison of selected GDB implementations. For this purpose, the requirements for a universal GDB benchmark have been formulated and an extensible benchmarking tool, called BlueBench, has been developed.
In this paper we focus on the problem of automatically inferring an XML schema for a given sample set of XML documents. We provide an overview and analysis of existing approaches and compare their key advantages. We conclude the text with a discussion of open issues and problems to be solved, as well as their possible solutions.
XML has undoubtedly become a standard for data representation and manipulation. However, most XML documents are still created without a description of their structure, i.e. an XML schema. Hence, in this paper we focus on the problem of automatically inferring an XML schema for a given sample set of XML documents. Contrary to existing works, which aim to infer as concise a schema as possible, we focus on inferring a more realistic result, i.e. a schema that is closer to human-written ones and carries more precise information. For this purpose we extend and combine existing verified techniques (such as ACO heuristics or the MDL principle) with a set of heuristics exploiting the semantics of element/attribute names, thesauri, or statistical analysis of the input data. Using a set of examples, we show and discuss the advantages of our proposal.