JSON (JavaScript Object Notation) is a lightweight, semi-structured data format based on the data types of the JavaScript programming language. It is a popular data exchange format on the World Wide Web and has become a dominant standard format for sending API (Application Programming Interface) requests and responses in the past few years. Furthermore, JSON has attracted the attention of the database research community, especially for data-intensive applications. JSON can not only be integrated into traditional database systems but is also widely used in NoSQL and graph database systems. Compared with XML, a JSON document is a set of “key-value” pairs in which a “value” can itself be a JSON document, allowing arbitrary levels of nesting; this makes JSON more flexible to use and, accordingly, more difficult to process. The JSON data model and schema describe the basic data structures and semantics of the underlying JSON data, so they are fundamental aspects of the JSON data format. They are not only foundations for other data management technologies, such as data indexing, querying, searching, mapping, integration, and mining, but also have theoretical significance and application prospects, providing a theoretical basis and technical means for related research such as data integration, data conversion, and queries over other semi-structured and unstructured data. This paper analyzes the key problems of the JSON data model and schema, including which data model JSON should adopt and the specification and schema outline of the JSON model.
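As a small illustration of the nesting described above, the following sketch parses a JSON document whose values are themselves JSON documents and measures how deeply it nests. The sample document and the helper name `nesting_depth` are assumptions made for this example and do not come from the paper.

```python
import json


def nesting_depth(value):
    """Return the nesting depth of a parsed JSON value.

    Scalars (strings, numbers, booleans, null) have depth 0; each enclosing
    object or array adds one level. This helper is hypothetical, for
    illustration only.
    """
    if isinstance(value, dict):
        # An object is a set of key-value pairs; recurse into the values.
        return 1 + max((nesting_depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        # An array nests in the same way as an object.
        return 1 + max((nesting_depth(v) for v in value), default=0)
    return 0


# Hypothetical sample document: the value of "user" is itself a JSON object.
doc = json.loads(
    '{"user": {"name": "Ada", "address": {"city": "London", "tags": ["home", "work"]}}}'
)
print(nesting_depth(doc))
```

Because no fixed depth is imposed, any processing routine of this kind has to be recursive (or keep an explicit stack), which is one way the flexibility of the format translates into processing cost.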
Abstract. We present a novel framework for indexing and searching schema-less XML documents based on concise summaries of their structural and textual content. Our search query language is XPath extended with full-text search. We introduce two novel data synopsis structures that correlate textual with positional information in an XML document and improve query precision. In addition, we present a two-phase containment filtering algorithm based on these synopses that improves the searching process. Our experimental evaluation shows that our data synopsis indexing scheme outperforms the standard XML indexing scheme based on inverted lists, that query evaluation based on our data synopses is more accurate than related approximate approaches that do not consider positional information, and that our two-phase containment filtering algorithm is more efficient than a single-phase brute-force algorithm.