Martín Ugarte scite author profile

JSON, the most popular data format for sending API requests and responses, still lacks a standardized schema or metadata definition that allows developers to specify the structure of JSON documents. JSON Schema is an attempt to provide a general-purpose schema language for JSON, but it is still a work in progress, and the formal specification has not yet been agreed upon. Why this could be a problem becomes evident when examining the behaviour of numerous tools for validating JSON documents against this initial schema proposal: although they agree on most general cases, when presented with the greyer areas of the specification they tend to differ significantly. In this paper we provide the first formal definition of syntax and semantics for JSON Schema and use it to show that implementing this layer on top of JSON is feasible in practice. This is done both by analysing the theoretical aspects of the validation problem and by showing how to set up and validate a JSON Schema for Wikidata, the central storage for Wikimedia. Copyright is held by the International World Wide Web Conference Committee (IW3C2). IW3C2 reserves the right to provide a hyperlink to the author's site if the Material is used in electronic media.
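To make the validation problem mentioned in the abstract concrete, the following is a minimal sketch of checking a JSON document against a schema. It is not the paper's formal semantics: it covers only a small subset of JSON Schema keywords (`type`, `required`, `properties`), and the function name `validate`, the example schema, and the example documents are hypothetical illustrations rather than the Wikidata schema the paper describes.

```python
# Illustrative sketch only: a tiny validator for a subset of JSON Schema
# keywords. Names and example data are assumptions made for this example.

# Map of supported JSON Schema type names to Python types.
_TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "null": type(None),
}


def validate(doc, schema):
    """Return True if `doc` satisfies the supported keywords of `schema`."""
    expected = schema.get("type")
    if expected is not None:
        # In Python, bool is a subclass of int, so exclude it explicitly
        # when the schema asks for a number or an integer.
        if isinstance(doc, bool) and expected in ("number", "integer"):
            return False
        if not isinstance(doc, _TYPES[expected]):
            return False
    # `required` and `properties` only constrain object instances.
    if isinstance(doc, dict):
        for key in schema.get("required", []):
            if key not in doc:
                return False
        for key, subschema in schema.get("properties", {}).items():
            if key in doc and not validate(doc[key], subschema):
                return False
    return True


# Hypothetical schema for an entity record with a mandatory string identifier.
entity_schema = {
    "type": "object",
    "required": ["id"],
    "properties": {
        "id": {"type": "string"},
        "labels": {"type": "object"},
    },
}

valid_doc = {"id": "Q42", "labels": {"en": "example"}}
wrong_type_doc = {"id": 42}
missing_key_doc = {"labels": {}}
```

Even in this small subset there are choices a specification has to pin down, such as whether `properties` constrains non-object values or whether booleans count as numbers. These are the kind of details on which validators can disagree when the semantics is not formally fixed.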

Constant Delay Algorithms for Regular Document Spanners

Florenzano

Riveros

Ugarte

et al. 2018

Regular expressions and automata models with capture variables are core tools in rule-based information extraction. These formalisms, also called regular document spanners, use regular languages in order to locate the data that a user wants to extract from a text document, and then store this data into variables. Since document spanners can easily generate large outputs, it is important to have good evaluation algorithms that can generate the extracted data in a quick succession, and with relatively little precomputation time. Towards this goal, we present a practical evaluation algorithm that allows constant delay enumeration of a spanner's output after a precomputation phase that is linear in the document. While the algorithm assumes that the spanner is specified in a syntactic variant of variable set automata, we also study how it can be applied when the spanner is specified by general variable set automata, regex formulas, or spanner algebras. Finally, we study the related problem of counting the number of outputs of a document spanner, providing a fine grained analysis of the classes of document spanners that support efficient enumeration of their results.

An Information-Theoretic Approach to Self-Organisation: Emergence of Complex Interdependencies in Coupled Dynamical Systems

Rosas

Mediano

Ugarte

2018

Entropy

Self-organisation lies at the core of fundamental but still unresolved scientific questions, and holds the promise of de-centralised paradigms crucial for future technological developments. While self-organising processes have been traditionally explained by the tendency of dynamical systems to evolve towards specific configurations, or attractors, we see self-organisation as a consequence of the interdependencies that those attractors induce. Building on this intuition, in this work we develop a theoretical framework for understanding and quantifying self-organisation based on coupled dynamical systems and multivariate information theory. We propose a metric of global structural strength that identifies when self-organisation appears, and a multi-layered decomposition that explains the emergent structure in terms of redundant and synergistic interdependencies. We illustrate our framework on elementary cellular automata, showing how it can detect and characterise the emergence of complex structures.

... I see it" logic, which might eventually prevent further systematic developments [24]. Formulating formal definitions of self-organisation is challenging, partly because self-organisation has been used in diverse contexts and with different purposes [25], and partly due to the fact that the basic notions of "self" and "organisation" are already problematic themselves [26]. The absence of an agreed formal definition, combined with the relevance of this notion for scientific and technological advances, generates a need for further explorations about the principles of self-organisation.

Scope of this Work and Contribution

In the spirit of Reference [27], we explore to what extent an information-theoretic perspective can illuminate the inner workings of self-organising processes. Due to the connections between information theory and thermodynamics [28,29], our approach can be seen as an extension of previous works that relate self-organisation and statistical physics (see e.g. [30,31,32]). In previous research, self-organisation has been associated with a reduction in the system's entropy [30,33,34]; in contrast, we argue that entropy reduction alone is not a robust predictor of self-organisation, and additional metrics are required.

This work establishes a way of understanding self-organising processes that is consistent with the Bayesian interpretation of information theory, as described in Reference [28]. One contribution of our approach is to characterise self-organising processes using multivariate information-theoretic tools, or, put differently, to provide a more fine-grained description of the underlying phenomena behind entropy reduction. We propose that self-organising processes are driven by spontaneous creation of interdependencies, while the reduction of entropy is a mere side effect of this. Following this rationale, we propose the binding information [35] as a metric of the strength of the interdependencies in out-of-equilibrium dynamical systems.

Another contribution of our framework is to propose a multi-layer...
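As a rough illustration of the binding information mentioned above, the sketch below computes it for small discrete joint distributions. It assumes the standard definition (also known as dual total correlation), B(X) = H(X) - Σ_j H(X_j | X_-j), which can be rewritten as Σ_j H(X_-j) - (n - 1) H(X). The function names and the three toy distributions are assumptions made for this example and are not taken from the paper.

```python
# Illustrative sketch only: binding information (dual total correlation)
# of a discrete joint distribution, given as {outcome tuple: probability}.
from collections import defaultdict
from itertools import product
from math import log2


def entropy(dist):
    """Shannon entropy in bits of a distribution {outcome: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


def marginal(joint, keep):
    """Marginalise `joint` onto the variable indices listed in `keep`."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[i] for i in keep)] += p
    return dict(out)


def binding_information(joint):
    """B(X) = sum_j H(X_-j) - (n - 1) * H(X), in bits."""
    n = len(next(iter(joint)))
    h_joint = entropy(joint)
    h_leave_one_out = sum(
        entropy(marginal(joint, [i for i in range(n) if i != j]))
        for j in range(n)
    )
    return h_leave_one_out - (n - 1) * h_joint


# Three independent fair bits: no interdependencies.
independent = {bits: 1 / 8 for bits in product((0, 1), repeat=3)}

# Two fair bits and their XOR: purely synergistic interdependence.
xor = {(a, b, a ^ b): 1 / 4 for a, b in product((0, 1), repeat=2)}

# Three copies of one fair bit: purely redundant interdependence.
copies = {(0, 0, 0): 1 / 2, (1, 1, 1): 1 / 2}
```

Under this definition the independent bits have zero binding information, while the XOR and copy systems are both positive. A single scalar measures the strength of the interdependencies but does not by itself tell redundant structure from synergistic structure.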

A Formal Framework for Complex Event Recognition

Grez

Riveros

Ugarte

et al. 2021

ACM Trans. Database Syst.

Complex event recognition (CER) has emerged as the unifying field for technologies that require processing and correlating distributed data sources in real time. CER finds applications in diverse domains, which has resulted in a large number of proposals for expressing and processing complex events. Existing CER languages lack a clear semantics, however, which makes them hard to understand and generalize. Moreover, there are no general techniques for evaluating CER query languages with clear performance guarantees. In this article, we embark on the task of giving a rigorous and efficient framework to CER. We propose a formal language for specifying complex events, called complex event logic (CEL), that contains the main features used in the literature and has a denotational and compositional semantics. We also formalize the so-called selection strategies, which had only been presented as by-design extensions to existing frameworks. We give insight into the language design trade-offs regarding the strict sequencing operators of CEL and selection strategies. With a well-defined semantics at hand, we discuss how to efficiently process complex events by evaluating CEL formulas with unary filters. We start by introducing a formal computational model for CER, called complex event automata (CEA), and study how to compile CEL formulas with unary filters into CEA. Furthermore, we provide efficient algorithms for evaluating CEA over event streams using constant time per event followed by output-linear delay enumeration of the results.

Proof-of-Learning: A Blockchain Consensus Mechanism Based on Machine Learning Competitions

Bravo-Márquez

Reeves

Ugarte

2019

This article presents WekaCoin, a peer-to-peer cryptocurrency based on a new distributed consensus protocol called Proof-of-Learning. Proof-of-Learning achieves distributed consensus by ranking machine learning systems for a given task. The aim of this protocol is to alleviate the computational waste involved in hashing-based puzzles and to create a public, distributed, and verifiable database of state-of-the-art machine learning models and experiments.

Complex Event Recognition Languages

Artikis

Margara

Ugarte

et al. 2017

General dynamic Yannakakis: conjunctive queries with theta joins under updates

et al. 2019

Modern application domains such as Composite Event Recognition (CER) and real-time Analytics require the ability to dynamically refresh query results under high update rates. Traditional approaches to this problem are based either on the materialization of subresults (to avoid their recomputation) or on the recomputation of subresults (to avoid the space overhead of materialization). Both techniques have recently been shown suboptimal: instead of materializing results and subresults, one can maintain a data structure that supports efficient maintenance under updates and can quickly enumerate the full query output, as well as the changes produced under single updates. Unfortunately, these data structures have been developed only for aggregate-join queries composed of equi-joins, limiting their applicability in domains such as CER where temporal joins are commonplace. In this paper, we present a new approach for dynamically evaluating queries with multi-way θ-joins under updates that is ...

M. Idris

Efficient Enumeration Algorithms for Regular Document Spanners

Florenzano

Riveros

Ugarte

et al. 2020

ACM Trans. Database Syst.

Regular expressions and automata models with capture variables are core tools in rule-based information extraction. These formalisms, also called regular document spanners, use regular languages to locate the data that a user wants to extract from a text document and then store this data into variables. Since document spanners can easily generate large outputs, it is important to have efficient evaluation algorithms that can generate the extracted data in quick succession, and with relatively little precomputation time. Toward this goal, we present a practical evaluation algorithm that allows output-linear delay enumeration of a spanner’s result after a precomputation phase that is linear in the document. Although the algorithm assumes that the spanner is specified in a syntactic variant of variable-set automata, we also study how it can be applied when the spanner is specified by general variable-set automata, regex formulas, or spanner algebras. Finally, we study the related problem of counting the number of outputs of a document spanner and provide a fine-grained analysis of the classes of document spanners that support efficient enumeration of their results.

Foundations of JSON Schema
