We present Falcon, an interactive, deterministic, and declarative data cleaning system, which uses SQL update queries as the language to repair data. Falcon does not rely on the existence of a set of pre-defined data quality rules. On the contrary, it encourages users to explore the data, identify possible problems, and make updates to fix them. Bootstrapped by one user update, Falcon guesses a set of possible SQL update queries that can be used to repair the data. The main technical challenge addressed in this paper consists in finding a set of SQL update queries that is minimal in size and at the same time fixes the largest number of errors in the data. We formalize this problem as a search in a lattice-shaped space. To guarantee that the chosen updates are semantically correct, Falcon navigates the lattice by interacting with users to gradually validate the set of SQL update queries. Besides using traditional one-hop traversal algorithms (e.g., BFS or DFS), we describe novel multi-hop search algorithms that let Falcon dive through the lattice and conduct the search efficiently. Our novel search strategy is coupled with a number of optimization techniques to further prune the search space and efficiently maintain the lattice. We have conducted extensive experiments using both real-world and synthetic datasets to show that Falcon can effectively communicate with users in data repairing.
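To make the lattice idea concrete, here is a minimal sketch of one plausible reading of the abstract, not Falcon's actual implementation. It assumes that each candidate query generalizes a single user fix by keeping a subset of the repaired row's other attribute values in its WHERE clause, so that the subsets ordered by inclusion form the lattice. The function name `candidate_updates`, the table `addr`, and the sample row are hypothetical.

```python
from itertools import combinations


def candidate_updates(table, row, attribute, new_value):
    """Enumerate candidate SQL UPDATE queries that generalize one user fix.

    Assumption (not taken from the paper): every candidate keeps a subset of
    the repaired row's other attribute values as its WHERE clause. Returns
    (attribute_set, sql) pairs, most specific candidates first.
    """
    # Attributes other than the repaired one, in a stable order.
    context = sorted((k, v) for k, v in row.items() if k != attribute)
    candidates = []
    for size in range(len(context), -1, -1):
        for subset in combinations(context, size):
            # Strings are built for display only; real code should use
            # parameterized queries instead of string formatting.
            where = " AND ".join(f"{k} = '{v}'" for k, v in subset)
            sql = f"UPDATE {table} SET {attribute} = '{new_value}'"
            if where:
                sql += f" WHERE {where}"
            candidates.append((frozenset(k for k, _ in subset), sql))
    return candidates


if __name__ == "__main__":
    # Hypothetical fix: the user changes state from 'NJ' to 'NY' in one row.
    row = {"city": "NYC", "state": "NJ", "zip": "10001"}
    for attrs, sql in candidate_updates("addr", row, "state", "NY"):
        print(sorted(attrs), "->", sql)
```

With n context attributes this enumerates 2^n candidates, which illustrates why pruning and user-guided navigation of the lattice matter once the table has more than a handful of columns.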
Computational thinking is the capacity to undertake a problem-solving process in various disciplines (including STEM, i.e., science, technology, engineering and mathematics) using distinctive techniques that are typical of computer science. It is nowadays considered a fundamental skill for students and citizens, one that has the potential to affect future generations. At the root of computational-thinking abilities lies the knowledge of computer programming, i.e., coding. With the goal of fostering computational thinking in young students, we address the challenging and open problem of using methods, tools and techniques to support the teaching and learning of computer-programming skills in secondary-school curricula and university courses. This problem is made complex by several factors. In fact, coding requires abstraction capabilities and complex cognitive skills such as procedural and conditional reasoning, planning, and analogical reasoning. In this paper, we introduce a new paradigm called ACME (“Code Animation by Evolved Metaphors”) that stands at the foundation of the Diogene-CT code visualization environment and methodology. We develop consistent visual metaphors for both procedural and object-oriented programming. Based on the metaphors, we introduce a playground architecture to support the teaching and learning of the principles of coding. To the best of our knowledge, this is the first scalable code visualization tool using consistent metaphors in the field of Computing Education Research (CER). It may be regarded as an instance of a new kind of tool, the code visualization environment.
We consider the problem of handling digital identities within service-oriented architectures (SOA). We explore federated, single sign-on (SSO) solutions based on identity managers and service providers. After an overview of the different standards and protocols, we introduce a middleware-based architecture to simplify the integration of legacy systems within such platforms. Our solution is based on a middleware module that decouples the legacy system from the identity-management modules. We consider both standard point-to-point service architectures and complex government interoperability frameworks, and report experiments to show that our solution provides clear advantages both in terms of effectiveness and performance.
This paper introduces Greg, ML, a machine-learning tool for generating automatic diagnostic suggestions based on patient profiles. We discuss the architecture that stands at the core of Greg, and some experimental results based on the working prototype we have developed. Finally, we discuss challenges and opportunities related to the use of this kind of tool in medicine, and some important lessons learned while developing the tool. In this respect, despite the ironic title of this paper, we underline that Greg should be conceived primarily as a support for expert doctors in their diagnostic decisions, and can hardly replace humans in their judgment.
This paper introduces the Greg, ML platform, a machine-learning engine and toolset conceived to generate automatic diagnostic suggestions based on patient profiles. Greg, ML departs from many other experiences in machine learning for healthcare in that it was designed to handle a large number of different diagnoses, on the order of hundreds. We discuss the architecture that stands at the core of Greg, ML, designed to handle the complex challenges posed by this ambitious goal, and confirm its effectiveness with experimental results based on the working prototype we have developed. Finally, we discuss challenges and opportunities related to the use of this kind of tool in medicine, and some important lessons learned while developing the tool. In this respect, we underline that Greg, ML should be conceived primarily as a support for expert doctors in their diagnostic decisions, and can hardly replace humans in their judgment.
Applications such as computational fact checking and data-to-text generation exploit the relationship between relational data and natural language text. Despite promising results in these areas, state-of-the-art solutions fail to handle "data-ambiguity", i.e., the case when there are multiple interpretations of the relationship between the textual sentence and the relational data. To tackle this problem, we introduce Pythia, a system that, given a relational table 𝐷, generates textual sentences that contain factual ambiguities w.r.t. the data in 𝐷. Such sentences can then be used to train target applications in handling data-ambiguity. In this demonstration, we first show how our system generates data-ambiguous sentences for a given table in an unsupervised fashion by data profiling and query generation. We then demonstrate how two existing applications benefit from Pythia's generated sentences, improving the state-of-the-art results. The audience will interact with Pythia by changing input parameters interactively, including uploading their own dataset to see what data-ambiguous sentences are generated for it.
Industry 4.0 is focused on the task of creating Smart Factories, which require the automation of traditional industrial processes and the full connection and integration of different systems and devices. However, despite the wide availability of tools and technology, developing intelligent applications in the industry framework remains a complex and expensive task. This paper proposes a lightweight, extensible and scalable framework called IoT Helper to facilitate the adoption of IoT and IIoT solutions both in industry and domotics. The framework is designed to be highly flexible and declarative in nature, thus allowing for a wide range of configurations with minimal user effort. To emphasize the practical applicability of our proposal, we present two real-life use cases where the framework was successfully adopted. We also investigate a crucial aspect of these applications, i.e., what level of scalability can be achieved with a lean generic framework based on inexpensive components such as ours. Comprehensive experimental results show the excellent cost-to-performance ratio of our solution. We consider this to be an important contribution because it paves the way for a more widespread adoption of IIoT-enabling technologies in industry.