Programming-by-example technologies empower end-users to create simple programs merely by providing input/output examples. Existing systems are designed around solvers specialized for a specific set of data types or domain-specific language (DSL). We present a program synthesizer which can be parameterized by an arbitrary DSL that may contain conditionals and loops and therefore is able to synthesize programs in any domain. In order to use our synthesizer, the user provides a sequence of increasingly sophisticated input/output examples along with an expert-written DSL definition. These two inputs correspond to the two key ideas that allow our synthesizer to work in arbitrary domains. First, we developed a novel iterative synthesis technique inspired by test-driven development-which also gives our technique the name of testdriven synthesis-where the input/output examples are consumed one at a time as the program is refined. Second, the DSL allows our system to take an efficient component-based approach to enumerating possible programs. We present applications of our synthesis methodology to end-user programming for transformations over strings, XML, and table layouts. We compare our synthesizer on these applications to state-of-the-art DSL-specific synthesizers as well to the general purpose synthesizer Sketch.
Modern programming frameworks provide enormous libraries arranged in complex structures, so much so that a large part of modern programming is searching for APIs that "surely exist" somewhere in an unfamiliar part of the framework. We present a novel way of phrasing a search for an unknown API: the programmer simply writes an expression leaving holes for the parts they do not know. We call these expressions partial expressions. We present an efficient algorithm that produces likely completions ordered by a ranking scheme based primarily on the similarity of the types of the APIs suggested to the types of the known expressions. This gives a powerful language for both API discovery and code completion with a small impedance mismatch from writing code. In an automated experiment on mature C# projects, we show our algorithm can place the intended expression in the top 10 choices over 80% of the time.
Learning to code can be made more effective and sustainable if it is perceived as fun by the learner. Code Hunt uses puzzles that players have to explore by means of clues presented as test cases. Players iteratively modify their code to match the functional behaviour of secret solutions. This way of learning to code is very different to learning from a specification. It is essentially re-engineering from test cases. Code Hunt is based on the test/clue generation of Pex, a white-box test generation tool that uses dynamic symbolic execution. Pex performs a guided search to determine feasible execution paths. Conceptually, solving a puzzle is the manual process of conducting search-based test generation: the "test data" to be generated by the player is the player's code, and the "fitness values" that reflect the closeness of the player's code to the secret code are the clues (i.e., Pex-generated test cases). This paper is the first one to describe Code Hunt and its extensions over its precursor Pex4Fun. Code Hunt represents a high-impact educational gaming platform that not only internally leverages fitness values to guide test/clue generation but also externally offers fun user experiences where search-based test generation is manually emulated. Because the amount of data is growing all the time, the entire system runs in the cloud on Windows Azure.
We address the problem of learning a syntactic profile for a collection of strings, i.e. a set of regex-like patterns that succinctly describe the syntactic variations in the strings. Real-world datasets, typically curated from multiple sources, often contain data in various syntactic formats. Thus, any data processing task is preceded by the critical step of data format identification. However, manual inspection of data to identify the different formats is infeasible in standard big-data scenarios.Prior techniques are restricted to a small set of pre-defined patterns (e.g. digits, letters, words, etc.), and provide no control over granularity of profiles. We define syntactic profiling as a problem of clustering strings based on syntactic similarity, followed by identifying patterns that succinctly describe each cluster. We present a technique for synthesizing such profiles over a given language of patterns, that also allows for interactive refinement by requesting a desired number of clusters.Using a state-of-the-art inductive synthesis framework, PROSE, we have implemented our technique as FlashProfile. Across 153 tasks over 75 large real datasets, we observe a median profiling time of only ∼0.7 s. Furthermore, we show that access to syntactic profiles may allow for more accurate synthesis of programs, i.e. using fewer examples, in programming-by-example (PBE) workflows such as Flash Fill.
Programming-by-Examples (PBE) involves synthesizing an "intended program" from a small set of user-provided input-output examples. A key PBE strategy has been to restrict the search to a carefully designed small domain-specific language (DSL) with "effectively-invertible" (EI) operators at the top and "effectively-enumerable" (EE) operators at the bottom. This facilitates an effective combination of top-down synthesis strategy (which backpropagates outputs over various paths in the DSL using inverse functions) with a bottom-up synthesis strategy (which propagates inputs over various paths in the DSL). We address the problem of scaling synthesis to large DSLs with several non-EI/EE operators. This is motivated by the need to support a richer class of transformations and the need for readable code generation. We propose a novel solution strategy that relies on propagating fewer values and over fewer paths. Our first key idea is that of "cut functions" that prune the set of values being propagated by using knowledge of the sub-DSL on the other side. Cuts can be designed to preserve completeness of synthesis; however, DSL designers may use incomplete cuts to have finer control over the kind of programs synthesized. In either case, cuts make search feasible for non-EI/EE operators and efficient for deep DSLs. Our second key idea is that of "guarded DSLs" that allow a precedence on DSL operators, which dynamically controls exploration of various paths in the DSL. This makes search efficient over grammars with large fanouts without losing recall. It also makes ranking simpler yet more effective in learning an intended program from very few examples. Both cuts and precedence provide a mechanism to the DSL designer to restrict search to a reasonable, and possibly incomplete, space of programs. Using cuts and gDSLs, we have built FlashFill++, an industrial-strength PBE engine for performing rich string transformations, including datetime and number manipulations. The FlashFill++ gDSL is designed to enable readable code generation in different target languages including Excel's formula language, PowerFx, and Python. We show FlashFill++ is more expressive, more performant, and generates better quality code than comparable existing PBE systems. FlashFill++ is being deployed in several mass-market products ranging from spreadsheet software to notebooks and business intelligence applications, each with millions of users.
No abstract
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.