Abstract. Sequence-to-sequence models have been widely used in recent years across natural language processing tasks. In particular, they have been widely adopted for translating natural language questions into SQL, and many studies apply sequence-to-sequence approaches to predict target SQL queries on the available datasets. In this paper, we highlight another way to address natural language processing tasks, in particular natural language to SQL translation: sketch-based decoding, which relies on a sketch with holes that the model fills incrementally. We present the pros and cons of each approach and show how a sketch-based model can outperform existing solutions in predicting the desired SQL queries and in generalizing to unseen input pairs across different contexts and cross-domain datasets. Finally, we discuss the test results of previously proposed models, using exact matching scores, error propagation, and training time as metrics.
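As a rough illustration of the idea of a sketch with holes that is filled incrementally, the following minimal Python sketch fills one slot at a time from a fixed template. The template, the slot names, and the hard-coded toy predictors are all assumptions made for illustration; in a real system each predictor would be a learned module, and nothing here is taken from the paper's implementation.

```python
from string import Formatter

# Hypothetical sketch: a SQL template whose holes are named slots.
SKETCH = "SELECT {agg}({sel_col}) FROM {table} WHERE {cond_col} {cond_op} {cond_val}"


def slot_names(sketch):
    """Return the slot names of the sketch in left-to-right order."""
    return [name for _, name, _, _ in Formatter().parse(sketch) if name]


def fill_sketch(sketch, predictors, question):
    """Fill the holes one by one.

    Each predictor receives the question and the slots filled so far,
    so later predictions can depend on earlier ones.
    """
    filled = {}
    for slot in slot_names(sketch):
        filled[slot] = predictors[slot](question, filled)
    return sketch.format(**filled)


# Toy stand-ins for learned prediction modules (assumed, hard-coded).
TOY_PREDICTORS = {
    "agg": lambda question, filled: "COUNT",
    "sel_col": lambda question, filled: "*",
    "table": lambda question, filled: "singer",
    "cond_col": lambda question, filled: "age",
    "cond_op": lambda question, filled: ">",
    "cond_val": lambda question, filled: "30",
}

query = fill_sketch(SKETCH, TOY_PREDICTORS, "How many singers are older than 30?")
print(query)
```

Because the output structure is fixed by the sketch, each module only has to solve a small classification-style problem, which is the contrast with generating the whole query token by token.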
In the last decade, many intelligent interfaces and layers have been proposed to allow users to query relational databases and extract their content using only natural language. However, most of them struggle when exposed to new databases. In this article, we present SQLSketch, a sketch-based network for generating SQL queries that addresses the problem of automatically translating natural language questions into SQL using the related database schemas. We argue that previous models that use a full or partial sequence-to-sequence structure in the decoding phase can in fact have a counterproductive effect on generation and lose more of the context or meaning of the user question. We therefore use a fully sketch-based structure that decouples the generation process into many small prediction modules. SQLSketch is evaluated on GreatSQL, a new cross-domain, large-scale, and balanced dataset for the natural language to SQL translation task. With the long-term aim of building better models and contributing further improvements to semantic parsing tasks, we propose GreatSQL as the first balanced cross-domain corpus; it includes 45,969 pairs of natural language questions and their corresponding SQL queries, along with simplified and well-structured ground-truth annotations. We establish results for SQLSketch on GreatSQL and compare its performance against two popular types of models that represent the sequential and partial-sketch-based approaches. Experimental results show that SQLSketch outperforms the baseline models by 13% in exact matching accuracy and achieves a score of 23.9%, making it the new state-of-the-art model on GreatSQL.
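For readers unfamiliar with the exact matching metric mentioned above, here is a minimal Python sketch of a string-level variant. It assumes that a prediction counts as correct only if it equals the reference query after light normalization (whitespace, letter case, trailing semicolon). The function names are hypothetical, and the actual GreatSQL evaluation may compare queries differently, for example component by component.

```python
import re


def normalize_sql(query: str) -> str:
    """Lightly normalize a query before comparison.

    Simplification: lowercasing also affects string literals, which a
    real evaluator would likely treat more carefully.
    """
    query = query.strip().rstrip(";").strip()
    query = re.sub(r"\s+", " ", query)
    return query.lower()


def exact_match_accuracy(predictions, references):
    """Fraction of predictions identical to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    matches = sum(
        normalize_sql(pred) == normalize_sql(ref)
        for pred, ref in zip(predictions, references)
    )
    return matches / len(references)


# Made-up example pairs, for illustration only.
predicted = [
    "SELECT name FROM singer",
    "select  count(*) from singer;",
    "SELECT age FROM singer",
    "SELECT * FROM song",
]
gold = [
    "select name from singer;",
    "SELECT COUNT(*) FROM singer",
    "SELECT name FROM singer",
    "SELECT title FROM song",
]
print(exact_match_accuracy(predicted, gold))
```

Exact matching is strict: a query that returns the right rows but is written differently from the reference still counts as an error, which is one reason reported scores on cross-domain data can be low.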