Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d18-1425

Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task

Abstract: We present Spider, a large-scale, complex and cross-domain semantic parsing and text-to-SQL dataset annotated by 11 college students. It consists of 10,181 questions and 5,693 unique complex SQL queries on 200 databases with multiple tables, covering 138 different domains. We define a new complex and cross-domain semantic parsing and text-to-SQL task where different complex SQL queries and databases appear in train and test sets. In this way, the task requires the model to generalize well to both new SQL queries…
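
To make the dataset and task concrete, here is a minimal sketch (not the official Spider tooling) that pairs a natural-language question with a complex SQL query and runs it against a toy multi-table SQLite database. The record fields ("db_id", "question", "query"), the "concert_singer" database name, and the table contents are illustrative assumptions, not taken from the released data.

import sqlite3

# Illustrative Spider-style example: one question annotated with one SQL query
# over a database identified by "db_id". Field names and values are assumptions.
example = {
    "db_id": "concert_singer",
    "question": "How many singers are there in each country?",
    "query": "SELECT country, COUNT(*) FROM singer GROUP BY country",
}

# Tiny in-memory stand-in for a multi-table database (only one table shown).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE singer (singer_id INTEGER PRIMARY KEY, name TEXT, country TEXT);
INSERT INTO singer VALUES (1, 'A', 'France'), (2, 'B', 'France'), (3, 'C', 'Japan');
""")

# Executing the annotated query is the basis for execution-based comparison
# of a model's predicted SQL against the gold SQL.
print(conn.execute(example["query"]).fetchall())  # e.g. [('France', 2), ('Japan', 1)]

Because the cross-domain split places entire databases in either the train or the test set, a model only ever sees a schema like singer(singer_id, name, country) at test time without having trained on it.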

Cited by 440 publications (562 citation statements); references 35 publications.
“…Recently, Yu et al (2018b) released a manually labelled dataset for parsing natural language questions into complex SQL, which facilitates related research. Yu et al (2018b)'s dataset is exclusive for English questions. Intuitively, the same semantic parsing task can be applied cross-lingual, since SQL is a universal semantic representation and database interface.…”
Section: Introduction (mentioning)
confidence: 99%
“…We investigate parsing Chinese questions to SQL by creating a first dataset, and empirically evaluating a strong baseline model on the dataset. In particular, we translate the Spider (Yu et al, 2018b) dataset into Chinese. Using the model of Yu et al (2018a), we compare several key model configurations.…”
Section: Introduction (mentioning)
confidence: 99%
“…(1) using more intelligent interaction designs (e.g., free-form text as user feedback) to speed up the hypothesis space searching globally, (2) strengthening the world model to nail down a smaller set of plausible hypotheses based on both the initial question and user feedback, and (3) training the agent to learn to improve the parsing accuracy while minimizing the number of required human interventions over time. Table 8 shows the extended lexicon entries and grammar rules in NLG for applying our MISP-SQL agent to generate more complex SQL queries, such as those on Spider (Yu et al, 2018c). In this dataset, a SQL query can associate with multiple tables.…”
Section: Discussion (mentioning)
confidence: 99%
“…In MISP-SQL, we consider four syntactic categories: AGG for aggregators, OP for operators, COL for columns and Q for generated questions. However, it can be extended with more lexicon entries and grammar rules to accommodate more complex SQL in Spider (Yu et al, 2018c), which we show in Appendix A.…”
Section: Actuator: An NL Generator (mentioning)
confidence: 99%
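
As an illustration only (not the authors' MISP-SQL code), the lexicon entries and grammar rules referred to in this excerpt can be pictured as a mapping from the AGG/OP/COL categories to natural-language phrases, plus a template that realises a clarification question Q; every phrasing and the generate_question helper below are hypothetical.

# Hedged sketch of MISP-SQL-style lexicon entries and one grammar rule.
# Category names (AGG, OP, COL, Q) come from the excerpt above; the phrases
# and the template are assumptions for illustration, not the published system.
LEXICON = {
    "AGG": {"COUNT": "the number of", "MAX": "the maximum of", "AVG": "the average of"},
    "OP": {">": "larger than", "=": "equal to", "LIKE": "containing"},
}

def generate_question(agg: str, col: str) -> str:
    """Rule Q -> 'Does the system need ' AGG COL '?' (hypothetical)."""
    return f"Does the system need {LEXICON['AGG'][agg]} {col}?"

print(generate_question("COUNT", "students"))
# -> Does the system need the number of students?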
“…We describe the process of estimating the correctness of collected QDMR annotations. Similar to previous works (Yu et al, 2018;Kwiatkowski et al, 2019) we use expert judgements, where the experts had prepared the guidelines for the annotation task. Given a question and its annotated QDMR, (q, s) the expert determines the correctness of s using one of the following categories:…”
Section: Quality Analysis (mentioning)
confidence: 99%