2021
DOI: 10.48550/arxiv.2106.05006
Preprint

Text-to-SQL in the Wild: A Naturally-Occurring Dataset Based on Stack Exchange Data

Cited by 4 publications (5 citation statements) | References 18 publications
“…T5+Schema shows comparable performance to T5 in both databases. This result agrees with the recent finding in [12] that models trained in a single database setting do not effectively leverage schema information. Additional qualitative results are provided in Supplementary H, including SQL generation results by question complexity, time expressions, falsely executed results, and refused results.…”
Section: Results and Findings (supporting)
Confidence: 92%
“…KaggleDBQA [19] and SEDE [12] are designed to bridge the gap between academic datasets and practical usability by using real databases and naturally-occurring utterances. However, we have gone one step further where the question authors (the poll respondents) were not presented with the database schema (?Schema), which adds more reality to the dataset [12].…”
Section: EHRSQL and Other Datasets (mentioning)
Confidence: 99%
“…We also observe an array of datasets that focus on generating SQL queries from natural language. Some of these datasets are synthetic (Zhong et al, 2017), mined from Stack Overflow (Hazoom et al, 2021) and GitHub, and human-curated (Tang and Mooney, 2000; Popescu et al, 2003; Giordani and Moschitti, 2012; Li and Jagadish, 2014; Iyer et al, 2017; Yu et al, 2018; Yaghmazadeh et al, 2017; Finegan-Dollak et al, 2018; Yu et al, 2019b). Map Question-Answering.…”
Section: Datasets (mentioning)
Confidence: 99%