2022
DOI: 10.48550/arxiv.2212.08785
Preprint

Importance of Synthesizing High-quality Data for Text-to-SQL Parsing

Cited by 1 publication
(5 citation statements)
References 0 publications
“…Previous Work: We compare ODIS with state-of-the-art approaches that either require finetuning on the Spider training set or utilize in-context learning. For the finetuning-based methods, we select SmBoP which utilizes RoBERTa-large as the pretrained model, as well as T5+Picard (Scholak et al, 2021), ShiP+Picard (Zhao et al, 2022), and RESDSQL (Li et al, 2023a), which employ T5-3B (Raffel et al, 2020) as the pretrained model.…”
Section: Baseline Methods (mentioning)
confidence: 99%
“…Synthetic data generation: To generate synthetic data, we follow previous work to first sample synthetic SQL queries {y_i} and then translate SQL queries into natural language questions {x_i} (Zhong et al, 2020b; Zhao et al, 2022). We use SHiP (Zhao et al, 2022) to sample synthetic SQL queries, which extract templates from out-of-domain databases and sample columns and values from the test database to fill the templates. After obtaining synthetic SQL queries, we use the Codex to generate corresponding synthetic NLQs, in the same procedure as in our analysis of in-domain SQL distribution.…”
Section: In-domain Synthetic Demonstration Creation (mentioning)
confidence: 99%
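
The template-then-fill procedure described in the citation statement above can be illustrated with a minimal Python sketch. This is not SHiP's implementation: the regex-based template extraction, the toy `SOURCE_QUERIES` and `TEST_DB` data, and every function name here are hypothetical, and the SQL-to-question step (done with Codex in the citing paper) is replaced by a placeholder so the sketch makes no model calls.

```python
import random
import re

# Hypothetical out-of-domain queries used as the source of templates.
SOURCE_QUERIES = [
    "SELECT name FROM singer WHERE age > 30",
    "SELECT count(*) FROM concert WHERE year = 2014",
]

# Hypothetical test database: table -> {column: sample values}.
TEST_DB = {
    "employee": {"salary": [40000, 55000], "age": [25, 41]},
    "city": {"population": [120000, 900000], "area": [35, 210]},
}


def extract_template(sql: str) -> str:
    """Mask the selected column, table, filter column, and numeric literal."""
    sql = re.sub(r"SELECT \w+ FROM", "SELECT {select_column} FROM", sql)
    sql = re.sub(r"FROM \w+", "FROM {table}", sql)
    sql = re.sub(r"WHERE \w+", "WHERE {column}", sql)
    return re.sub(r"\b\d+\b", "{value}", sql)


def fill_template(template: str, db: dict, rng: random.Random) -> str:
    """Fill a template with a table, columns, and a value from the test database."""
    table = rng.choice(sorted(db))
    columns = db[table]
    column = rng.choice(sorted(columns))
    return template.format(
        table=table,
        select_column=rng.choice(sorted(columns)),
        column=column,
        value=rng.choice(columns[column]),
    )


def placeholder_translate(sql: str) -> str:
    """Stand-in for the SQL-to-question model; returns a tagged string only."""
    return f"[question for: {sql}]"


def synthesize(n: int, seed: int = 0, translate=placeholder_translate) -> list:
    """Return n (question, SQL) pairs built from templates and the test database."""
    rng = random.Random(seed)
    templates = [extract_template(q) for q in SOURCE_QUERIES]
    pairs = []
    for _ in range(n):
        sql = fill_template(rng.choice(templates), TEST_DB, rng)
        pairs.append((translate(sql), sql))
    return pairs


if __name__ == "__main__":
    for question, sql in synthesize(3):
        print(sql, "->", question)
```

Passing a different `translate` callable is where a real question generator would plug in; the fixed seed keeps the sampled pairs reproducible.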