2018
DOI: 10.48550/arxiv.1806.00358
Preprint

A Systematic Classification of Knowledge, Reasoning, and Context within the ARC Dataset

Cited by 6 publications (6 citation statements)
References: 0 publications
“…It contains a dataset of 2,590 multiple-choice questions written for primary-school science exams (Clark et al., 2018). In this competition, Boratko et al. (2018a, 2018b) verified the effect of rewritten queries on the pretrained DrQA model (Chen et al., 2017); the score increased by 0.42, quantitatively verifying the validity of query rewriting. Musa et al. (2018) builds on a Seq2seq model and an NCRF model, supplemented by word vectors pretrained on a knowledge graph as prior knowledge, to generate multiple new queries by identifying key items from the original query (OQ).…”
Section: Related Work 2.1 Query Rewriting
Citation type: mentioning
confidence: 70%
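The key-item rewriting summarized above can be made concrete with a toy sketch. What follows is a minimal, hypothetical Python stand-in that extracts frequent content words from an original query (OQ) and emits rewritten queries; the cited works instead use Seq2seq and NCRF models with word vectors pretrained on a knowledge graph, so every name and heuristic below is an assumption for illustration only.

# Toy sketch of key-item query rewriting (hypothetical heuristic;
# the cited works use Seq2seq/NCRF models, not word frequency).
from collections import Counter
import re

STOPWORDS = {"the", "a", "an", "of", "to", "is", "which", "what",
             "in", "it", "by", "just", "at", "can", "be"}

def key_terms(original_query, top_k=3):
    # Pick the most frequent non-stopword tokens as stand-in key items.
    tokens = re.findall(r"[a-z]+", original_query.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return [term for term, _ in counts.most_common(top_k)]

def rewrite_queries(original_query):
    # Emit several shorter queries built around the extracted key items.
    terms = key_terms(original_query)
    rewrites = [" ".join(terms)]                    # bag of key terms
    rewrites += [f"{t} definition" for t in terms]  # one query per term
    return rewrites

oq = "Which property of a mineral can be determined just by looking at it?"
print(rewrite_queries(oq))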
“…It takes about 32 hours to finish the training of GPT-3-350M and 120 hours for GPT-3-1.3B. We evaluate our results on 20 zero-shot evaluation tasks, comprising 19 accuracy evaluation tasks (i.e., HellaSwag [77], LAMBADA [47], TriviaQA [30], WebQS [3], Winogrande [54], PIQA [62], ARC (Challenge/Easy) [7], ANLI (R1/R2/R3) [71], OpenBookQA [44], RACE-h [32], BoolQ [12], Copa [1], RTE [13], WSC [35], MultiRC [73], ReCoRD [78]) and 1 language-modeling generation task (i.e., Wikitext-2 [42]). For ZeroQuant and ZeroQuant-LKD, we use 64/128 groups for group-wise weight quantization on GPT-3-350M/GPT-3-1.3B for all the weight matrices.…”
Section: B.2 Details of Main Results
Citation type: mentioning
confidence: 99%
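Group-wise weight quantization, as configured above with 64 or 128 groups, assigns one quantization scale per group of weight-matrix rows instead of a single scale for the whole matrix. The following is a minimal sketch of symmetric INT8 group-wise quantization under that reading; ZeroQuant's actual kernels, group layout, and knowledge-distillation pipeline are more involved, and the helper names here are hypothetical.

# Minimal sketch of group-wise symmetric INT8 weight quantization.
import torch

def groupwise_quantize(weight, num_groups=64):
    # One scale per group: flatten each group of rows, take its max |w|.
    rows, cols = weight.shape
    assert rows % num_groups == 0, "rows must divide evenly into groups"
    groups = weight.view(num_groups, -1)
    scale = groups.abs().amax(dim=1, keepdim=True) / 127.0
    scale = scale.clamp_min(1e-8)  # avoid divide-by-zero for all-zero groups
    q = torch.clamp(torch.round(groups / scale), -128, 127).to(torch.int8)
    return q.view(rows, cols), scale

def dequantize(q, scale, num_groups=64):
    # Recover a floating-point approximation of the original weights.
    groups = q.view(num_groups, -1).to(torch.float32) * scale
    return groups.view(q.shape)

w = torch.randn(1024, 1024)
q, s = groupwise_quantize(w, num_groups=64)
print((w - dequantize(q, s)).abs().max())  # worst-case rounding error

More groups means a smaller dynamic range covered by each scale and hence lower rounding error, at the cost of storing more scales, which is the trade-off behind the 64-versus-128 choice quoted above.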
“…Zero-Shot Tasks. While our focus is on language generation, we also evaluate the performance of quantized models on some popular zero-shot tasks, namely LAMBADA [23], ARC (Easy and Challenge) [3] and PIQA [30]. Figure 4 visualizes model performance on LAMBADA (and see also the LAMB.…”
Section: The GPTQ Algorithm
Citation type: mentioning
confidence: 99%
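Zero-shot accuracy on multiple-choice benchmarks such as ARC and PIQA is commonly computed by scoring every candidate answer with the language model's log-likelihood and picking the highest-scoring one. The sketch below shows that generic scoring loop for a HuggingFace causal LM; it illustrates the standard technique, not GPTQ's own evaluation harness, and the model name is a stand-in.

# Generic zero-shot multiple-choice scoring (standard log-likelihood
# ranking; not GPTQ's evaluation code).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; swap in a quantized checkpoint
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

@torch.no_grad()
def choice_logprob(question, choice):
    # Sum log-probabilities of the choice tokens, conditioned on the question.
    q_len = tok(question, return_tensors="pt").input_ids.shape[1]
    full = tok(question + " " + choice, return_tensors="pt").input_ids
    logits = model(full).logits[0, :-1]          # position i predicts token i+1
    targets = full[0, 1:]
    logp = torch.log_softmax(logits, dim=-1)
    token_lp = logp[torch.arange(targets.numel()), targets]
    return token_lp[q_len - 1:].sum().item()     # keep only choice tokens

question = "Which gas do plants absorb for photosynthesis?"
choices = ["carbon dioxide", "oxygen", "nitrogen", "helium"]
print(max(choices, key=lambda c: choice_logprob(question, c)))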