2018
DOI: 10.48550/arxiv.1806.00358
Preprint

A Systematic Classification of Knowledge, Reasoning, and Context within the ARC Dataset

Cited by 6 publications (6 citation statements)
References: 0 publications
“…It contains a dataset of 2,590 multiple-choice questions written for primary-school science exams (Clark et al., 2018). In this competition, Boratko et al. (2018a, 2018b) verified the effect of rewritten queries on the pretrained DrQA model (Chen et al., 2017); the score increased by 0.42, quantitatively verifying the validity of query rewriting. Musa et al. (2018) builds on a Seq2seq model and an NCRF model, supplemented by word vectors pretrained on a knowledge graph as prior knowledge, to generate multiple new queries by identifying key items from the original query (OQ).…”
Section: Related Work 2.1 Query Rewriting
Citation type: mentioning
confidence: 70%
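The key-item rewriting summarized above can be made concrete with a toy sketch. What follows is a minimal, hypothetical Python stand-in that extracts frequent content words from an original query (OQ) and emits rewritten queries; the cited works instead use Seq2seq and NCRF models with word vectors pretrained on a knowledge graph, so every name and heuristic below is an assumption for illustration only.

# Toy sketch of key-item query rewriting (hypothetical heuristic;
# the cited works use Seq2seq/NCRF models, not word frequency).
from collections import Counter
import re

STOPWORDS = {"the", "a", "an", "of", "to", "is", "which", "what",
             "in", "it", "by", "just", "at", "can", "be"}

def key_terms(original_query, top_k=3):
    # Pick the most frequent non-stopword tokens as stand-in key items.
    tokens = re.findall(r"[a-z]+", original_query.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return [term for term, _ in counts.most_common(top_k)]

def rewrite_queries(original_query):
    # Emit several shorter queries built around the extracted key items.
    terms = key_terms(original_query)
    rewrites = [" ".join(terms)]                    # bag of key terms
    rewrites += [f"{t} definition" for t in terms]  # one query per term
    return rewrites

oq = "Which property of a mineral can be determined just by looking at it?"
print(rewrite_queries(oq))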
“…It takes about 32 hours to finish the training of GPT-3-350M and 120 hours for GPT-3-1.3B. We evaluate our results on 20 zero-shot evaluation tasks, comprising 19 accuracy evaluation tasks (i.e., HellaSwag [77], LAMBADA [47], TriviaQA [30], WebQS [3], Winogrande [54], PIQA [62], ARC (Challenge/Easy) [7], ANLI (R1/R2/R3) [71], OpenBookQA [44], RACE-h [32], BoolQ [12], Copa [1], RTE [13], WSC [35], MultiRC [73], ReCoRD [78]) and 1 language-modeling generation task (i.e., Wikitext-2 [42]). For ZeroQuant and ZeroQuant-LKD, we use 64/128 groups for group-wise weight quantization on GPT-3-350M/GPT-3-1.3B for all the weight matrices.…”
Section: B.2 Details of Main Results
Citation type: mentioning
confidence: 99%
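Group-wise weight quantization, as configured above with 64 or 128 groups, assigns one quantization scale per group of weight-matrix rows instead of a single scale for the whole matrix. The following is a minimal sketch of symmetric INT8 group-wise quantization under that reading; ZeroQuant's actual kernels, group layout, and knowledge-distillation pipeline are more involved, and the helper names here are hypothetical.

# Minimal sketch of group-wise symmetric INT8 weight quantization.
import torch

def groupwise_quantize(weight, num_groups=64):
    # One scale per group: flatten each group of rows, take its max |w|.
    rows, cols = weight.shape
    assert rows % num_groups == 0, "rows must divide evenly into groups"
    groups = weight.view(num_groups, -1)
    scale = groups.abs().amax(dim=1, keepdim=True) / 127.0
    scale = scale.clamp_min(1e-8)  # avoid divide-by-zero for all-zero groups
    q = torch.clamp(torch.round(groups / scale), -128, 127).to(torch.int8)
    return q.view(rows, cols), scale

def dequantize(q, scale, num_groups=64):
    # Recover a floating-point approximation of the original weights.
    groups = q.view(num_groups, -1).to(torch.float32) * scale
    return groups.view(q.shape)

w = torch.randn(1024, 1024)
q, s = groupwise_quantize(w, num_groups=64)
print((w - dequantize(q, s)).abs().max())  # worst-case rounding error

More groups means a smaller dynamic range covered by each scale and hence lower rounding error, at the cost of storing more scales, which is the trade-off behind the 64-versus-128 choice quoted above.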
“…Zero-Shot Tasks. While our focus is on language generation, we also evaluate the performance of quantized models on some popular zero-shot tasks, namely LAMBADA [23], ARC (Easy and Challenge) [3] and PIQA [30]. Figure 4 visualizes model performance on LAMBADA (and see also the LAMB.…”
Section: The GPTQ Algorithm
Citation type: mentioning
confidence: 99%
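Zero-shot accuracy on multiple-choice benchmarks such as ARC and PIQA is commonly computed by scoring every candidate answer with the language model's log-likelihood and picking the highest-scoring one. The sketch below shows that generic scoring loop for a HuggingFace causal LM; it illustrates the standard technique, not GPTQ's own evaluation harness, and the model name is a stand-in.

# Generic zero-shot multiple-choice scoring (standard log-likelihood
# ranking; not GPTQ's evaluation code).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; swap in a quantized checkpoint
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

@torch.no_grad()
def choice_logprob(question, choice):
    # Sum log-probabilities of the choice tokens, conditioned on the question.
    q_len = tok(question, return_tensors="pt").input_ids.shape[1]
    full = tok(question + " " + choice, return_tensors="pt").input_ids
    logits = model(full).logits[0, :-1]          # position i predicts token i+1
    targets = full[0, 1:]
    logp = torch.log_softmax(logits, dim=-1)
    token_lp = logp[torch.arange(targets.numel()), targets]
    return token_lp[q_len - 1:].sum().item()     # keep only choice tokens

question = "Which gas do plants absorb for photosynthesis?"
choices = ["carbon dioxide", "oxygen", "nitrogen", "helium"]
print(max(choices, key=lambda c: choice_logprob(question, c)))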