Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019
DOI: 10.18653/v1/d19-1628

Pre-Training BERT on Domain Resources for Short Answer Grading

Abstract: Pre-trained BERT contextualized representations have achieved state-of-the-art results on multiple downstream NLP tasks by fine-tuning with task-specific data. While there has been a lot of focus on task-specific fine-tuning, there has been limited work on improving the pre-trained representations. In this paper, we explore ways of improving the pre-trained contextual representations for the task of automatic short answer grading, a critical component of intelligent tutoring systems. We show that the pre-traine…

Cited by 63 publications (31 citation statements). References 12 publications.
“…As a rule of thumb to fine-tune BERT for downstream tasks, Devlin et al. (2019) suggested a minimal hyperparameter tuning strategy relying on a grid search over the following ranges: learning rate ∈ {2e−5, 3e−5, 4e−5, 5e−5}, number of training epochs ∈ {3, 4}, batch size ∈ {16, 32}, and a fixed dropout rate of 0.1. These not-well-justified suggestions are blindly followed in the literature (Alsentzer et al., 2019; Beltagy et al., 2019; Sung et al., 2019). Given the relatively small size of the datasets, we use batch sizes ∈ {4, 8, 16, 32}.…”
Section: Methods
confidence: 99%
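
As a rough illustration of the recipe quoted above, the sketch below enumerates that hyperparameter grid in Python. The value ranges come directly from the quote; `train_and_eval` is a hypothetical placeholder for the task-specific fine-tuning and validation loop, not code from either paper.

```python
# Minimal sketch of the grid search suggested by Devlin et al. (2019) for
# BERT fine-tuning, extended with the smaller batch sizes used by the
# citing work for small datasets.
from itertools import product

learning_rates = [2e-5, 3e-5, 4e-5, 5e-5]
num_epochs = [3, 4]
batch_sizes = [4, 8, 16, 32]   # {16, 32} in the original recipe
dropout = 0.1                  # kept fixed

def train_and_eval(lr, epochs, batch_size, dropout):
    """Hypothetical stub: fine-tune BERT with these settings and
    return a validation metric (e.g. accuracy or macro-F1)."""
    return 0.0  # replace with real fine-tuning code

best = None
for lr, epochs, bs in product(learning_rates, num_epochs, batch_sizes):
    score = train_and_eval(lr, epochs, bs, dropout)
    if best is None or score > best[0]:
        best = (score, {"lr": lr, "epochs": epochs, "batch_size": bs})

print("best configuration:", best)
```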
“…Improvements were reported in downstream tasks in both cases. Sung et al (2019) further pre-trained BERT-BASE on textbooks and question-answer pairs to improve short answer grading for intelligent tutoring systems.…”
Section: Related Work
confidence: 99%
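
The further pre-training step this quote describes can be sketched with the Hugging Face transformers and datasets libraries as generic masked-language-model training on domain text. This is not the authors' exact setup: the corpus file "domain_corpus.txt" (textbook passages and question-answer pairs, one segment per line), the sequence length, and the training arguments are illustrative assumptions.

```python
# Sketch of domain-adaptive further pre-training of BERT-BASE with the
# masked language modelling objective.
from transformers import (BertTokenizerFast, BertForMaskedLM,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)
from datasets import load_dataset

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# Tokenize the raw domain corpus (hypothetical file path).
corpus = load_dataset("text", data_files={"train": "domain_corpus.txt"})
tokenized = corpus["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"])

# Mask 15% of tokens, as in the original BERT objective.
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)

args = TrainingArguments(output_dir="bert-domain", num_train_epochs=1,
                         per_device_train_batch_size=16)
Trainer(model=model, args=args, train_dataset=tokenized,
        data_collator=collator).train()

model.save_pretrained("bert-domain")
tokenizer.save_pretrained("bert-domain")
```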
“…For our experimentation, we have used SciBERT (Beltagy et al., 2019b) to get the sentence embeddings, as it is trained on a large multi-domain corpus of scientific publications to improve performance on many scientific NLP tasks like summarization (Gabriel et al., 2019) and relation extraction (Sung et al., 2019). For the convolution layer, we have used 600 filters and 3 kernels, with ReLU as our activation function.…”
Section: Methods
confidence: 99%
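
A rough PyTorch sketch of the pipeline this quote describes (SciBERT token embeddings feeding a convolutional layer with 600 filters and ReLU) follows. The excerpt's "3 kernels" is ambiguous, so a single kernel size of 3 is assumed, and the max-pooling plus linear classifier head and the two-class output are illustrative additions, not details from the cited work.

```python
# SciBERT encoder followed by a 1-D convolution with 600 filters and ReLU.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

class SciBertCNN(nn.Module):
    def __init__(self, n_filters=600, kernel_size=3, n_classes=2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained("allenai/scibert_scivocab_uncased")
        hidden = self.encoder.config.hidden_size  # 768 for SciBERT
        self.conv = nn.Conv1d(hidden, n_filters, kernel_size)
        self.classifier = nn.Linear(n_filters, n_classes)

    def forward(self, input_ids, attention_mask):
        # (batch, seq_len, hidden) -> (batch, hidden, seq_len) for Conv1d
        tokens = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        features = torch.relu(self.conv(tokens.transpose(1, 2)))
        pooled = features.max(dim=2).values  # max-pool over the sequence
        return self.classifier(pooled)

tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
batch = tokenizer(["An example sentence."], return_tensors="pt", padding=True)
logits = SciBertCNN()(batch["input_ids"], batch["attention_mask"])
```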
“…al. [32] have shown that by updating the pre-trained BERT language model with domain-specific books and question-answer data, better results can be achieved than by fine-tuning the model alone.…”
Section: Related Work
confidence: 99%
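
For completeness, here is a hedged sketch of how such a domain-updated checkpoint would typically be applied to short answer grading, framed as reference-answer versus student-answer pair classification. The checkpoint directory "bert-domain" (produced by the MLM sketch above), the two-label scheme, and the example answers are assumptions, not the cited system.

```python
# Apply a domain-updated BERT checkpoint to answer-pair classification.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-domain")  # hypothetical path
model = AutoModelForSequenceClassification.from_pretrained("bert-domain", num_labels=2)

reference = "Photosynthesis converts light energy into chemical energy."
student = "Plants turn sunlight into chemical energy they can store."

# Encode the pair; the classification head still needs task fine-tuning.
inputs = tokenizer(reference, student, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
print("predicted label:", logits.argmax(dim=-1).item())
```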