2019
DOI: 10.1007/978-3-030-15712-8_59
Impact of Training Dataset Size on Neural Answer Selection Models

Abstract: It is held as a truism that deep neural networks require large datasets to train effective models. However, large datasets, especially with high-quality labels, can be expensive to obtain. This study sets out to investigate (i) how large a dataset must be to train well-performing models, and (ii) what impact can be shown from fractional changes to the dataset size. A practical method to investigate these questions is to train a collection of deep neural answer selection models using fractional subsets of varyi…
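The fractional-subset protocol the abstract describes can be sketched as a simple learning-curve experiment: train the same model on nested subsets of the training data and record test accuracy per fraction. The classifier and synthetic dataset below are stand-ins (scikit-learn logistic regression), not the paper's neural answer selection models.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in dataset; the paper uses answer selection corpora.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

results = {}
for fraction in (0.1, 0.25, 0.5, 0.75, 1.0):
    n = int(len(X_train) * fraction)      # size of this fractional subset
    model = LogisticRegression(max_iter=1000)
    # Nested subsets: each larger fraction contains the smaller ones,
    # so differences reflect added data, not resampling noise.
    model.fit(X_train[:n], y_train[:n])
    results[fraction] = model.score(X_test, y_test)

for fraction, acc in sorted(results.items()):
    print(f"{fraction:>4.0%} of training data -> accuracy {acc:.3f}")
```

Plotting `results` against the fraction yields the learning curve whose plateau the citing papers discuss.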

Cited by 34 publications (23 citation statements)
References 14 publications
“…In my ECIR'19 short paper [12], some benefits from increased training data volumes were demonstrated, but increasing data volume did not show a large difference. Some models quickly reached a plateau of performance within the range of available training data volumes, indicating memorization of the smaller dataset.…”
Section: Analysis and Insights
confidence: 97%
“…In my ECIR'19 short paper [12], the effects of varying data size on training neural models for answer sentence selection were studied. In my ICTIR'20 full paper [13], the impact of template-based synthetic data generation on training neural knowledge graph question answering (KGQA) models was investigated.…”
Section: Analysis and Insights
confidence: 99%
“…However, computer vision based machine learning algorithms (those which are able to identify objects in images) require a large amount of training data to be able to recognise any specific object. The accuracy of these algorithms deteriorates rapidly if insufficient training data are available [26][27][28], and, in some cases, simpler statistical methods have been shown to perform better than deep learning algorithms for object recognition [29].…”
Section: Introduction
confidence: 99%
“…After splitting the data into training, testing and verification sets, this would leave a small number of meaningfully different images to train with (i.e., not temporally adjacent in the footage, thus forming essentially the same image). Previous work has shown that small training dataset sizes lead to dramatic drops in classification accuracy [26][27][28]. With a small number of experiments, a machine learning system would only recognise the specific humans in the specific scenarios we tested with, and it would be difficult to be certain that the reported performance would in fact be meaningful in a wider range of real world scenarios.…”
confidence: 96%
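The three-way split mentioned in the last citation statement (training, testing, and verification sets) can be sketched as a plain shuffled partition; the 70/15/15 proportions and the `three_way_split` helper below are illustrative assumptions, not taken from the cited work.

```python
import random

def three_way_split(items, train=0.7, test=0.15, seed=0):
    """Shuffle items and partition them into train/test/verification sets.

    Proportions are fractions of the whole; the verification set
    receives whatever remains after the first two cuts.
    """
    items = list(items)
    random.Random(seed).shuffle(items)   # deterministic shuffle for reproducibility
    n_train = int(len(items) * train)
    n_test = int(len(items) * test)
    return (items[:n_train],
            items[n_train:n_train + n_test],
            items[n_train + n_test:])

train_set, test_set, verify_set = three_way_split(range(100))
print(len(train_set), len(test_set), len(verify_set))  # 70 15 15
```

With few meaningfully different images, each of these three partitions becomes very small, which is exactly the regime where the cited accuracy drops appear.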