Proceedings of the 12th International Conference on Natural Language Generation 2019
DOI: 10.18653/v1/w19-8630

Efficiency Metrics for Data-Driven Models: A Text Summarization Case Study

Abstract: Using data-driven models for solving text summarization or similar tasks has become very common in the last years. Yet most of the studies report basic accuracy scores only, and nothing is known about the ability of the proposed models to improve when trained on more data. In this paper, we define and propose three data efficiency metrics: data score efficiency, data time deficiency and overall data efficiency. We also propose a simple scheme that uses those metrics and apply it for a more comprehensive evalua…

Citations: cited by 2 publications (3 citation statements)
References: 27 publications (35 reference statements)
“…Experimenting with keyphrases of scientific papers seems an ongoing trend that is greatly motivated by the availability of data in online academic repositories. Following the examples of [59] and [60], we took the initiative to produce an even larger collection of scientific paper keywords, titles and abstracts. Exploiting the whole data of Open Academic Graph (described in [61] and [62]), we retrieved keywords, title and abstract data wherever they were available.…”
Section: B. A Novel and Huge Data Collection (citation type: mentioning)
confidence: 99%
“…It is useful to define a metric that measures accuracy loss per utterance, relative to the no-pruning baseline. We report relative data-score efficiency, originally proposed by Çano and Bojar (2019):…”
Section: Experiments Metrics (citation type: mentioning)
confidence: 99%
“…In Çano and Bojar (2019), the relative data-score efficiency metric σ was used to evaluate how well model accuracy scaled as the amount of training data was increased. We use σ in a slightly different but analogous manner to quantify how well model errors are avoided as the training data is pruned using a given sampling technique.…”
Citation type: mentioning
confidence: 99%