Proceedings of the 28th ACM International Conference on Multimedia 2020
DOI: 10.1145/3394171.3413749

Describing Subjective Experiment Consistency by p-Value P--P Plot

Abstract: There are phenomena that cannot be measured without subjective testing. However, subjective testing is complex, with many influencing factors. These factors interact to yield either precise or incorrect results. Researchers require a tool to classify the results of a subjective experiment as either consistent or inconsistent. This is necessary in order to decide whether to treat the gathered scores as quality ground truth data. Knowing if subjective scores can be trusted is key to drawing valid conclusions and build…

Cited by 10 publications (15 citation statements)
References 36 publications
“…It is worth mentioning that we already proposed in the past a tool based on the GSD class. The tool extends possible ways to validate data consistency of responses obtained during a MQA subjective experiment [8]. Importantly, our work was noticed by practitioners in the MQA field and referred to in [9], [10], and [11].…”
Section: Introduction (mentioning)
confidence: 78%
“…The black line is the upper bound of 95% right-sided confidence interval for the CDF of p-values under the null hypothesis. Specifically, under the null hypothesis, the CDF of p-values is not greater than the uniform distribution function (for more details see [8]). As one can see, there is no evidence that the GSD is not the correct way of modelling subjective responses from MQA experiments.…”
Section: A Comparing Goodness-of-fit Of Ordered Probit and GSD For Mu... (mentioning)
confidence: 99%
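
The check described in the statement above can be sketched roughly as follows. This is a minimal illustration, not the cited authors' implementation: it assumes the bound is obtained from a normal approximation to the binomial distribution (z = 1.645 for a one-sided 95% interval), and the function names `pp_plot_points`, `upper_bound`, and `exceeds_bound` are hypothetical.

```python
import math

def pp_plot_points(p_values, alphas):
    """Return (alpha, fraction of p-values <= alpha) pairs: the empirical CDF."""
    n = len(p_values)
    return [(a, sum(p <= a for p in p_values) / n) for a in alphas]

def upper_bound(alpha, n, z=1.645):
    """Assumed bound: uniform CDF value plus a one-sided 95% normal margin."""
    return alpha + z * math.sqrt(alpha * (1 - alpha) / n)

def exceeds_bound(p_values, alphas):
    """List the significance levels at which the empirical CDF crosses the bound."""
    n = len(p_values)
    return [a for a, frac in pp_plot_points(p_values, alphas)
            if frac > upper_bound(a, n)]

if __name__ == "__main__":
    # Evenly spread p-values mimic the uniform distribution expected under the null.
    uniform_like = [(i + 0.5) / 100 for i in range(100)]
    print(exceeds_bound(uniform_like, [0.05, 0.1, 0.2]))
```

An empty list means the empirical CDF stays under the bound at every level checked, which is the situation the statement describes as "no evidence" against the model.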
“…Fig. 2 from [1]). Effectively, recreating these results is the most significant part of the reproducibility efforts.…”
Section: $ python3 reproduce.py -h (mentioning)
confidence: 96%
“…Another two important files in the repo are: (i) subjective_quality_datasets.csv and (ii) G_test_results.csv. The former one includes raw subjective data that is processed in the original paper [1]. The most important output of this processing is the G_test_results.csv file.…”
Section: $ python3 reproduce.py -h (mentioning)
confidence: 99%
“…Modelling the individual listener score will allow for the model to be able to take into account this rater data, accounting for the rater bias. Additionally, many researchers have pointed out issues with using MOS as the primary quality metric and have proposed alternatives [20]-[22]. Modeling the individual rater score allows using the model for other metrics that are alternatives or complements to MOS.…”
Section: Introduction (mentioning)
confidence: 99%