1998
DOI: 10.1177/154193129804201902

The Evaluator Effect in Usability Studies: Problem Detection and Severity Judgments

Abstract: Usability studies are commonly used in industry and applied in research as a yardstick for other usability evaluation methods. Though usability studies have been studied extensively, one potential threat to their reliability has been left virtually untouched: the evaluator effect. In this study, four evaluators individually analyzed four videotaped usability test sessions. Only 20% of the 93 detected problems were detected by all evaluators, and 46% were detected by only a single evaluator. From the total set …

Cited by 48 publications (51 citation statements). References 17 publications.

Citation statements, ordered by relevance:
“…Specifically, the evaluator effect is neither restricted to novice evaluators nor to evaluators knowledgeable of usability in general. The evaluator effect was also found for evaluators with experience in the specific UEM they have been using (Jacobsen et al., 1998; Lewis et al., 1990; Molich et al., 1998, 1999). Furthermore, the evaluator effect is not affected much by restricting the set of problems to only the severe problems.…”
Section: Studies of the Evaluator Effect in CW, HE, and TA (mentioning)
confidence: 58%
“…Hence, the evaluators were requested to detect problems according to the nine criteria, and they were asked to report time-stamped evidence and a free-form description for each problem. Based on the evaluators' problem lists, two of the authors of Jacobsen et al. (1998) independently constructed a master list of unique problem tokens. They agreed on 86% of the problem tokens, and by discussing their disagreements and the problems they did not share, a consensus was reached.…”
Section: TA (mentioning)
confidence: 99%
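
The master-list step described above reduces to a set comparison: how many problem tokens two coders independently agree on, out of all tokens either coder proposed. A minimal Python sketch of one such agreement measure (the token names and the simple overlap ratio are assumptions for illustration, not the authors' exact matching procedure):

# Hypothetical problem tokens from two independent coders.
coder_a = {"menu_hidden", "label_ambiguous", "undo_missing", "icon_confusing"}
coder_b = {"menu_hidden", "label_ambiguous", "undo_missing", "slow_feedback"}

# Agreement: tokens both coders listed, relative to all unique tokens
# either coder proposed (before any consensus discussion).
agreement = len(coder_a & coder_b) / len(coder_a | coder_b)
print(f"Agreement: {agreement:.0%}")  # 60% for this toy data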
“…Previous studies (e.g., Hertzum & Jacobsen, 1999; Jacobsen, Hertzum, & John, 1998; Nielsen, 1992) have used the average detection rate of a single evaluator as their basic measure of the evaluator effect. This measure relates the evaluators' individual performances to their collective performance by dividing the average number of problems detected by a single evaluator by the number of problems detected collectively by all the evaluators.…”
Section: Measuring the Evaluator Effect (mentioning)
confidence: 99%
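
The measure quoted above is straightforward to compute once each evaluator's findings are represented as a set of problem identifiers. A minimal sketch (the IDs and counts are hypothetical, not data from the study):

# Problems detected by each evaluator, as sets of problem IDs.
evaluators = [
    {"p1", "p2", "p3"},   # evaluator 1
    {"p1", "p4"},         # evaluator 2
    {"p2", "p3", "p5"},   # evaluator 3
]

# Collective performance: the union of all detected problems.
collective = set().union(*evaluators)

# Average number of problems detected by a single evaluator.
avg_individual = sum(len(e) for e in evaluators) / len(evaluators)

# The evaluator-effect measure: average individual detection rate
# relative to the collectively detected problem set.
detection_rate = avg_individual / len(collective)
print(f"Average single-evaluator detection rate: {detection_rate:.0%}")  # 53% here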
“…Evaluators may have difficulty in distinguishing between a problem that is rated as a Major Usability Problem and a Usability Catastrophe when using playability heuristics. When comparing evaluators' classifications of problems to severity ratings, research has shown that inter-rater reliability tends to be low [19]. This may be due to the difficulty evaluators have in distinguishing the boundaries between scales.…”
Section: Introduction (mentioning)
confidence: 97%
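
Low inter-rater reliability of the kind cited above is commonly quantified with a chance-corrected statistic such as Cohen's kappa. A minimal sketch with hypothetical severity ratings on a Nielsen-style scale (a standard computation offered for illustration, not the method of the cited study):

from collections import Counter

# Hypothetical severity ratings by two evaluators for the same six problems.
rater1 = ["minor", "major", "catastrophe", "major", "minor", "major"]
rater2 = ["minor", "catastrophe", "major", "major", "minor", "minor"]

n = len(rater1)
observed = sum(a == b for a, b in zip(rater1, rater2)) / n  # raw agreement

# Agreement expected by chance, from each rater's marginal distribution.
c1, c2 = Counter(rater1), Counter(rater2)
expected = sum(c1[k] * c2[k] for k in c1.keys() | c2.keys()) / n**2

kappa = (observed - expected) / (1 - expected)
print(f"Observed agreement: {observed:.0%}, Cohen's kappa: {kappa:.2f}")

For these toy ratings the kappa comes out near 0.2, which would indicate only slight agreement beyond chance, consistent with the low reliability the statement reports.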