Abstract: The promotion of competency among nurses and other health-care professionals is a goal shared by many stakeholders. In nursing, observation-based assessments are often better suited than paper-and-pencil tests for assessing many clinical abilities. Unfortunately, few instruments for simulation-based assessment of competency have been published that have undergone stringent reliability and validity evaluation. Reliability analyses typically involve some measure of rater agreement, but other sources of measurement …
“…67 Studies in veterinary and nursing education have reported successfully using fewer than 20 persons in conjunction with a larger number of conditions per facet. 52,68 The small sample size may have contributed to the observed low variance in student scores as assessed by the checklist. If the surgical skills examination on a model, scored using the checklist, is to be used as a high-stakes assessment, particularly at other institutions, further validity evidence and additional reliability data should be gathered to maintain a solid validity argument for its use.…”
Section: Discussion
“…While specific guidelines on minimal sample size for generalizability studies have not been established, a minimum of 20 persons for a 1-facet design has been suggested. 67 Studies in veterinary and nursing education have reported successfully using fewer than 20 persons in conjunction with a larger number of conditions per facet. 52,68 The small sample size may have contributed to the observed low variance in student scores as assessed by the checklist.…”
Objective: To gather and evaluate validity evidence, in the form of content validity and reliability of scores, for 2 surgical skills assessment instruments: 1) a checklist, and 2) a modified form of the Objective Structured Assessment of Technical Skills (OSATS) global rating scale (GRS). Study design: Prospective randomized blinded study. Sample population: Veterinary surgical skills educators (n = 10) evaluated content validity. Scores from students in their third preclinical year of veterinary school (n = 16) were used to assess reliability. Methods: Content validity was assessed using Lawshe's method to calculate the Content Validity Index (CVI) for the checklist and the modified OSATS GRS. The importance and relevance of each item was determined in relation to the skills needed to successfully perform supervised surgical procedures. The reliability of scores produced by both instruments was determined using generalizability (G) theory. Results: Based on the results of the content validation, 39 of 40 checklist items were included. The 39-item checklist CVI was 0.81. One of the 6 OSATS GRS items was included. The 1-item GRS CVI was 0.80. The G-coefficients for the 40-item checklist and 6-item GRS were 0.85 and 0.79, respectively.
Conclusion: Content validity was very good for the 39-item checklist and good for the 1-item OSATS GRS. The reliability of scores from both instruments was acceptable for a moderate-stakes examination. Impact: These results provide evidence to support the use of the checklist described and a modified 1-item OSATS GRS in moderate-stakes examinations when evaluating preclinical third-year veterinary students' technical surgical skills on low-fidelity models.
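Lawshe's method, used above to compute the CVI, can be sketched in a few lines: each panelist rates an item "essential" or not, the Content Validity Ratio per item is CVR = (n_e − N/2)/(N/2), and the CVI is the mean CVR over retained items. The panel size of 10 matches the study, but the per-item "essential" counts below are invented for illustration, not the study's data.

```python
# Minimal sketch of Lawshe's Content Validity Ratio (CVR) and
# Content Validity Index (CVI). Counts are illustrative only.

def cvr(n_essential: int, n_panelists: int) -> float:
    """CVR = (n_e - N/2) / (N/2), where n_e is the number of
    panelists rating the item 'essential' out of N panelists."""
    half = n_panelists / 2
    return (n_essential - half) / half

# Hypothetical counts of 'essential' ratings for 4 items, 10 panelists.
essential_counts = [9, 8, 10, 7]
ratios = [cvr(n, 10) for n in essential_counts]   # per-item CVRs
cvi = sum(ratios) / len(ratios)                   # mean CVR = CVI
```

In practice, items whose CVR falls below a critical value for the panel size are dropped before the CVI is averaged, which is how the study arrived at 39 of 40 checklist items.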
“…G Theory simultaneously facilitates the identification of multiple sources of variance and quantification of each individual variance in a measurement (Bloch & Norman, 2012; Briesch, Swaminathan, Welsh, & Chafouleas, 2014). Based on a variance component analysis (Brennan, 2010), it differentiates the sources of variance and estimates the magnitude of each variance (O'Brien, Thompson, & Hagler, 2017).…”
Section: Generalisability Theory
“…biases and poorly constructed marking criteria (Prion et al., 2016). The major limitation of using CTT in reliability analyses is that it focuses on only a single source of error variance at any one time (Bloch & Norman, 2012), as it does not differentiate sources of error variance (O'Brien et al., 2017).…”
Section: An Overview Of Generalisability Theory (G Theory)
“…One way of quantifying different sources of variance is to estimate the variance components of each facet (e.g., students, examiners, and stations), that is, the amount of variance associated with each facet or interaction between facets (Shavelson & Webb, 1991). Based on a variance component analysis (Brennan, 2010), G Theory differentiates the sources of variance and estimates the magnitude of each source of variance (O'Brien et al, 2017). It explores the similarity of the examiners' scores to any other raw scores that the examiners might give under identical circumstances and develops a reliability analysis, as well as monitoring the overall quality of an OSCE (Iramaneerat, Yudkowsky, Myford, & Downing, 2008).…”
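The variance-component logic described in these passages can be illustrated with a minimal one-facet (persons × raters) G study: variance components are estimated from ANOVA mean squares, and the relative G coefficient is the persons variance over persons variance plus error. The score matrix below is invented for illustration, not data from any cited study.

```python
import statistics

# Minimal one-facet persons x raters (p x r) G study sketch.
# Variance components from ANOVA mean squares:
#   sigma2_pr,e = MS_pr                     (residual)
#   sigma2_p    = (MS_p - MS_pr) / n_r      (persons)
# Relative G coefficient: sigma2_p / (sigma2_p + sigma2_pr / n_r).

scores = [  # rows = persons, columns = raters (toy data)
    [7, 8],
    [5, 6],
    [9, 9],
    [4, 5],
]
n_p, n_r = len(scores), len(scores[0])
grand = statistics.mean(x for row in scores for x in row)
p_means = [statistics.mean(row) for row in scores]
r_means = [statistics.mean(col) for col in zip(*scores)]

ss_p = n_r * sum((m - grand) ** 2 for m in p_means)
ss_r = n_p * sum((m - grand) ** 2 for m in r_means)
ss_tot = sum((x - grand) ** 2 for row in scores for x in row)
ss_pr = ss_tot - ss_p - ss_r  # person-by-rater interaction + error

ms_p = ss_p / (n_p - 1)
ms_pr = ss_pr / ((n_p - 1) * (n_r - 1))

var_pr = ms_pr
var_p = max((ms_p - ms_pr) / n_r, 0.0)
g_coef = var_p / (var_p + var_pr / n_r)  # relative G coefficient
```

A decision study would then vary n_r in the last line to project how adding raters (or stations) improves reliability, which is how G coefficients such as the 0.85 and 0.79 reported above are interpreted.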
Section: An Overview Of Generalisability Theory (G Theory)
Higher education institutions globally face decreasing government funding and heightened accountability. There is an increasing demand for them to justify the use of public funding for teaching in universities, and provide information for national and international comparisons of quality. High-stakes summative student assessment data have been, and continue to be, critical for fulfilling both accountability and quality requirements. However, concerns have been raised about the dominance of using summative assessment for multiple purposes beyond certifying student achievement. For example, this dominance marginalises the importance of formative assessment in providing feedback to guide students' future learning. Such concerns have driven assessment reforms. One of these reforms-competency-based assessment (CBA)-is promoted as an alternative to high-stakes examinations, as it may achieve both summative and formative purposes. In the context of medical education, an example of CBA is the Objective Structured Clinical Examination (OSCE) which has been implemented globally for both summative and formative purposes. The OSCE was originally designed for a small cohort of students (around 100) to undertake highly structured and discrete clinical tasks in a series of timed stations. A major challenge of the OSCE relates to achieving consistency of examiner judgements of student performance as the assessment format relies on multiple examiners. In this PhD study, the OSCE under investigation was a high-stakes exit examination for large cohorts of final-year students (over 350) enrolled in a graduate entry four-year Bachelor of Medicine/Bachelor of Surgery (MBBS) program at one Australian research-intensive university. A question arises as to whether examiners in this context can deliver consistent and reliable judgements. This is the issue of concern addressed in this thesis. 
The overarching purpose of this study was to provide new insights into the consistency of examiner judgements in this high-stakes assessment, and explore the possible impact of structured feedback on changing examiner marking behaviour. The four specific aims were to: develop a deeper understanding of the associated factors that influence examiner judgements of medical students' performance in clinical examinations (OSCEs); evaluate the impacts of providing examiners with structured feedback on their subsequent judgement behaviours; explore the factors that impact on the effectiveness of structured feedback in changing examiner marking behaviour; and explore the examiners' proposed training strategies that may assist in increasing the consistency of their judgements in OSCEs. A mixed-methods case study approach was adopted to collect both quantitative and qualitative data. Quantitative data included the examiners' scores awarded to the students in the Year 1 and 2 OSCEs. After completing the OSCE in Year 1, the examiners received a structured feedback report about their marking behaviour pr...