Abstract: This study develops a framework to conceptualize the use and evolution of machine learning (ML) in science assessment. We systematically reviewed 47 studies that applied ML in science assessment and classified them into five categories: (a) constructed response, (b) essay, (c) simulation, (d) educational game, and (e) inter-discipline. We compared the ML-based and conventional science assessments and extracted 12 critical characteristics to map three variables in a three-dimensional framework: construct, funct…
“…Potential risk of misrepresenting the construct of interest. In their study, Zhai et al (2020b) suggest that most ML-based NGSAs target complex and structural constructs of science learning. Complexity, according to Bloom's taxonomy (Forehand, 2010), denotes the rank of cognitive demands for science learning goals.…”
Section: Cognitive Validity: Targeting the Three-Dimensionality of Science Learning
mentioning, confidence: 99%
“…However, it will be critical to have evidence that the algorithmic models developed from one set of responses are applicable to another set of responses. Researchers can select self-, split-, or cross-validation approaches depending on the research purpose, but cross-validation was found to be the most frequently used (Zhai et al., 2020b). Cross-validation requires partitioning the data into n groups, using (n−1) groups to train the machine, and testing the algorithmic model on the remaining group.…”
Section: A Validity Inferential Network for Machine Learning-Based Science Assessments
mentioning, confidence: 99%
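To make the cross-validation procedure described in the excerpt above concrete, the sketch below partitions a small set of human-scored responses into n groups, trains on (n−1) groups, and tests the algorithmic model on the held-out group. The toy responses, the binary score levels, and the TF-IDF-plus-logistic-regression pipeline are illustrative assumptions, not the setup of any study cited here.

```python
# Minimal sketch of n-fold cross-validation for an ML-based scoring model.
# All data and model choices are hypothetical stand-ins.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline

# Hypothetical constructed responses with human-assigned score levels
# (1 = proficient, 0 = developing).
responses = [
    "heating makes the gas particles move faster and collide more often",
    "the particles get bigger when the gas is heated",
    "energy transfers from the warmer object to the cooler object",
    "the cold moves into the hot object until they are the same",
    "faster particles hit the container walls harder, raising pressure",
    "the gas just wants to escape so the pressure goes up",
    "kinetic energy of the particles increases with temperature",
    "temperature is how hot something feels",
    "collisions between particles transfer energy throughout the gas",
    "the heat sits on top because hot air rises",
]
human_scores = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))

# Partition the data into n groups; each fold trains on (n-1) groups and
# tests the resulting algorithmic model on the remaining group.
n = 5
folds = KFold(n_splits=n, shuffle=True, random_state=0)
accuracy = cross_val_score(model, responses, human_scores, cv=folds)
print(f"per-fold machine-human agreement: {accuracy.round(2)}")
print(f"mean agreement across {n} folds: {accuracy.mean():.2f}")
```

Self-validation would instead test the model on the same responses used for training, and split-validation would hold out a single fixed partition; cross-validation is preferred because every response serves in both training and testing roles across folds.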
“…Although ML has great potential to revolutionize the inferences made from assessment observations toward the interpretation and use of assessment results for instructional purposes, concerns associated with test validity arise. This is because ML-based NGSAs involve additional steps (e.g., machine training) and factors (e.g., algorithms) that might substantially confound the interpretations, inferences, and conclusions drawn from machine-generated scores (Zhai et al., 2020b). Though prior studies made an effort to provide validity guidelines for automatic scoring (Clauser et al., 2002; Williamson et al., 2012), no studies have provided a solid validity framework to guide ML-based assessments in science education that align with performance expectations such as those in the NGSS (2013).…”
This study provides a solid validity inferential network to guide the development, interpretation, and use of machine learning-based next-generation science assessments (NGSAs). Given that machine learning (ML) has been broadly implemented in the automatic scoring of constructed responses, essays, simulations, educational games, and interdisciplinary assessments to advance the collection of evidence about, and inferences from, student science learning, we contend that additional validity issues arise for science assessments due to the involvement of ML. These emerging validity issues may not be addressed by prior validity frameworks developed for either non-science or non-ML assessments. We thus examine the changes that ML brings to science assessments and identify seven critical validity issues of ML-based NGSAs: the potential risk of misrepresenting the construct of interest, potential confounders due to the additional variables involved, nonalignment between the interpretation and use of scores and the designed learning goals, nonalignment between the interpretation and use of scores and actual learning quality, nonalignment between machine scores and rubrics, the limited generalizability of machine algorithmic models, and the limited extrapolation ability of machine algorithmic models. Based on the seven validity issues identified, we propose a validity inferential network to address the cognitive, instructional, and inferential validity of ML-based NGSAs. To demonstrate the utility of this network, we present an exemplar ML-based next-generation science assessment developed using a seven-step ML framework. We articulate how we used the validity inferential network to ensure accountable assessment design, as well as valid interpretation and use of machine scores.
“…Unfortunately, CR items are time- and resource-consuming to score compared with multiple-choice items, and thus teachers may be unwilling to implement CR items in their classrooms. Approaches that employ machine learning have shown great potential for automatically scoring CR assessments (Zhai et al., 2020a). As indicated in a recent review study (Zhai et al., 2020c), machine learning has been adopted in many science assessment practices using CRs, essays, educational games, and interdisciplinary assessments (e.g., Lee et al., 2019a; Nehm et al., 2012).…”
mentioning, confidence: 99%
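As a concrete picture of the machine-training step these excerpts refer to, the sketch below fits a scoring model on human-scored CR responses and then assigns machine scores to new, unscored ones. The biology responses, score levels, and pipeline are hypothetical placeholders, not the methods of the cited studies.

```python
# Minimal sketch of training an automatic scorer on human-scored
# constructed-response (CR) items. All data and choices are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Human-scored training responses (1 = proficient, 0 = developing).
scored_responses = [
    "natural selection acts on variation that already exists in the population",
    "the giraffes stretched their necks so their babies had long necks",
    "individuals with favorable traits survive and reproduce more often",
    "animals evolve because they need to survive",
    "mutation introduces new alleles on which selection can act",
    "the species changes on purpose to fit the environment",
]
human_scores = [1, 0, 1, 0, 1, 0]

# Train the scoring model on the full human-scored set.
scorer = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
scorer.fit(scored_responses, human_scores)

# Apply the trained model to new, unscored responses.
new_responses = [
    "variation plus differential reproduction drives the change",
    "the animals wanted longer necks",
]
machine_scores = scorer.predict(new_responses)
print(dict(zip(new_responses, machine_scores)))
```

The validity concerns raised above enter at exactly these two points: what the trained model has actually learned from the human-scored set, and whether its predictions on new responses can be interpreted as evidence of the intended construct.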
“…While the potential of machine learning has been recognized, few studies have tackled the true challenge of scoring CR items on multi-dimensional science assessments. Relatively few studies apply machine learning to analyze assessment items in which students perform tasks that require using multiple dimensions of scientific knowledge to make sense of phenomena (Zhai et al., 2020a). In addition, none of these studies explicitly documents whether and how the assessments measure the dimensionality of science learning.…”
In response to the call for promoting three-dimensional science learning (NRC, 2012), researchers argue for developing assessment items that go beyond rote memorization toward tasks that require deeper understanding and reasoning that can improve science literacy. Such assessment items are usually performance-based constructed responses and require technological support to ease the scoring burden placed on teachers. This study responds to this call by examining the use and accuracy of a machine learning text analysis protocol as an alternative to human scoring of constructed response items. The items we employed represent multiple dimensions of science learning as articulated in the 2012 NRC report. Using a sample of over 26,000 constructed responses from 6,700 students in chemistry and physics, we trained human raters and compiled a robust training set to develop machine algorithmic models and cross-validate the machine scores. Results show that human raters yielded good (Cohen's κ = .40–.75) to excellent (Cohen's κ > .75) interrater reliability on assessment items with varied numbers of dimensions. A comparison reveals that the machine scoring algorithms achieved scoring accuracy comparable to human raters on these same items. Results also show that responses containing formal vocabulary (e.g., "velocity") tended to yield lower machine-human agreement, which may be because fewer students employed formal phrases than their informal alternatives.
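The reliability figures above use Cohen's κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement between two raters and p_e the agreement expected by chance, so κ corrects raw agreement for chance. A quick way to compute machine-human (or human-human) κ is sketched below; the score vectors are illustrative placeholders, and the "good"/"excellent" cutoffs are the ones quoted in the abstract above.

```python
# Computing Cohen's kappa for machine-human score agreement.
# The score vectors are illustrative placeholders, not study data.
from sklearn.metrics import cohen_kappa_score

human_scores   = [2, 1, 0, 2, 1, 1, 0, 2, 2, 0, 1, 2]
machine_scores = [2, 1, 0, 1, 1, 1, 0, 2, 2, 0, 2, 2]

kappa = cohen_kappa_score(human_scores, machine_scores)

# Thresholds quoted above: .40-.75 is "good", above .75 is "excellent".
label = "excellent" if kappa > 0.75 else "good" if kappa >= 0.40 else "poor"
print(f"Cohen's kappa = {kappa:.2f} ({label})")
```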
The COVID-19 global pandemic was a socio-scientific issue (SSI) that affected various aspects of life, including education. Educational institutions adapted to new learning, teaching, and assessment approaches to respond effectively to the pandemic. This study aims to determine the research trends and contributions of science education during the COVID-19 pandemic in order to follow up on its possible impacts and on other crises in the future. The study involved a narrative systematic literature review of 898 articles published in three selected journals from 2018 to 2021. The analysis was divided into two stages: first, comparing research trends in 2018–2019, as the baseline, with research trends during COVID-19 (2020–2021); second, systematically analysing the content of articles published between 2020 and 2021 to descriptively explore the contribution of science education amidst COVID-19. The results show that empirical research during the COVID-19 pandemic increased compared with the baseline. Research topics on learning contexts dominate both the baseline and the pandemic period, but 'teaching' topics are current and future trends in science education research. The three selected journals contributed many publications related to understanding and resolving the crisis during the COVID-19 pandemic, both directly and indirectly. In addition, science education amidst COVID-19 contributes to preparing the younger generation to become resilient citizens capable of dealing with crises. Direct evidence of preparing resilient citizens amidst the COVID-19 pandemic comes from technological and pedagogical knowledge, content and context knowledge, futurising education, and student mobility programmes in science education; indirect evidence comes from science education publications published in the three selected journals between 2020 and 2021. Most publications were carried out at the high school level, and more articles were published in the integrated sciences than in separate disciplines such as physics, chemistry, biology, and earth/space science. Further details of the research trends and contributions of science education amidst the COVID-19 pandemic are discussed.