Abstract: Big data repositories from online learning platforms such as Massive Open Online Courses (MOOCs) represent an unprecedented opportunity to advance research on education at scale and impact a global population of learners. To date, such research has been hindered by poor reproducibility and a lack of replication, largely due to three types of barriers: experimental, inferential, and data. We present a novel system for large-scale computational research, the MOOC Replication Framework (MORF), to jointly address …
“…Of course, as discussed in section 4.1, our recommendation to collect richer identity data does have the drawback of increasing risk around privacy and increased regulatory challenges. Approaches such as data obfuscation (Bakken et al, 2004), providing researchers the ability to use but not view variables (Gardner et al, 2018), legal agreements around data re-identification (ASSISTments Project, 2014), can mitigate these risks to a degree. Encouraging regulators and institutional review boards (or other privacy officers) to balance the risks of privacy violations with the risks of algorithmic bias will also be highly important.…”
In this paper, we review algorithmic bias in education, discussing the causes of that bias and reviewing the empirical literature on the specific ways that algorithmic bias is known to have manifested in education. While other recent work has reviewed mathematical definitions of fairness and expanded algorithmic approaches to reducing bias, our review focuses instead on solidifying the current understanding of the concrete impacts of algorithmic bias in education: which groups are known to be impacted and which stages and agents in the development and deployment of educational algorithms are implicated. We discuss theoretical and formal perspectives on algorithmic bias, connect those perspectives to the machine learning pipeline, and review metrics for assessing bias. Next, we review the evidence around algorithmic bias in education, beginning with the most heavily studied categories of race/ethnicity, gender, and nationality, and moving to the available evidence of bias for less-studied categories, such as socioeconomic status, disability, and military-connected status. Acknowledging the gaps in what has been studied, we propose a framework for moving from unknown bias to known bias and from fairness to equity. We discuss obstacles to addressing these challenges and propose four areas of effort for mitigating and resolving the problems of algorithmic bias in AIED systems and other educational technology.
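The abstract above mentions metrics for assessing bias. As a minimal sketch, the two group-fairness metrics most often reported in this literature can be computed as simple rate differences between subgroups. The function names and the toy predictions, labels, and group assignments below are invented for illustration; they are not from the cited papers.

```python
# Two common group-fairness metrics as rate differences between subgroups.
# All data below is an invented toy example.

def rate(preds, mask):
    """Fraction of positive predictions among the rows where mask is True."""
    sel = [p for p, m in zip(preds, mask) if m]
    return sum(sel) / len(sel)

def demographic_parity_diff(preds, groups, a, b):
    """Difference in positive-prediction rates between groups a and b."""
    return rate(preds, [g == a for g in groups]) - rate(preds, [g == b for g in groups])

def equal_opportunity_diff(preds, labels, groups, a, b):
    """Difference in true-positive rates (recall) between groups a and b."""
    mask_a = [g == a and y == 1 for g, y in zip(groups, labels)]
    mask_b = [g == b and y == 1 for g, y in zip(groups, labels)]
    return rate(preds, mask_a) - rate(preds, mask_b)

preds  = [1, 0, 1, 1, 0, 1, 0, 0]
labels = [1, 0, 1, 0, 1, 1, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

print(demographic_parity_diff(preds, groups, "A", "B"))   # 0.75 - 0.25 = 0.5
print(equal_opportunity_diff(preds, labels, groups, "A", "B"))  # 1.0 - 0.5 = 0.5
```

A nonzero difference indicates that the model treats the two groups differently on that criterion; which criterion matters depends on the educational decision the model informs.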
“…Benitez & Malin, 2010); a classroom may only have one female indigenous student, and therefore reporting this information creates a serious privacy risk. There are methods that can be used to reduce this risk, such as data obfuscation (Bakken et al, 2004), providing researchers the ability to use but not view variables (Gardner et al, 2018), legal agreements around data reidentification (ASSISTments Project, 2014), but the risk is hard to entirely eliminate in a small data set.…”
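The snippet above notes that a subgroup of one (e.g., a single female indigenous student in a classroom) creates a re-identification risk. A standard mitigation, sketched here under invented field names and counts, is small-cell suppression: withhold any subgroup statistic whose underlying count falls below a threshold k. This is an illustration of the general technique, not of any specific method from the cited works.

```python
# Small-cell suppression: withhold subgroup counts below a threshold k,
# since very small cells are trivially re-identifiable.
# Records and the threshold k=5 are illustrative only.

from collections import Counter

def suppress_small_cells(records, key, k=5):
    """Return subgroup counts, replacing any count below k with None."""
    counts = Counter(r[key] for r in records)
    return {grp: (n if n >= k else None) for grp, n in counts.items()}

records = (
    [{"gender": "female", "ethnicity": "indigenous"}] * 1
    + [{"gender": "male", "ethnicity": "white"}] * 12
    + [{"gender": "female", "ethnicity": "white"}] * 9
)
combined = [{"cell": (r["gender"], r["ethnicity"])} for r in records]
print(suppress_small_cells(combined, "cell", k=5))
```

Suppression protects privacy but, as the surrounding text observes, it also hides exactly the small groups for whom bias is hardest to detect, so it trades one risk against another rather than eliminating risk.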
Section: What Obstacles Stand In the Way Of Research Into Unknown Algorithmic Bias
“…Much of MOOC research over the past five years has been conducted in studies of single higher education institutions, and even the largest studies have aggregated data from within a single MOOC provider. One of the most promising initiatives in this area is the MORF framework [10], a platform that enables institutions to securely deposit their MOOC data and allows researchers to execute Docker containers for data analysis while maintaining full privacy of the data. In this study, we propose a methodology that we denote as multiplatform MOOC analytics, which leverages commonalities across MOOC learning and content management systems to allow research teams to create common data formats, agree upon analytic methods, and then generate aggregate data, produced through identical processes, that can allow for "apples-to-apples" comparisons between different MOOC platforms.…”
While global massive open online course (MOOC) providers such as edX, Coursera, and FutureLearn have garnered the bulk of attention from researchers and the popular press, MOOCs are also provided by a number of regional providers, many of which use the Open edX platform. We leverage the data infrastructure shared by the main edX instance and one regional Open edX provider, Edraak in Jordan, to compare the experience of learners from Arab countries on both platforms. Compared to learners from Arab countries on edX, the Edraak population has a more even gender balance, more learners with lower education levels, greater participation from developing countries, higher levels of persistence and completion, and a larger total population of learners. This "apples to apples" comparison of MOOC learners is facilitated by an approach to multiplatform MOOC analytics, which employs parallel research processes to create joint aggregate datasets without sharing identifiable data across institutions. Our findings suggest that greater research attention should be paid to regional MOOC providers, which may have an important role to play in expanding access to higher education.
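The multiplatform approach described above can be sketched as follows: each platform maps its raw logs into a shared schema and computes aggregates locally, so only non-identifiable summary rows ever cross institutional boundaries. The field names, platform labels, and numbers below are invented for illustration and do not reproduce the study's actual schema or results.

```python
# Sketch of parallel local aggregation over a shared schema.
# Only the aggregate rows (not learner records) are exchanged.
# All field names and data are illustrative.

def local_aggregate(platform, learners):
    """Runs inside each platform; emits only aggregate statistics."""
    n = len(learners)
    completed = sum(1 for l in learners if l["completed"])
    female = sum(1 for l in learners if l["gender"] == "female")
    return {
        "platform": platform,
        "n_learners": n,
        "completion_rate": completed / n,
        "pct_female": female / n,
    }

edx_learners = [{"completed": c, "gender": g}
                for c, g in [(True, "male"), (False, "male"),
                             (False, "female"), (False, "male")]]
edraak_learners = [{"completed": c, "gender": g}
                   for c, g in [(True, "female"), (True, "male"),
                                (False, "female"), (True, "female")]]

# Only these aggregate rows are pooled for the cross-platform comparison.
report = [local_aggregate("edX", edx_learners),
          local_aggregate("Edraak", edraak_learners)]
for row in report:
    print(row)
```

Because both platforms run an identical aggregation function over an identical schema, the pooled rows support "apples-to-apples" comparison without any institution exposing learner-level data.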