Observational studies require matching across groups over multiple confounding variables. However, most matching algorithms in the literature cannot handle missing values in these variables, so missing values are routinely imputed before matching. Imputation is not always practical, and when it is not, observations must be dropped because of the limitations of the chosen algorithm, which decreases the power of the study and may discard crucial latent information. We propose a missing data mechanism to incorporate within an iterative multivariate matching method. The underlying framework uses a random forest, implemented with surrogate splits, as a natural tool for constructing a distance matrix when there may be missing values. The output is then easily fed into an optimal matching algorithm. We apply this method to evaluate the effectiveness of Supplemental Instruction (SI) sessions, a voluntary program in which students seek additional help, in a large-enrollment, bottleneck introductory business statistics course. This is an observational study with two groups, students who attend multiple SI sessions and those who do not, and, as is typical in educational data mining, it is challenged by missing data. Additionally, we perform a simulation study on missingness to further demonstrate the efficacy of our proposed approach.
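The two-stage pipeline described above (a forest-based distance matrix, then optimal matching) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation, and it omits their iterative procedure. It assumes each observation's terminal-node (leaf) index in every tree is already available, as it would be from a forest whose trees route observations with missing covariates through surrogate splits, so no observation has to be dropped. It uses the common random forest proximity distance `1 - proximity` (proximity being the fraction of trees in which two observations share a leaf), which may differ from the distance used in the paper. The helper names `proximity_distance` and `optimal_pair_match` are hypothetical, and the exhaustive search is only feasible for tiny examples.

```python
from itertools import permutations


def proximity_distance(leaves):
    """Build a distance matrix from leaf assignments.

    leaves[i][t] is the leaf index of observation i in tree t. We ASSUME these
    come from a forest that uses surrogate splits, so every observation
    (including those with missing covariates) reaches a leaf in every tree.
    Distance = 1 - (fraction of trees in which two observations share a leaf).
    """
    n = len(leaves)
    n_trees = len(leaves[0])
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            shared = sum(1 for t in range(n_trees) if leaves[i][t] == leaves[j][t])
            dist[i][j] = 1.0 - shared / n_trees
    return dist


def optimal_pair_match(dist, treated, controls):
    """1:1 matching without replacement that minimizes total distance.

    Exhaustive search over ordered selections of controls: exact, but only
    practical for very small groups. Larger studies would use an assignment
    or network-flow solver instead.
    """
    best_cost, best_pairs = float("inf"), None
    for chosen in permutations(controls, len(treated)):
        cost = sum(dist[t][c] for t, c in zip(treated, chosen))
        if cost < best_cost:
            best_cost, best_pairs = cost, list(zip(treated, chosen))
    return best_pairs, best_cost


# Toy usage: 4 observations, 3 trees; rows 0-1 are treated, rows 2-3 are controls.
leaves = [
    [1, 4, 7],  # treated
    [2, 5, 8],  # treated
    [2, 5, 7],  # control
    [1, 4, 9],  # control
]
dist = proximity_distance(leaves)
pairs, cost = optimal_pair_match(dist, treated=[0, 1], controls=[2, 3])
print(pairs)  # [(0, 3), (1, 2)]
print(cost)
```

The distance step and the matching step are kept separate on purpose: once the distance matrix exists, missing values no longer matter to the matcher, which is what lets the output be handed to any optimal matching routine.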