In a design-based research setting, the second version of an experimental course on data science is being implemented and accompanied by research. The three modules of the course focus on “data and data detectives”, “machine learning” and a combination of both in a final project. In this paper, we focus on the topic “decision trees”, which is part of the “machine learning” module. Students first learn how to build decision trees manually from data using the tree plugin of CODAP. They then learn to design and code an algorithm in Python that generates trees automatically. Afterwards, the algorithm is applied to real data sets with the support of Jupyter Notebooks. The instructional approach provides deep content knowledge, which also serves as a basis for discussing the difference between how humans and machines build decision trees and the societal implications of using such trees in practice.
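As a rough illustration of what an algorithm that automatically generates decision trees can look like, the following Python sketch grows a tree greedily: at each node it picks the feature whose split leaves the fewest misclassified rows. This is not the course's actual algorithm. The splitting criterion, the function names (`build_tree`, `predict`) and the toy food data are assumptions invented for this example.

```python
from collections import Counter


def majority_label(rows, target):
    """Return the most common target value among the rows."""
    return Counter(row[target] for row in rows).most_common(1)[0][0]


def misclassified(rows, target):
    """Count rows that a majority-vote leaf would label incorrectly."""
    label = majority_label(rows, target)
    return sum(1 for row in rows if row[target] != label)


def split_by(rows, feature):
    """Group rows by their value for the given feature."""
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row)
    return groups


def build_tree(rows, features, target, max_depth=3):
    """Greedily grow a tree; leaves are labels, inner nodes are dicts."""
    if max_depth == 0 or not features or misclassified(rows, target) == 0:
        return majority_label(rows, target)

    def cost(feature):
        # Total misclassifications if we split on this feature and
        # let every resulting group vote for its majority label.
        groups = split_by(rows, feature).values()
        return sum(misclassified(group, target) for group in groups)

    best = min(features, key=cost)
    remaining = [f for f in features if f != best]
    return {
        "feature": best,
        "default": majority_label(rows, target),  # used for unseen values
        "branches": {
            value: build_tree(group, remaining, target, max_depth - 1)
            for value, group in split_by(rows, best).items()
        },
    }


def predict(tree, row):
    """Follow the branches matching the row until a leaf label is reached."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["feature"]], tree["default"])
    return tree


# Invented toy data (not from the course): is a food item recommended?
rows = [
    {"fat": "high", "sugar": "high", "label": "no"},
    {"fat": "high", "sugar": "low", "label": "no"},
    {"fat": "low", "sugar": "high", "label": "no"},
    {"fat": "low", "sugar": "low", "label": "yes"},
    {"fat": "low", "sugar": "low", "label": "yes"},
]
tree = build_tree(rows, ["fat", "sugar"], "label")
print(tree)
print(predict(tree, {"fat": "low", "sugar": "low"}))
```

Misclassification count is used here only because it mirrors how one might judge a manually built tree; other criteria, such as entropy or Gini impurity, are common alternatives.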
Disciplinary identity as a computer science student has recently received increasing attention, as a well-developed subject identity can help increase retention, interest and motivation. In addition, identity theory can serve as an analytical lens for issues around diversity. However, identity is also often perceived as a vague, overused concept with a variety of theories to build upon. Moreover, connections to other topics, such as computer science conceptions, remain unclear, and there seems to be little intra-disciplinary exchange about the concept. This article therefore attempts to provide a starting point by presenting a systematic literature review of identity in Computing Education Research (CER), which has so far been missing. We analyzed a corpus of 41 papers published since 2005, focusing on the variety of identity theories used, the reasons for using them and the overall theoretical framing of the concept in the CER literature to date. We used content analysis with both inductive and deductive coding to derive categories from the corpus and answer our research questions. The results show that there is less variety in the theories than originally expected: most publications refer to the theory of “Communities of Practice”. The reasons for employing identity theory are also rather canonical; in particular, there is little theoretical development of the theories within CER and little empirical work. Finally, we present an extended version of a computing identity that can be theoretically derived from the work in our corpus.
Data Science has emerged as a field at the intersection of statistics, computer science and application fields, and it requires “new skills” for exploring, for example, large and messy datasets, so-called Big Data. Because of this growing relevance, we started an interdisciplinary project between statistics and computer science education, initiated by Deutsche Telekom Stiftung, with the aim of concretizing Data Science and its implications for schools. We offer an innovative and interdisciplinary approach to implementing Data Science in secondary school that takes into account the need for “new skills in statistics education”. In this paper, we report on an introduction to Data Science at secondary school with a focus on exploring multivariate data with CODAP.