Data Science (DS) has emerged from the shadows of its parents-statistics and computer science-into an independent field since its origin nearly six decades ago. Its evolution and education have taken many sharp turns. We present an impressionistic study of the evolution of DS anchored to Kuhn's four stages of paradigm shifts. First, we construct the landscape of DS based on curriculum analysis of the 32 iSchools across the world offering graduate-level DS programs. Second, we paint the "field" as it emerges from the word frequency patterns, ranking, and clustering of course titles based on text mining. Third, we map the curriculum to the landscape of DS and project the same onto the Edison Data Science Framework (2017) and ACM Data Science Knowledge Areas (2021). Our study shows that the DS programs of iSchools align well with the field and correspond to the Knowledge Areas and skillsets. iSchool's DS curriculums exhibit a bias toward "data visualization" along with machine learning, data mining, natural language processing, and artificial intelligence; go light on statistics; slanted toward ontologies and health informatics; and surprisingly minimal thrust toward eScience/research data management, which we believe would add a distinctive iSchool flavor to the DS.
Wikipedia is among the most prominent and comprehensive sources of information available on the WWW. However, its unstructured form impedes direct interpretation by machines. Knowledge Base (KB) creation is a line of research that enables interpretation of Wikipedia's concealed knowledge by machines. In light of the efficacy of KBs for the storage and efficient retrieval of semantic information required for powering several IT applications such Question-Answering System, many large-scale knowledge bases have been developed. These KBs have employed different approaches for data curation and storage. The retrieval mechanism facilitated by these KBs is also different. Further, they differ in their depth and breadth of knowledge. This paper endeavours to explicate the process of KB creation using Wikipedia and compare the prominent KBs developed using the big data of Wikipedia.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.