The challenge in executing a data science project is more than just identifying the best algorithm and tool set to use. Additional sociotechnical challenges include items such as how to define the project goals and how to ensure the project is effectively managed. This paper reports on a set of case studies where researchers were embedded within data science teams and where the researcher observations and analysis was focused on the attributes that can help describe data science projects and the challenges faced by the teams executing these projects, as opposed to the algorithms and technologies that were used to perform the analytics. Based on our case studies, we identified 14 characteristics that can help describe a data science project. We then used these characteristics to create a model that defines two key dimensions of the project. Finally, by clustering the projects within these two dimensions, we identified four types of data science projects, and based on the type of project, we identified some of the sociotechnical challenges that project teams should expect to encounter when executing data science projects.
Part 1: Full PapersInternational audienceWe examine the relationship between communications by core and peripheral members and Free/Libre Open Source Software project success. The study uses data from 74 projects in the Apache Software Foundation Incubator. We conceptualize project success in terms of success building a community, as assessed by graduation from the Incubator. We compare successful and unsuccessful projects on volume of communication by core (committer) and peripheral community members and on use of inclusive pronouns as an indication of efforts to create intimacy among team members. An innovation of the paper is that use of inclusive pronouns is measured using natural language processing techniques. We find that core and peripheral members differ in their volume of contribution and in their use of inclusive pronouns, and that volume of communication is related to project success
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.