Background: Accurate representation of developer expertise has always been an important research problem. While a number of studies proposed novel methods of representing expertise within individual projects, these methods are difficult to apply at an ecosystem level. However, with the focus of software development shifting from monolithic to modular, a method of representing developers' expertise in the context of the entire OSS development becomes necessary when, for example, a project tries to find new maintainers and look for developers with relevant skills. Aim: We aim to address this knowledge gap by proposing and constructing the Skill Space where each API, developer, and project is represented and postulate how the topology of this space should reflect what developers know (and projects need). Method: we use the World of Code infrastructure to extract the complete set of APIs in the files changed by open source developers and, based on that data, employ Doc2Vec embeddings for vector representations of APIs, developers, and projects. We then evaluate if these embeddings reflect the postulated topology of the Skill Space by predicting what new APIs/projects developers use/join, and whether or not their pull requests get accepted. We also check how the developers' representations in the Skill Space align with their self-reported API expertise. Result: Our results suggest that the proposed embeddings in the Skill Space appear to satisfy the postulated topology and we hope that such representations may aid in the construction of signals that increase trust (and efficiency) of open source ecosystems at large and may aid investigations of other phenomena related to developer proficiency and learning.
With tens of millions of projects and developers, the OSS ecosystem is both vibrant and intimidating. On one hand, it hosts the source code for the most critical infrastructures and has the most brilliant developers as contributors, while on the other hand, poor quality or even malicious software, and novice developers abound. External contributions are critical to OSS projects, but the chances their contributions are accepted or even considered depend on the trust between maintainers and contributors. Such trust is built over repeated interactions and coding platforms provide "signals" of project or developer quality via measures of activity (commits), and social relationships (followers/stars) to facilitate trust. These signals, however, do not represent the specific expertise of a developer. We, therefore, aim to address this gap by defining the skill space for APIs, developers, and projects that reflects what developers know (and projects need) more precisely than could be obtained via aggregate activity counts, and more generally than pointing to individual files developers have changed in the past. Specifically, we use the World of Code infrastructure to extract the complete set of APIs in the files changed by all open source developers. We use that data to represent APIs, developers, and projects in the skill space, and evaluate if the alignment measures in the skill space can predict whether or not the developers use new APIs, join new projects, or get their pull requests accepted. We also check if the developers' representation in the skill space aligns with their self-reported expertise. Our results suggest that the proposed embedding in the skill space achieves our aims and may serve not only as a signal to increase trust (and efficiency) of open source ecosystems, but may also allow more detailed investigations of other phenomena related to developer proficiency and learning.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.