Software project estimation is important for allocating resources and planning a reasonable work schedule. Estimation models are typically built using data from completed projects. Although organizations maintain their own historical data repositories, it is difficult to obtain their collaboration because of privacy and competitive concerns. To overcome the lack of public access to private data repositories, this study proposes an algorithm that extracts sufficient data from GitHub repositories to build duration estimation models. More specifically, this study extracts and analyses historical data on WordPress projects to estimate open-source software (OSS) project duration. It uses commits as an independent variable, together with an improved classification of contributors based on the number of days each contributor is active within a release period. The results indicate that duration estimation models built from OSS repository data perform well and partially solve the lack of data encountered in empirical software engineering research.
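The abstract does not say how active days are counted or where contributor class boundaries fall. The following is a minimal sketch that assumes commits for one release period are available as (author, timestamp) pairs. It counts the distinct calendar days on which each contributor committed inside the release window, then buckets contributors by those counts. The function names, the labels "core", "regular" and "occasional", and the thresholds `core_min` and `regular_min` are hypothetical illustrations, not the study's actual classification.

```python
from collections import defaultdict
from datetime import date, datetime


def active_days(commits, start, end):
    """Count distinct calendar days per author with at least one commit in [start, end].

    `commits` is an iterable of (author, timestamp) pairs, where timestamp is an
    ISO-8601 string without a trailing "Z" (Python < 3.11 cannot parse "Z").
    This input format is an assumption for illustration.
    """
    days = defaultdict(set)
    for author, ts in commits:
        day = datetime.fromisoformat(ts).date()
        if start <= day <= end:  # keep only commits inside the release period
            days[author].add(day)
    return {author: len(d) for author, d in days.items()}


def classify(active, core_min=10, regular_min=2):
    """Label each contributor by active-day count (hypothetical thresholds)."""
    labels = {}
    for author, n in active.items():
        if n >= core_min:
            labels[author] = "core"
        elif n >= regular_min:
            labels[author] = "regular"
        else:
            labels[author] = "occasional"
    return labels
```

For example, if one author commits twice on 1 January and once on 2 January, and another commits once on 3 January, then within a January release window the first author has 2 active days and the second has 1. Commits dated outside the window are ignored, so per-release counts stay independent of activity in other releases.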