2020
DOI: 10.48550/arxiv.2008.03439
Preprint

More Effective Software Repository Mining

Adam Tutko,
Austin Henley,
Audris Mockus

Abstract: Background: Data mining and analyzing of public Git software repositories is a growing research field. The tools used for studies that investigate a single project or a group of projects have been refined, but it is not clear whether the results obtained on such "convenience samples" generalize. Aims: This paper aims to elucidate the difficulties faced by researchers who would like to ascertain the generalizability of their findings by introducing an interface that addresses the issues with obtaining representative…

Cited by 1 publication (1 citation statement)
References 7 publications
“…Unfortunately, git commit data does not disambiguate usernames. Past work [16,42] has attempted to disambiguate authors based on a combination of their commit names and commit email addresses, but we considered this out of scope for our work. By not applying identity disambiguation to either the Penumbra or GitHub repositories, the use of emails-as-proxy is consistent across both samples.…”
Section: Repository Analysis (mentioning)
confidence: 99%