Proceedings of the 11th Working Conference on Mining Software Repositories 2014
DOI: 10.1145/2597073.2597074

The promises and perils of mining GitHub

Abstract: With over 10 million git repositories, GitHub is becoming one of the most important sources of software artifacts on the Internet. Researchers are starting to mine the information stored in GitHub's event logs, trying to understand how its users employ the site to collaborate on software. However, so far there have been no studies describing the quality and properties of the data available from GitHub. We document the results of an empirical study aimed at understanding the characteristics of the repositories i…

Citations: cited by 599 publications (400 citation statements)
References: 25 publications
“…The use of projects from GitHub leads to a number of threats, as documented by Kalliamvakou et al (2014). In our data collection, we tried to mitigate these biases (e.g., we only selected active projects), but some limitations are still present.…”
Section: Threats To Validity (mentioning)
confidence: 99%
“…Inferring User Name And Country The simple task of retrieving the first name of a GitHub user is challenging due to the noise and lack of information in the dataset [16]. Approximately 20% of all users were labeled as "unknown" in the GHTorrent dump analysed.…”
Section: Measuring the Control Variables (mentioning)
confidence: 99%
“…Kalliamvakou et al mined GitHub repositories to investigate their characteristics and their qualities [10]. They presented a detailed study discussing different project characteristics, such as (in)activity.…”
Section: Related Work (mentioning)
confidence: 99%
“…Previously mentioned approaches use a self-made database for their own purpose as we could seen this advice in the work of Kalliamvakou et al too [10]. Bug prediction techniques and approaches can be presented and compared in different ways; however, there are some basic points that can serve as common components.…”
Section: Related Work (mentioning)
confidence: 99%