2020
DOI: 10.17776/csj.728932

Summarising big data: public GitHub dataset for software engineering challenges

Abstract: In open-source software development environments, the textual, numerical, and relationship-based data generated are of interest to researchers. Various data sets are available for these data, which are frequently used in areas such as software engineering and natural language processing. However, since these data sets contain all the data in the environment, the problem of processing terabytes of data arises. For this reason, almost all of the studies using GitHub data use data filtered according to certain crite…

Citations: cited by 1 publication (2 citation statements)
References: 9 publications
“…The Repos and all other collections include records that related to the Users collection. All details regarding the creation of the dataset are provided in the source study of the dataset [35].…”
Section: Paper Id Number Of User Number Of Project Ratio (~) (mentioning)
confidence: 99%
“…Thus, the results of the related studies are controversial in terms of real platform data (because of working on a smaller dataset). Therefore, in this study, we used a public dataset called GitDataSCP (https://github.com/kadirseker00/GitDataSCP) that is reflective of the sparsity problem inherent in the nature of GitHub [35]. Table 2.…”
Section: Introduction (mentioning)
confidence: 99%