Proceedings of the 21st International Conference on Evaluation and Assessment in Software Engineering 2017
DOI: 10.1145/3084226.3084287

Cataloging GitHub Repositories

Citations: cited by 44 publications (32 citation statements)
References: 19 publications
“…such as McMillan et al [1]. Others focus on categorizing software repositories and readme files, which helps to better perceive a massive pile of data and grasp the content faster (Prana et al [5], Sharma and Thung et al [2], …”
Section: A Recommendation Of Software Repositories
Citation type: mentioning
confidence: 99%
“…GitHub creates showcases where they manually catalog a set of repositories on a certain topic. Sharma et al [2] semiautomatically expanded such showcases. Using 10K repositories with readme files, they first extract the most descriptive section in the readme file by selecting the one with the highest cosine similarity value with the repository short description on the top of the repository landing page on GitHub.…”
Section: B Cataloging Software Repositories
Citation type: mentioning
confidence: 99%
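
The section-selection step described in the statement above can be illustrated with a minimal sketch. It assumes a plain bag-of-words term-count representation and a simple regex tokenizer; the cited paper's actual text representation and preprocessing are not given here, and the function names `tokenize`, `cosine_similarity`, and `most_descriptive_section` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase alphanumeric tokens; the original preprocessing is unknown.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    # Cosine similarity between the term-count vectors of two strings.
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


def most_descriptive_section(sections, short_description):
    # Pick the readme section most similar to the repository's short description.
    return max(sections, key=lambda s: cosine_similarity(s, short_description))


if __name__ == "__main__":
    readme_sections = [
        "Run pip install to set up the package.",
        "A fast JSON parser for Python projects.",
        "Licensed under MIT.",
    ]
    print(most_descriptive_section(readme_sections, "Fast JSON parser for Python"))
```

A weighted representation such as TF-IDF could be swapped in for the raw counts without changing the selection logic.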
“…Thus, we consider only projects that have been starred by at least 20 developers. Such a number of stars has been used in some studies [7,82] as a sign of a decent project. The collected dataset and the CrossSim tool are available online for public usage [68].…”
Section: CrossSim Dataset
Citation type: mentioning
confidence: 99%
“…Based on these heuristics, they build a recommendation system named RepoPal and compare it with state-of-the-art approach CLAN using one thousand repositories on GitHub. Sharma et al collect 10,000 popular projects on GitHub and propose a cataloging system to group similar projects into categories [33]. They automatically extract descriptive segments from readme files and apply LDA-GA, a state-of-the-art topic modeling algorithm that combines Latent Dirichlet Allocation (LDA) and Genetic Algorithm (GA), to identify categories.…”
Section: B Large Scale Studies On Github
Citation type: mentioning
confidence: 99%