2018
DOI: 10.1016/j.jss.2018.04.018

LASCAD: Language-agnostic software categorization and similar application detection

citations
Cited by 18 publications
(12 citation statements)
references
References 7 publications
“…Source Code [18,19,9,20,21,22,23,14,24] Other Project Data [25,26,10,27,28,29,30,11] (A) source code; and (B) other project data (e.g., README files), as we are interested in the classification task using semantic information, and structural (can be extracted from source code). Table 1 contains a list of the works divided by their approach.…”
Section: Data Source Work (mentioning)
confidence: 99%
“…Another unsupervised approach was adopted by LASCAD [23], a language agnostic classification and similarity tool. As in LACT, the authors used LDA over the source code, and further applied hierarchical clustering with cosine similarity on the output topic terms matrix of LDA to merge similar topics.…”
Section: Source Code Approaches (mentioning)
confidence: 99%
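
The topic-merging step described in the statement above can be illustrated with a minimal sketch. It assumes a topic-term weight matrix has already been produced by LDA (the LDA step itself is not shown), and it uses a simple centroid-based greedy agglomerative merge. The function names, the toy matrix, and the `threshold` value are hypothetical choices for illustration, not LASCAD's actual implementation or parameters.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def merge_similar_topics(topic_term, threshold=0.8):
    """Greedily merge topic rows whose cosine similarity reaches `threshold`.

    Each cluster is represented by the mean of its member rows. At every step
    the two most similar clusters are merged; merging stops when no pair
    reaches the (hypothetical) threshold. Returns a list of clusters, each a
    sorted list of original topic indices.
    """
    clusters = [[i] for i in range(len(topic_term))]
    centroids = [list(row) for row in topic_term]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = cosine(centroids[i], centroids[j])
                if s > best:
                    best, pair = s, (i, j)
        if best < threshold:
            break
        i, j = pair
        ni, nj = len(clusters[i]), len(clusters[j])
        # Size-weighted mean keeps the centroid equal to the mean of all members.
        centroids[i] = [
            (a * ni + b * nj) / (ni + nj)
            for a, b in zip(centroids[i], centroids[j])
        ]
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j], centroids[j]  # j > i, so index i is unaffected
    return clusters


# Toy topic-term matrix (rows: topics, columns: term weights); values are made up.
topic_term = [
    [0.9, 0.1, 0.0, 0.0],
    [0.8, 0.2, 0.0, 0.0],
    [0.0, 0.0, 0.7, 0.3],
]
print(merge_similar_topics(topic_term))  # [[0, 1], [2]]
```

In this toy input, topics 0 and 1 put nearly all their weight on the same terms and are merged, while topic 2 shares no terms with them and stays separate.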
“…-Information extraction (e.g., VSM) (Nguyen et al 2012;Zhang et al 2018;Chen et al 2020;Thomas et al 2013;Fowkes et al 2016); -Classification (e.g., Support Vector Machine -SVM) (Hindle et al 2013;Le et al 2017;Liu et al 2017;Demissie et al 2020;Zhao et al 2020;Shimagaki et al 2018;Gopalakrishnan et al 2017;Thomas et al 2013); -Clustering (e.g., K-means) (Jiang et al 2019;Cao et al 2017;Liu et al 2017;Zhang et al 2016;Altarawy et al 2018;Demissie et al 2020;Gorla et al 2014); -Structured prediction (e.g., Conditional Random Field -CRF) (Ahasanuzzaman et al 2019); -Artificial neural networks (e.g., Recurrent Neural Network -RNN) (Murali et al 2017;Le et al 2017); -Evolutionary algorithms (e.g., Multi-Objective Evolutionary Algorithm -MOEA) (Blasco et al 2020;Pérez et al 2018); -Web crawling (Nabli et al 2018). Pagano and Maalej (2013) was the only study that contributed an exploration that combined LDA with another text mining technique.…”
Section: Types Of Contribution (mentioning)
confidence: 99%
“…Zhang et al's RepoPal [45] works to detect similar repositories using key GitHub features (such as similar README files) to help facilitate actions such as source code reuse, explore related repositories, and identify plagiarism. Altarawy et al's LASCAD [6] automatically categorizes software technologies from source code and finds similar repositories based on the categorizations. Prana et al's multilabel classifier [38] processes textual features of READMEs and classifies README file contents for improved information discovery.…”
Section: Related Work (mentioning)
confidence: 99%
“…Notable prior contributions to better facilitate information discovery on GitHub have included work on classifying contents of README files [38], identifying similar repositories [6,35,45], and identifying non-engineered repositories from engineered repositories [30]. However, no prior work has focused on repository descriptions as a primary concern for project discovery.…”
Section: Introduction (mentioning)
confidence: 99%