Proceedings of the IEEE/ACM International Conference on Automated Software Engineering 2010
DOI: 10.1145/1858996.1859088

A sentence-matching method for automatic license identification of source code files

Abstract: The reuse of free and open source software (FOSS) components is becoming more prevalent. One of the major challenges in finding the right component is finding one that has a license that is adequate for its intended use. The license of a FOSS component is determined by the licenses of its source code files. In this paper, we describe the challenges of identifying the license under which source code is made available, and propose a sentence-based matching algorithm to automatically do it. We demonstrate the fea…
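The abstract names a sentence-based matching approach without giving its details here. The following is a minimal sketch of the general idea only, not the authors' implementation: it strips comment decoration from a file header, splits the text into sentences, and checks each sentence against a small table of known license sentences. The pattern table, the license labels, and all function names are hypothetical and chosen for illustration. Returning NONE when nothing matches mirrors the behavior that one of the citation statements below attributes to Ninka.

```python
import re

# Hypothetical table of characteristic license sentences (illustrative only;
# a real tool would need a much larger, curated set of patterns).
KNOWN_SENTENCES = {
    "MIT": [
        r"permission is hereby granted, free of charge, to any person obtaining a copy",
    ],
    "GPLv2+": [
        r"either version 2 of the license, or \(at your option\) any later version",
    ],
}


def normalize(text):
    """Replace comment decoration (/, *, #) with spaces, collapse whitespace, lowercase."""
    text = re.sub(r"[/*#]+", " ", text)
    return re.sub(r"\s+", " ", text).strip().lower()


def split_sentences(text):
    """Split on whitespace that follows '.', ';' or ':' (a deliberately crude rule)."""
    return [s.strip() for s in re.split(r"(?<=[.;:])\s+", text) if s.strip()]


def identify_licenses(header):
    """Return labels of licenses whose known sentences occur in the header, else ['NONE']."""
    sentences = split_sentences(normalize(header))
    found = []
    for name, patterns in KNOWN_SENTENCES.items():
        if any(re.search(p, s) for p in patterns for s in sentences):
            found.append(name)
    return found or ["NONE"]


if __name__ == "__main__":
    header = """/*
     * This program is free software; you can redistribute it and/or modify
     * it under the terms of the GNU General Public License as published by
     * the Free Software Foundation; either version 2 of the License, or
     * (at your option) any later version.
     */"""
    print(identify_licenses(header))
```

Matching at the sentence level rather than against a whole license text means that reflowed or re-commented headers can still be recognized, at the cost of maintaining the sentence table by hand.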

Citations: cited by 76 publications (90 citation statements)
References: 18 publications
“…Even though Ninka does a great job at identifying license identification from text files [1], it relies blindly on the textual information declared in the files and in some cases fails to recognize the license of the files. Moreover, in cases where the license information is missing, Ninka reports NONE.…”
Section: Discussion (mentioning)
confidence: 99%
“…It includes copyright information, such as the names of contributors to a source code file, the copyright owner, warranty, and liability statements. Many tools have been proposed to identify license statements in source files or license files (e.g., README files): Ninka [1], FOSSology [9] and OSLC. In the following, we discuss each of these tools in more detail.…”
Section: License Identification (mentioning)
confidence: 99%
“…To address RQ1, we use the approach described in Section IV to retrieve, from Google Code Search, the licenses of .class files contained in jar archives. At the same time, we take the source code from which the jars were produced, and classify the licenses contained in it using the automatic license identification tool Ninka [18]. Ninka was created to identify the license of source code files only and outperforms FOSSology in both accuracy (reporting the correct license name and version in a source code file) and speed (runs an order of magnitude faster) as illustrated in Table III.…”
Section: Empirical Study (mentioning)
confidence: 99%
“…The Open Source Initiative (OSI), responsible for the definition of open source, has approved 70 licenses. The Software Package Data eXchange (SPDX), a consortium of non-profit and for-profit organizations that attempts to standardize licensing information across parties, lists 306 different licenses.…”
Section: Introduction (mentioning)
confidence: 99%
“…These licenses comprise a very large portion of open source licensed software; in a study it was found that 9.1% of Debian applications were licensed under BSD or MIT licenses [2]. Furthermore, these licenses, which are known as Academic [6], are of particular interest because they allow unlimited use of the software with very few restrictions.…”
Section: Introduction (mentioning)
confidence: 99%