From source code identifiers to natural language terms

Carvalho, Nuno; Almeida, José João; Henriques, Pedro Rangel; Varanda, Maria Joao

doi:10.1016/j.jss.2014.10.013

Cited by 29 publications

(13 citation statements)

References 28 publications

“…This is an expected result, since the identifiers are the basic element of each language. There are a lot of other works that build source code analysis methods based only on the source code identifiers. The next four elements are also present in most vectors.…”

Section: Experimental Evaluation Of Clustering Efficiency (mentioning, confidence: 99%)

Searching source code fragments using incremental clustering

Duracik, Kršák, Hrkut (2019), Concurrency and Computation

Summary: Plagiarism is becoming an increasingly serious problem in academic environments. In this paper, we deal with a specific kind of plagiarism: source code plagiarism. In this case, no software is available for detecting plagiarism on a larger scale (hundreds of student submissions every year). We propose algorithms for source code parsing and processing as part of a complex system for plagiarism detection. Source code vectorization using characteristic vectors is a vital piece of the whole process, and the k-means algorithm helps with the classification and clustering of the vectors. Student assignments are submitted regularly, and any plagiarism detection system needs to handle them as they arrive. For this reason, we propose a modified incremental k-means algorithm and a method for determining the number of clusters. We also consider methods for vector search among clusters and suggest using conditional entropy to select the important vector elements used in the search algorithm. Our results show how the proposed algorithms and methods work on real student submissions.
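
The incremental clustering idea in this abstract can be illustrated with a generic sequential (online) k-means over characteristic vectors. This is a sketch under stated assumptions, not the authors' modified algorithm: the distance threshold `new_cluster_threshold` that opens a new cluster is a hypothetical stand-in for their method of determining the number of clusters, and all class and method names are illustrative.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Cluster:
    centroid: list[float]
    count: int = 1
    members: list[int] = field(default_factory=list)


class IncrementalKMeans:
    """Sequential k-means over characteristic vectors (illustrative sketch).

    new_cluster_threshold is a hypothetical parameter: a vector farther than this
    from every centroid starts a new cluster instead of joining the nearest one.
    """

    def __init__(self, new_cluster_threshold: float) -> None:
        self.threshold = new_cluster_threshold
        self.clusters: list[Cluster] = []

    def nearest(self, vector: list[float]) -> int:
        """Index of the closest centroid (Euclidean); requires at least one cluster."""
        return min(range(len(self.clusters)),
                   key=lambda i: math.dist(self.clusters[i].centroid, vector))

    def add(self, vector_id: int, vector: list[float]) -> int:
        """Assign one newly arrived vector and return its cluster index."""
        if self.clusters:
            best = self.nearest(vector)
            cluster = self.clusters[best]
            if math.dist(cluster.centroid, vector) <= self.threshold:
                cluster.count += 1
                # Running mean: shift the centroid 1/count of the way toward the
                # vector, so earlier vectors never have to be revisited.
                cluster.centroid = [m + (x - m) / cluster.count
                                    for m, x in zip(cluster.centroid, vector)]
                cluster.members.append(vector_id)
                return best
        self.clusters.append(Cluster(centroid=list(vector), members=[vector_id]))
        return len(self.clusters) - 1
```

For example, with a threshold of 2.0, adding [1.0, 0.0, 3.0] and then [1.0, 1.0, 3.0] puts both in cluster 0 (centroid [1.0, 0.5, 3.0]), while [9.0, 9.0, 0.0] opens cluster 1. Each new submission costs one pass over the k centroids instead of a full re-clustering; the trade-off is that the result depends on arrival order.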

“…First, the identifier is retrieved by using the method retrieveIdentifier(). Once the identifier is available, the split algorithm is applied; in this case samurai() splitter is the used algorithm [11], [22], [12]. Samurai returns a set of words (that compound the identifier under analysis) which are searched in the domain specific dictionary (findInDictionary()) with the goal to verify if it is a valid word or not.…”

Section: WSDLUD (mentioning, confidence: 99%)
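
The three-step check quoted above (retrieve the identifier, split it, look each word up in a domain-specific dictionary) can be sketched as follows. This is an illustration with assumptions: `analyze_identifier`, `find_in_dictionary`, and `valid_word_ratio` are hypothetical Python counterparts of the quoted `retrieveIdentifier()`/`findInDictionary()` steps, and the splitter is passed in as a parameter rather than reimplementing Samurai.

```python
from __future__ import annotations

from collections.abc import Callable, Iterable


def find_in_dictionary(word: str, dictionary: set[str]) -> bool:
    """Case-insensitive membership test against a domain-specific word list."""
    return word.lower() in dictionary


def analyze_identifier(identifier: str,
                       split: Callable[[str], Iterable[str]],
                       dictionary: set[str]) -> list[tuple[str, bool]]:
    """Split `identifier` and flag each non-empty component as a known word or not."""
    return [(word, find_in_dictionary(word, dictionary))
            for word in split(identifier) if word]


def valid_word_ratio(results: list[tuple[str, bool]]) -> float:
    """Share of components found in the dictionary (0.0 for an empty split)."""
    return sum(ok for _, ok in results) / len(results) if results else 0.0
```

With a trivial underscore splitter, `analyze_identifier("get_user_nme", lambda s: s.split("_"), {"get", "user", "name"})` flags "get" and "user" as valid and "nme" as unknown, giving a valid-word ratio of 2/3. Swapping in a stronger splitter only changes the `split` argument.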

Measuring the understandability of WSDL specifications, web service understanding degree approach and system

et al. 2016

Web Services (WS) are fundamental software artifacts for building service-oriented applications, and they are usually reused by others. Therefore, they must be analyzed and comprehended for maintenance tasks: identification of critical parts, bug fixing, adaptation, and improvement. This article presents WSDLUD, a method aimed at measuring a priori the understanding degree (UD) of WSDL (Web Service Description Language) descriptions. To compute UD, several criteria that measure how complex a WSDL description is to understand must be defined. These criteria are used by LSP (Logic Scoring of Preference), a multicriteria evaluation method, to produce a Global Preference value that indicates how well the WSDL description satisfies the evaluation focus, in this case the understanding degree. All the criteria information required by LSP is extracted from WSDL descriptions using static analysis techniques and processed by specific algorithms that gather semantic information. This process yields a priori information about the comprehension difficulty, which confirms our research hypothesis that it is possible to compute the understanding degree of a WSDL description.
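
LSP combines elementary criterion scores with generalized conjunction/disjunction operators, which are commonly modelled as a weighted power mean, E = (Σ w_i · E_i^r)^(1/r) with Σ w_i = 1. The sketch below shows only that single aggregation step; the actual WSDLUD criteria, weights, exponents, and aggregation structure are not given here, so any numbers used with it are made up.

```python
import math


def weighted_power_mean(scores: list[float], weights: list[float], r: float) -> float:
    """Aggregate criterion scores in [0, 1] with a weighted power mean.

    r = 1 is the arithmetic mean; r < 1 leans toward conjunction (every criterion
    matters), r > 1 leans toward disjunction (one strong criterion can suffice);
    r = 0 is the geometric-mean limit.
    """
    if len(scores) != len(weights) or not math.isclose(sum(weights), 1.0):
        raise ValueError("need one weight per score, summing to 1")
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("scores must lie in [0, 1]")
    pairs = [(s, w) for s, w in zip(scores, weights) if w > 0]
    if r == 0:
        return math.prod(s ** w for s, w in pairs)
    if r < 0 and any(s == 0.0 for s, _ in pairs):
        # 0 ** r is undefined for r < 0; the limit of the mean is 0 (hard conjunction).
        return 0.0
    return sum(w * s ** r for s, w in pairs) ** (1.0 / r)
```

For the same two scores, say 0.9 and 0.4 with equal weights, the result grows as r increases (about 0.55 at r = -1, 0.65 at r = 1), which is how an evaluator expresses whether weak criteria should drag the global preference down.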

“…A hard split separates an identifier at specific characters such as the underscore, or according to camel case conventions. A soft split, on the other hand, separates identifiers whose separators are not visible [9]. An illustration of this splitting and expansion process is shown in Figure 2.…”

Section: Normalization with Lingua::IdSplitter (unclassified)
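
The hard split described above needs little more than a regular expression. The sketch below is an illustrative assumption, not Lingua::IdSplitter's code: it treats underscores (and any other non-alphanumeric character) as separators, cuts at camel case transitions, keeps acronyms such as HTTP together, isolates digit runs, and handles ASCII letters only.

```python
from __future__ import annotations

import re

# Alternatives are tried left to right at each position:
#   1. an uppercase run followed by a Capitalized word ("HTTP" in "HTTPResponse"),
#   2. an optionally capitalized lowercase word ("get", "Response"),
#   3. any remaining uppercase run ("MAX"),
#   4. a run of digits.
# Characters matched by none of them (e.g. "_") act as separators.
_HARD_TOKEN = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|[0-9]+")


def hard_split(identifier: str) -> list[str]:
    """Split an identifier at underscores and camel case boundaries."""
    return _HARD_TOKEN.findall(identifier)
```

For instance, `hard_split("getHTTPResponse_code")` returns `["get", "HTTP", "Response", "code"]` and `hard_split("MAX_VALUE")` returns `["MAX", "VALUE"]`. An all-lowercase identifier such as timesort comes back as a single token, which is exactly the case a soft split has to handle.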

“…Identifier splitting and expansion mechanism [9]. The automaton in Figure 3 is used to compute the scores of the candidate splits. Based on the candidate word scores in Table 1, the candidate terms time and sort are the best result for splitting the identifier timesort.…”

Section: Normalization with Lingua::IdSplitter (unclassified)
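
The quoted passage scores candidate splits of timesort with an automaton. As a simpler stand-in (not the Lingua::IdSplitter scoring), the sketch below uses dynamic programming to find the segmentation into dictionary words with the highest score, where the score is a hypothetical heuristic, the sum of squared word lengths, which favours fewer and longer words. The toy dictionary in the usage note is made up.

```python
from __future__ import annotations


def soft_split(token: str, dictionary: set[str]) -> list[str] | None:
    """Highest-scoring segmentation of `token` into dictionary words, or None.

    best[i] holds (score, words) for the best segmentation of token[i:];
    the table is filled right to left so best[j] is known before best[i], i < j.
    """
    n = len(token)
    best: list[tuple[int, list[str]] | None] = [None] * (n + 1)
    best[n] = (0, [])  # the empty suffix has an empty segmentation
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n + 1):
            word, rest = token[i:j], best[j]
            if word in dictionary and rest is not None:
                score = len(word) ** 2 + rest[0]
                current = best[i]
                if current is None or score > current[0]:
                    best[i] = (score, [word] + rest[1])
    return best[0][1] if best[0] is not None else None
```

With the dictionary {"time", "sort", "ti", "me", "so", "rt"}, the candidates ti+me+sort and time+so+rt both score 24, while time+sort scores 32, so `soft_split("timesort", ...)` returns `["time", "sort"]`. The search checks O(n²) substrings, which is cheap for identifier-length tokens.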

“…Meanwhile, other programmers prefer to write identifiers in full rather than in abbreviated form. To address this, [9] developed Lingua::IdSplitter. The algorithm can split identifiers that are typically compositions of several terms and expand abbreviated terms in identifiers into full terms, for identifiers written either in all lowercase or in camel case.…”

unclassified
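
Abbreviation expansion, the second capability mentioned above, can be approximated by a lookup table applied after splitting. The table here is a made-up example and `expand_terms` is a hypothetical name; Lingua::IdSplitter's own expansion rules and dictionaries are not reproduced.

```python
from __future__ import annotations

# Hypothetical abbreviation table, for illustration only.
ABBREVIATIONS: dict[str, str] = {
    "str": "string",
    "idx": "index",
    "cnt": "count",
    "msg": "message",
}


def expand_terms(terms: list[str],
                 abbreviations: dict[str, str] = ABBREVIATIONS) -> list[str]:
    """Lowercase each term and replace known abbreviations with their full form."""
    return [abbreviations.get(term.lower(), term.lower()) for term in terms]
```

`expand_terms(["msg", "Idx"])` returns `["message", "index"]`, and unknown terms such as "user" pass through lowercased. Once split, an identifier written as msgIdx and one written as message_index therefore normalize to the same terms.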

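Perbaikan Metode Rekomendasi Diskusi Pemrograman dengan Normalisasi Identifier Menggunakan Lingua::IdSplitter (Improving a Programming Discussion Recommendation Method through Identifier Normalization Using Lingua::IdSplitter)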

Rozi, Siahaan, Baskoro (2016), Media Komunikasi Teknologi

ABSTRACT: The question-and-answer site Stack Overflow is frequently used as a reference by programmers. Information or solutions for the software development process can be searched for with the site's search engine. However, differences in writing style, especially in how program identifiers are written, often cause recommendations (search results) to miss what the programmer needs. Some programmers write identifiers in abbreviated form while others do not, which degrades recommendation performance. This study adopts Lingua::IdSplitter to normalize identifiers in Stack Overflow discussion data. Normalization splits identifiers composed of several terms and expands abbreviations in identifiers to their full form. The results show that identifier normalization with Lingua::IdSplitter generally increases the median recall of the recommendations by up to 13%.

Keywords: recommendation system, identifier, normalization, Stack Overflow

INTRODUCTION: A good software company should deliver in a short time and with good quality. Achieving this requires good management and cooperation from every stakeholder, especially programmers, who are the backbone of software production. Good programmer performance is therefore a prerequisite for successful software projects. According to [1], programmer performance can be measured by personality, cognitive ability, and the degree of theoretical value belief. Programmers who always seek proof that their work is correct, and who do not offer careless solutions in the form of program code, tend to perform well in the long term.

This forces programmers, especially novice programmers, to invest time in improving their understanding of algorithms, programming techniques, supporting information, and other matters related to the software being built. Obtaining information from team members or coworkers becomes…
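
The abstract reports its gain as a change in median recall. As a minimal sketch of that metric, assuming one list of recommended posts and one set of relevant posts per query (the cutoff and relevance judgments used in the study are not given here), recall is computed per query and the median is taken across queries:

```python
from __future__ import annotations

from collections.abc import Hashable, Iterable
from statistics import median


def recall(recommended: Iterable[Hashable], relevant: Iterable[Hashable]) -> float:
    """Fraction of relevant items that appear among the recommended items."""
    relevant_set = set(relevant)
    if not relevant_set:
        raise ValueError("recall is undefined when no item is relevant")
    return len(relevant_set & set(recommended)) / len(relevant_set)


def median_recall(runs: Iterable[tuple[Iterable[Hashable], Iterable[Hashable]]]) -> float:
    """Median of per-query recall; `runs` yields (recommended, relevant) pairs."""
    return median(recall(rec, rel) for rec, rel in runs)
```

Three queries with recalls of 0.5, 1.0, and 0.0 give a median recall of 0.5. Running the same function once on baseline results and once on results retrieved with normalized identifiers gives the before/after comparison; the median is less sensitive than the mean to a few queries with extreme recall.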
