JavaScript is a popular programming language that is also error-prone due to its asynchronous, dynamic, and loosely typed nature. In recent years, numerous techniques have been proposed for analyzing and testing JavaScript applications. However, our survey of the literature in this area revealed that the proposed techniques are often evaluated on different datasets of programs and bugs. The lack of a commonly used benchmark limits the ability to perform fair and unbiased comparisons for assessing the efficacy of new techniques. To fill this gap, we propose BUGSJS, a benchmark of 453 real, manually validated JavaScript bugs from 10 popular JavaScript server-side programs, comprising 444k LOC in total. Each bug is accompanied by its bug report, the test cases that detect it, as well as the patch that fixes it. BUGSJS features a rich interface for accessing the faulty and fixed versions of the programs and executing the corresponding test cases, which facilitates conducting highly reproducible empirical studies and comparisons of JavaScript analysis and testing tools.
Abstract. Detecting defects in software systems is an evergreen topic, since no real-world software is free of bugs. Many bug-locating algorithms have been presented recently that can help detect hidden and newly introduced bugs in software. Studies that try to predict the faulty source code elements or code segments of a system invariably draw on experience from the past. In most cases, these studies construct a database for their own purposes and do not make the gathered data publicly available. Public datasets are rare; however, a well-constructed dataset could serve as a benchmark test input. Furthermore, open-source software development is growing rapidly, which also provides an opportunity to work with public data. In this study, we selected 15 Java projects from GitHub from which to construct a public bug database. We matched the already known and fixed bugs with the corresponding source code elements (classes and files) and calculated a wide set of product metrics on these elements. After creating the desired bug database, we investigated whether the resulting database is usable for bug prediction. We applied 13 machine learning algorithms to address this research question and achieved F-measure values between 0.7 and 0.8. Besides the F-measure values, we calculated the bug coverage ratio on every project for every machine learning algorithm, obtaining very high and promising bug coverage values (up to 100%).
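To make the evaluation style described above concrete, the following is a minimal sketch of training one such machine learning algorithm on product metrics and computing the F-measure with scikit-learn. The CSV file name and column names are illustrative assumptions, not the paper's actual export format:

```python
# Hypothetical sketch of the evaluation described above: train a classifier
# on per-class product metrics and report the F-measure via cross-validation.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

df = pd.read_csv("bug_database.csv")       # one row per class/file (assumed)
X = df.drop(columns=["Name", "Bugged"])    # product metrics (LOC, CBO, ...)
y = df["Bugged"] > 0                       # binary label: buggy or not

model = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(model, X, y, cv=10, scoring="f1")
print(f"Mean F-measure over 10 folds: {scores.mean():.2f}")
```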
Abstract. In software systems, coding errors are unavoidable due to frequent source changes, tight deadlines, and inaccurate specifications. Therefore, it is important to have tools that help us find these errors. One way of supporting bug prediction is to analyze the characteristics of previous errors and identify unknown ones based on these characteristics. This paper aims to characterize known coding errors. Nowadays, the popularity of source code hosting services like GitHub is increasing rapidly. They provide a variety of services, the most important of which are version control and bug tracking systems. Version control systems store all versions of the source code, and bug tracking systems provide a unified interface for reporting errors. Bug reports can be used to identify faulty and previously fixed source code parts, so the bugs can be characterized by static source code metrics or by other quantitatively measured properties derived from the gathered data. We chose GitHub as the basis of data collection and selected 13 Java projects for analysis. As a result, a database was constructed that characterizes the bugs of the examined projects and can thus be used, among other purposes, to improve the automatic detection of software defects.
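As an illustration of the linking step described above, here is a hypothetical sketch that scans a Git history for commits whose messages reference a bug-tracker issue, so that the files they touch can be marked as previously faulty. The repository path and the message pattern are illustrative assumptions, not the paper's actual heuristics:

```python
# Hypothetical sketch: find fix commits by issue references in their messages
# and list the files each one changed. %x00 inserts a NUL record separator.
import re
import subprocess

log = subprocess.run(
    ["git", "-C", "project", "log", "--name-only", "--pretty=format:%x00%H %s"],
    capture_output=True, text=True, check=True,
).stdout

issue_ref = re.compile(r"(?:fixes|closes)\s+#\d+", re.IGNORECASE)
for entry in log.split("\x00")[1:]:            # one entry per commit
    lines = [l for l in entry.strip().splitlines() if l]
    header, files = lines[0], lines[1:]        # "sha subject", changed files
    if issue_ref.search(header):
        sha, _, message = header.partition(" ")
        print(sha[:8], message, "->", files)
```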
Identifying fault-prone code parts helps developers reduce the time required to locate bugs. It is usually done by characterizing already known bugs with certain kinds of metrics and building a predictive model from the data. Software product and process metrics are the most popular choices for characterizing bugs. The calculation of product metrics is supported by many free and commercial software products. However, tools capable of computing process metrics are quite rare. In this study, we present a method of computing software process metrics in a graph database. We describe the schema of the resulting database and present a way to readily extract the process metrics from it. With this technique, process metrics can be calculated at the file, class, and method levels. We used GitHub as the source of the change history and selected 5 open-source Java projects for processing. To retrieve positional information about the classes and methods, we used SourceMeter, a static source code analyzer. We used Neo4j as the graph database engine and its query language, Cypher, to compute the process metrics. We published the tools we created as open-source projects on GitHub. To demonstrate the utility of our tools, we selected 25 release versions of the 5 Java projects and calculated the process metrics for all of the source code elements (files, classes, and methods) in these versions. Using our previously published bug database, we built bug databases for the selected projects that contain the computed process metrics and the corresponding bug numbers for files and classes. (We published these databases as an online appendix.) Then we applied 13 machine learning algorithms to the database we created to determine whether it is suitable for bug prediction. We achieved average F-measure values of around 0.7 at the class level, and slightly better values, between 0.7 and 0.75, at the file level. RandomForest was the best-performing algorithm in both cases.
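As an illustration, the sketch below queries such a change-history graph for a simple process metric (the number of commits touching each file) through the official Neo4j Python driver. The node labels, relationship type, and property names are assumptions; the actual schema is defined in the paper's published tools:

```python
# Minimal sketch of computing a process metric (commits per file) with a
# Cypher query. Labels, relationship, and properties are assumed, not the
# paper's real schema.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))

QUERY = """
MATCH (c:Commit)-[:MODIFIED]->(f:File)
RETURN f.path AS path, count(c) AS commitCount
ORDER BY commitCount DESC
"""

with driver.session() as session:
    for record in session.run(QUERY):
        print(record["path"], record["commitCount"])

driver.close()
```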
JavaScript is a popular programming language that is also error-prone due to its asynchronous, dynamic, and loosely typed nature. In recent years, numerous techniques have been proposed for analyzing and testing JavaScript applications. However, our survey of the literature in this area revealed that the proposed techniques are often evaluated on different datasets of programs and bugs. The lack of a commonly used benchmark limits the ability to perform fair and unbiased comparisons for assessing the efficacy of new techniques. To fill this gap, we propose BUGSJS, a benchmark of 453 real, manually validated JavaScript bugs from 10 popular JavaScript server-side programs, comprising 444k lines of code (LOC) in total. Each bug is accompanied by its bug report, the test cases that expose it, as well as the patch that fixes it. We extended BUGSJS with a rich web interface for visualizing and dissecting the bugs' information, as well as a programmable API to access the faulty and fixed versions of the programs and to execute the corresponding test cases, which facilitates conducting highly reproducible empirical studies and comparisons of JavaScript analysis and testing tools. Moreover, following a rigorous procedure, we performed a classification of the bugs according to their nature. Our internal validation shows that our taxonomy is adequate for characterizing the bugs in BUGSJS. We discuss several ways in which the resulting taxonomy and the benchmark can help direct researchers interested in automated testing of JavaScript applications.
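As a rough illustration of how such a benchmark can be scripted, the sketch below checks out the faulty and fixed revisions of one bug and runs the project's test suite on each. The repository URL, tag names, and test command are illustrative assumptions rather than BUGSJS's documented interface:

```python
# Hypothetical sketch: run a subject's tests on a bug's faulty and fixed
# revisions. Tag names ("Bug-5", "Bug-5-fix") and commands are assumptions.
import subprocess

def run(cmd, cwd=None):
    """Run a command, echoing it first, and raise if it fails."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, cwd=cwd, check=True)

run(["git", "clone", "https://github.com/BugsJS/express.git"])

for revision in ["Bug-5", "Bug-5-fix"]:        # faulty tag, then fixed tag
    run(["git", "checkout", revision], cwd="express")
    run(["npm", "install"], cwd="express")
    # Don't raise on test failure: failing tests on the faulty revision
    # are exactly the ones that expose the bug.
    subprocess.run(["npm", "test"], cwd="express")
```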
In our recent work, we proposed BUGSJS, a benchmark of several hundred bugs from popular JavaScript server-side programs. In this abstract, we report the results of our initial evaluation of adopting BUGSJS to support an experiment in fault localization. First, we describe how BUGSJS facilitated access to the information required to perform the experiment, namely the test cases' code, their outcomes, their associated code coverage, and the related bug information. Second, we illustrate how BUGSJS can be improved to further ease its application to fault localization research, for instance by filtering out failing test cases that do not directly contribute to a bug. We hope that our preliminary results will encourage researchers to use BUGSJS to enable highly reproducible empirical studies and comparisons of JavaScript analysis and testing tools.
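For context, a common spectrum-based fault localization setup scores program elements with formulas such as Ochiai, using exactly the inputs mentioned above: per-test coverage and test outcomes. The sketch below uses toy data rather than actual BUGSJS output:

```python
# Minimal sketch of spectrum-based fault localization with the Ochiai metric.
# The coverage matrix and failing-test set here are toy examples.
import math

coverage = {                     # test name -> set of covered statements
    "t1": {"s1", "s2"},
    "t2": {"s2", "s3"},
    "t3": {"s1", "s3"},
}
failed = {"t2"}                  # tests failing on the faulty version
total_failed = len(failed)

statements = set().union(*coverage.values())
for s in sorted(statements):
    ef = sum(1 for t in failed if s in coverage[t])       # failing & covers s
    ep = sum(1 for t in coverage
             if t not in failed and s in coverage[t])     # passing & covers s
    denom = math.sqrt(total_failed * (ef + ep))
    score = ef / denom if denom else 0.0
    print(f"{s}: Ochiai = {score:.2f}")
```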