Leonardo Murta scite author profile

We propose noWorkflow, a tool that transparently captures provenance of scripts and enables reproducibility. Unlike existing approaches, noWorkflow is non-intrusive and does not require users to change the way they work: users need not wrap their experiments in scientific workflow systems, install version control systems, or instrument their scripts. The tool leverages Software Engineering techniques, such as abstract syntax tree analysis, reflection, and profiling, to collect different types of provenance, including detailed information about the underlying libraries. We describe how noWorkflow captures multiple kinds of provenance and the different classes of analyses it supports: graph-based visualization; differencing over provenance trails; and inference queries.

Two-dimensional sample entropy: assessing image texture through irregularity

Silva, Filho, Fazan, et al. 2016

Biomed. Phys. Eng. Express

Image texture analysis is a key task in computer vision. Although various methods have been applied to extract texture information, none of them is based on the principles of sample entropy, which is a measurement of entropy rate. This paper proposes a two-dimensional sample entropy method, namely SampEn2D, to measure irregularity in pixel patterns. We evaluated the proposed method in three different situations: a set of simulated images generated by a deterministic function corrupted with different levels of a stochastic influence; the Brodatz public texture database; and a real biological image set of rat sural nerve. Evaluation with simulations showed SampEn2D to be a robust irregularity measure, closely following sample entropy properties. Results with the Brodatz dataset demonstrated the superiority of SampEn2D in separating different image categories compared to conventional Haralick and wavelet descriptors. SampEn2D was also capable of discriminating rat sural nerve images by age groups with high accuracy (AUROC = 0.844). No significant difference was found between the SampEn2D AUROC and those obtained with the best-performing Haralick descriptors, i.e. entropy (AUROC = 0.828), uniformity (AUROC = 0.833), and homogeneity (AUROC = 0.938), and wavelet descriptors, i.e. Haar energy/entropy (AUROC = 0.932) and Daubechies energy/entropy (AUROC = 0.859). In addition, it was shown that SampEn2D computation time increases with image size, being around 1400 s for a 600 × 600 pixel image. In conclusion, SampEn2D proved stable and robust enough to be applied as a texture feature quantifier, and irregularity properties, as measured by SampEn2D, seem to be an important feature for image characterization in biomedical image analysis.

Investigating the Use of a Hybrid Search Strategy for Systematic Reviews

Mourão, Kalinowski, Murta, et al. 2017

Towards supporting the life cycle of large scale scientific experiments

Mattoso, Werner, Travassos, et al. 2010

IJBPIM

On the Nature of Merge Conflicts: A Study of 2,731 Open Source Java Projects Hosted by GitHub

Ghiotto, Murta, Barros, et al. 2020

IEEE Trans. Software Eng.

When multiple developers change a software system in parallel, these concurrent changes need to be merged so that they all appear in the software being developed. Numerous merge techniques have been proposed to support this task, but none of them can fully automate the merge process. Indeed, it has been reported that as many as 10% to 20% of all merge attempts result in a merge conflict, meaning that a developer has to manually complete the merge. To date, we have little insight into the nature of these merge conflicts. What do they look like, in detail? How do developers resolve them? Do any patterns exist that might suggest new merge techniques that could reduce the manual effort? This paper contributes an in-depth study of the merge conflicts found in the histories of 2,731 open source Java projects. Seeded by the manual analysis of the histories of five projects, our automated analysis of all 2,731 projects: (1) characterizes the merge conflicts in terms of number of chunks, size, and programming language constructs involved, (2) classifies the manual resolution strategies that developers use to address these merge conflicts, and (3) analyzes the relationships between various characteristics of the merge conflicts and the chosen resolution strategies. Our results give rise to three primary recommendations for future merge techniques that, when implemented, could on the one hand help in automatically resolving certain types of conflicts and on the other hand provide the developer with tool-based assistance to more easily resolve other types of conflicts that cannot be automatically resolved.

On the performance of hybrid search strategies for systematic literature reviews in software engineering

Mourão, Pimentel, Murta, et al. 2020

Information and Software Technology

A Survey on Collecting, Managing, and Analyzing Provenance from Scripts

et al. 2019

Scripts are widely used to design and run scientific experiments. Scripting languages are easy to learn and use, and they allow complex tasks to be specified and executed in fewer steps than with traditional programming languages. However, they also have important limitations for reproducibility and data management. As experiments are iteratively refined, it is challenging to reason about each experiment run (or trial), to keep track of the association between trials and experiment instances as well as the differences across trials, and to connect results to specific input data and parameters. Approaches have been proposed that address these limitations by collecting, managing, and analyzing the provenance of scripts. In this article, we survey the state of the art in provenance for scripts. We have identified the approaches by following an exhaustive protocol of forward and backward literature snowballing. Based on a detailed study, we propose a taxonomy and classify the approaches using this taxonomy.

Leonardo Murta

A Large-Scale Study About Quality and Reproducibility of Jupyter Notebooks

noWorkflow: Capturing and Analyzing Provenance of Scripts
