We propose noWorkflow, a tool that transparently captures provenance of scripts and enables reproducibility. Unlike existing approaches, noWorkflow is non-intrusive and does not require users to change the way they work: users need not wrap their experiments in scientific workflow systems, install version control systems, or instrument their scripts. The tool leverages software engineering techniques, such as abstract syntax tree analysis, reflection, and profiling, to collect different types of provenance, including detailed information about the underlying libraries. We describe how noWorkflow captures multiple kinds of provenance and the different classes of analyses it supports: graph-based visualization, differencing over provenance trails, and inference queries.
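To make the idea of abstract syntax tree analysis concrete, the following is a minimal sketch of how a script's function definitions and call sites could be listed without modifying or running the script. It is not noWorkflow's implementation; the function name `collect_definitions_and_calls` and the sample script are hypothetical, and only Python's standard `ast` module is used.

```python
import ast


def collect_definitions_and_calls(source: str):
    """Return (defined function names, called names) found in `source`.

    Hypothetical helper for illustration only; not part of noWorkflow.
    """
    tree = ast.parse(source)
    defined, called = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            defined.add(node.name)
        elif isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name):         # plain call: f(x)
                called.add(func.id)
            elif isinstance(func, ast.Attribute):  # attribute call: obj.f(x)
                called.add(func.attr)
    return sorted(defined), sorted(called)


# Hypothetical experiment script, analyzed statically (never executed).
script = """
import math

def area(r):
    return math.pi * pow(r, 2)

print(area(2.0))
"""

definitions, calls = collect_definitions_and_calls(script)
print("defined:", definitions)
print("called:", calls)
```

Static analysis of this kind only sees what is written in the script; runtime values and the libraries actually loaded would have to come from the reflection and profiling techniques the abstract mentions.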
Image texture analysis is a key task in computer vision. Although various methods have been applied to extract texture information, none of them are based on the principles of sample entropy, which is a measure of entropy rate. This paper proposes a two-dimensional sample entropy method, namely SampEn2D, to measure irregularity in pixel patterns. We evaluated the proposed method in three different situations: a set of simulated images generated by a deterministic function corrupted with different levels of a stochastic influence; the Brodatz public texture database; and a real biological image set of rat sural nerve. Evaluation with simulations showed SampEn2D to be a robust irregularity measure that closely follows sample entropy properties. Results with the Brodatz dataset showed that SampEn2D separates different image categories better than conventional Haralick and wavelet descriptors. SampEn2D was also capable of discriminating rat sural nerve images by age group with high accuracy (AUROC = 0.844). No significant difference was found between the SampEn2D AUROC and those obtained with the best-performing Haralick descriptors, i.e. entropy (AUROC = 0.828), uniformity (AUROC = 0.833), and homogeneity (AUROC = 0.938), and wavelet descriptors, i.e. Haar energy/entropy (AUROC = 0.932) and Daubechies energy/entropy (AUROC = 0.859). In addition, SampEn2D computation time increases with image size, reaching around 1400 s for a 600 × 600 pixel image. In conclusion, SampEn2D proved stable and robust enough to be applied as a texture feature quantifier, and irregularity properties, as measured by SampEn2D, seem to be an important feature for image characterization in biomedical image analysis.
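As an illustration of what a two-dimensional sample entropy computes, here is a brute-force sketch that generalizes one-dimensional sample entropy to square pixel windows. The abstract does not give the exact definition, so several choices here are assumptions: the window size `m`, the tolerance `r` as a fraction of the image standard deviation, the Chebyshev (maximum) distance, and the `<=` comparison. The function name `sample_entropy_2d` is hypothetical, and the quadratic cost makes the sketch suitable for small images only.

```python
import numpy as np


def sample_entropy_2d(image, m=2, r=0.3):
    """Brute-force 2D sample entropy sketch (assumed definition, see lead-in)."""
    img = np.asarray(image, dtype=float)
    tol = r * img.std()              # tolerance scaled by image variability
    rows, cols = img.shape
    n_r, n_c = rows - m, cols - m    # same window origins for both sizes

    def count_matches(size):
        # Flatten every size x size window into one row of `patches`.
        patches = np.array([
            img[i:i + size, j:j + size].ravel()
            for i in range(n_r)
            for j in range(n_c)
        ])
        count = 0
        for k in range(len(patches) - 1):
            # Chebyshev distance from window k to every later window.
            dist = np.abs(patches[k + 1:] - patches[k]).max(axis=1)
            count += int(np.sum(dist <= tol))
        return count

    b = count_matches(m)        # matching pairs of m x m windows
    a = count_matches(m + 1)    # matching pairs of (m+1) x (m+1) windows
    if a == 0 or b == 0:
        return float("inf")     # no matches: entropy is undefined or unbounded
    return float(-np.log(a / b))


# A perfectly regular pattern versus uniform noise (seed fixed for repeatability).
checkerboard = np.indices((8, 8)).sum(axis=0) % 2
noise = np.random.default_rng(0).random((16, 16))

print("checkerboard:", sample_entropy_2d(checkerboard))
print("noise:", sample_entropy_2d(noise))
```

Under these assumptions a regular pattern scores zero, because every pair of windows that matches at size `m` still matches at size `m + 1`, while irregular images score higher.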
When multiple developers change a software system in parallel, these concurrent changes need to be merged so that they all appear in the software being developed. Numerous merge techniques have been proposed to support this task, but none of them can fully automate the merge process. Indeed, it has been reported that as much as 10% to 20% of all merge attempts result in a merge conflict, meaning that a developer has to manually complete the merge. To date, we have little insight into the nature of these merge conflicts. What do they look like, in detail? How do developers resolve them? Do any patterns exist that might suggest new merge techniques that could reduce the manual effort? This paper contributes an in-depth study of the merge conflicts found in the histories of 2,731 open source Java projects. Seeded by the manual analysis of the histories of five projects, our automated analysis of all 2,731 projects: (1) characterizes the merge conflicts in terms of number of chunks, size, and programming language constructs involved, (2) classifies the manual resolution strategies that developers use to address these merge conflicts, and (3) analyzes the relationships between various characteristics of the merge conflicts and the chosen resolution strategies. Our results give rise to three primary recommendations for future merge techniques that, when implemented, could on the one hand help in automatically resolving certain types of conflicts and on the other hand provide the developer with tool-based assistance to more easily resolve other types of conflicts that cannot be automatically resolved.
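Characterizing a conflict by its number of chunks and their size can be sketched as follows. This is not the study's tooling: it assumes the conflicted file uses Git's default two-sided conflict markers (`<<<<<<<`, `=======`, `>>>>>>>`), and the function name `conflict_chunks` and the sample text are hypothetical.

```python
def conflict_chunks(text):
    """Return a list of (ours_lines, theirs_lines) pairs, one per conflict chunk.

    Hypothetical helper; assumes Git's default two-sided conflict markers.
    """
    chunks = []
    state = None            # None, "ours", or "theirs"
    ours, theirs = [], []
    for line in text.splitlines():
        if line.startswith("<<<<<<<"):
            state, ours, theirs = "ours", [], []
        elif line.startswith("=======") and state == "ours":
            state = "theirs"
        elif line.startswith(">>>>>>>") and state == "theirs":
            chunks.append((ours, theirs))
            state = None
        elif state == "ours":
            ours.append(line)
        elif state == "theirs":
            theirs.append(line)
    return chunks


# Hypothetical conflicted Java file, as left behind by a failed merge.
merged = """\
import java.util.List;
<<<<<<< HEAD
int limit = 10;
=======
int limit = 20;
int retries = 3;
>>>>>>> feature
class A {}
"""

chunks = conflict_chunks(merged)
sizes = [(len(ours), len(theirs)) for ours, theirs in chunks]
print("chunks:", len(chunks))
print("lines per side:", sizes)
```

Identifying the programming language constructs involved in each chunk would additionally require parsing the Java code on both sides, which this sketch does not attempt.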
Scripts are widely used to design and run scientific experiments. Scripting languages are easy to learn and use, and they allow complex tasks to be specified and executed in fewer steps than with traditional programming languages. However, they also have important limitations for reproducibility and data management. As experiments are iteratively refined, it is challenging to reason about each experiment run (or trial), to keep track of the association between trials and experiment instances as well as the differences across trials, and to connect results to specific input data and parameters. Approaches have been proposed that address these limitations by collecting, managing, and analyzing the provenance of scripts. In this article, we survey the state of the art in provenance for scripts. We have identified the approaches by following an exhaustive protocol of forward and backward literature snowballing. Based on a detailed study, we propose a taxonomy and classify the approaches using this taxonomy.