Science has become a highly competitive undertaking with respect to, for example, resources, positions, students, and publications. At the same time, the number of journals presenting scientific findings is skyrocketing, while the knowledge gained per manuscript seems to be diminishing. Science has also become ever more dependent on computational analyses. For example, virtually all biomedical applications involve computational data analysis. The science community develops many computational tools, and there are numerous alternatives for many computational tasks. The same is true for workflow management systems, leading to a tremendous duplication of effort. Software quality is often of low concern, and typically a small dataset is used as a proof of principle to support rapid publication. Installation and usage of such tools are complicated, so virtual machine images, containers, and package managers are employed more frequently. These simplify installation and ease use but solve neither the software quality issue nor the duplication of effort. We believe that a community-wide collaboration is needed to (a) ensure software quality, (b) increase reuse of code, (c) enforce proper software review, (d) increase testing, and (e) make interoperability more seamless. Such a science software ecosystem will overcome current issues and increase trust in current data analyses.
For the identification and sequencing of proteins, mass spectrometry (MS) has become the tool of choice and, as such, drives proteomics. Each MS/MS spectrum needs to be assigned a peptide sequence, for which two strategies exist: either database search or de novo sequencing can be employed to establish peptide spectrum matches. For database search, mzIdentML is the current community standard for data representation. There is no community standard for representing de novo sequencing results, but we previously proposed the de novo markup language (DNML). At the moment, each de novo sequencing solution uses a different data representation, which complicates downstream data integration. Such integration is crucial, since ensemble predictions may be more useful than the predictions of a single tool. We here propose the de novo MS Ontology (DNMSO), which can, for example, provide many-to-many mappings between spectra and peptide predictions. Additionally, we have implemented an application programming interface (API) that supports the file operations needed for de novo sequencing, from reading spectra input to reading, writing, and creating files in the DNMSO format, as well as conversion from many other file formats. This API removes the file-handling overhead from the development of de novo sequencing tools and allows developers to concentrate fully on algorithm development. We make the API and formal descriptions of the format freely available at https://github.com/savastakan/dnmso.
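The many-to-many mapping between spectra and peptide predictions mentioned above can be illustrated with a small sketch. It is not the DNMSO schema or its API: the class names, fields, and the `consensus` helper are hypothetical and chosen only to show why such a mapping makes ensemble predictions easy to compute.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Prediction:
    """One peptide prediction from one de novo tool (hypothetical fields)."""
    tool: str
    sequence: str
    score: float


@dataclass
class PredictionStore:
    """Many-to-many links between spectrum ids and predictions (illustrative only)."""
    by_spectrum: dict = field(default_factory=lambda: defaultdict(set))
    by_prediction: dict = field(default_factory=lambda: defaultdict(set))

    def link(self, spectrum_id: str, prediction: Prediction) -> None:
        # Record the link in both directions so either side can be queried.
        self.by_spectrum[spectrum_id].add(prediction)
        self.by_prediction[prediction].add(spectrum_id)

    def consensus(self, spectrum_id: str) -> dict:
        """Count how many distinct tools proposed each sequence for a spectrum."""
        tools_per_sequence = defaultdict(set)
        for p in self.by_spectrum.get(spectrum_id, ()):
            tools_per_sequence[p.sequence].add(p.tool)
        return {seq: len(tools) for seq, tools in tools_per_sequence.items()}


store = PredictionStore()
shared = Prediction("toolA", "PEPTIDE", 0.9)
store.link("scan1", shared)
store.link("scan2", shared)  # one prediction linked to two spectra
store.link("scan1", Prediction("toolB", "PEPTIDE", 0.7))
store.link("scan1", Prediction("toolB", "PEPTLDE", 0.4))
print(store.consensus("scan1"))
```

Keeping both directions of the mapping means that a downstream integration step can ask either "which predictions exist for this spectrum?" or "which spectra support this prediction?" without rescanning the data.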
Mutation analysis is a widely used technique to evaluate the effectiveness of test cases in both hardware and software testing. The original model is mutated systematically under certain fault assumptions, and the test cases are run against the resulting mutants to see whether they can detect the faults. Mutation analysis is usually a computationally intensive task, particularly in finite state machine (FSM) testing, because of the potentially huge number of mutants. Random selection could be a practical reduction method under the assumption that all mutants are identical in terms of the probability of occurrence of their associated faults. The present study proposes a mutant selection method based on Fourier analysis of Boolean functions. Fourier analysis helps to identify the transitions with the greatest effect on the output, so that the mutants related to those transitions can be selected. Such mutants are considered more important since they are more likely to be killed. To evaluate the method, test cases are generated by the well-known W method, which has the capability of detecting every potential fault. The original and reduced sets of mutants are compared with respect to their importance values. Evaluations show that the mutants selected by the proposed technique are more effective, which reduces the cost of mutation analysis without sacrificing its performance.
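As a rough illustration of what Fourier analysis of a Boolean function provides, the sketch below computes the Fourier (Walsh-Hadamard) coefficients of a small function and derives the influence of each input variable from them. It is not the selection method of the study: how FSM transitions are encoded as Boolean variables is not stated above, so the function names and the example function are assumptions made only for this sketch.

```python
from itertools import product


def fourier_coefficients(f, n):
    """Return {S: f_hat(S)} for f: {0,1}^n -> {0,1}; S is a tuple of 0/1 flags."""
    points = list(product((0, 1), repeat=n))
    coeffs = {}
    for s in points:
        total = 0
        for x in points:
            sign_f = 1 - 2 * f(x)  # map output 0/1 to +1/-1
            parity = sum(si & xi for si, xi in zip(s, x)) % 2
            total += sign_f * (1 - 2 * parity)  # multiply by the character chi_S(x)
        coeffs[s] = total / len(points)
    return coeffs


def influences(coeffs, n):
    """Influence of variable i: sum of f_hat(S)^2 over all S that contain i."""
    return [sum(c * c for s, c in coeffs.items() if s[i]) for i in range(n)]


# Hypothetical example: the output depends on x0 and x1 but not on x2.
def example(x):
    return x[0] & x[1]


inf = influences(fourier_coefficients(example, 3), 3)
ranked = sorted(range(3), key=lambda i: inf[i], reverse=True)
print(inf, ranked)
```

Variables with higher influence change the output more often when flipped, so ranking by influence is one plausible way to prioritize the mutants that touch them; a variable with zero influence, such as `x2` here, never affects the output.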
AI fairness is an essential topic because of its timeliness and its societal implications. However, automating AI fairness poses many challenges. Based on the challenges around automating fairness in texts, our study aims to create a new fairness testing paradigm that can gather disparate proposals on fairness on a single platform, test them, and develop the most effective method, thereby contributing to the general orientation on fairness. To ensure and sustain mass participation in solving the fairness problem, gamification elements are used to motivate individuals. In this framework, gamification in the design allows participants to see their progress and compare it with that of other players. It uses extrinsic motivation elements, i.e., rewarding participants by publicizing their achievements to the masses. The validity of the design is demonstrated through an example scenario. Our design represents a platform for the development of practices on fairness and can be instrumental in making contributions to this issue sustainable. In future studies, we plan to realize a pilot application of this structure designed with the gamification method.
Proteomics is the study of the proteins that can be derived from a genome. For the identification and sequencing of proteins, mass spectrometry has become the tool of choice. Within mass spectrometry-based proteomics, proteins can be identified or sequenced by either database search or de novo sequencing. Both methods have advantages and drawbacks, but in the long run we expect de novo sequencing to become the predominant tool. Currently, de novo sequencing results are stored in arbitrary file formats that depend on the developers of the algorithms. We identified this as a large and unnecessary obstacle to integrating results from multiple de novo sequencing algorithms. Therefore, we designed a standard file format for the representation of de novo sequencing results. We further developed an application programming interface, since we identified the lack of proper APIs as another obstacle that imposes a needlessly steep learning curve on developers.