Proteomics, the study of the protein complement of a biological system, is generating increasing quantities of data from rapidly developing technologies employed in a variety of different experimental workflows. Experimental processes, e.g. for comparative 2D gel studies or LC-MS/MS analyses of complex protein mixtures, involve a number of steps: from experimental design, through wet and dry lab operations, to publication of data in repositories and finally to data annotation and maintenance. The presence of inaccuracies throughout the processing pipeline, however, results in data that can be untrustworthy, thus offsetting the benefits of high-throughput technology. While researchers and practitioners are generally aware of some of the information quality issues associated with public proteomics data, there are few accepted criteria and guidelines for dealing with them. In this article, we highlight factors that impact on the quality of experimental data and review current approaches to information quality management in proteomics. Data quality issues are considered throughout the lifecycle of a proteomics experiment, from experiment design and technique selection, through data analysis, to archiving and sharing.
The effects of tunnel blast excavation on the lining structures of adjacent tunnels are comprehensively studied for the Xinling highway tunnel project. First, the LS-DYNA software is applied to obtain the characteristics of vibration velocities and dynamic stresses at different positions of the tunnel liner. The results indicate that the maximum peak particle velocity (PPV) occurs on the haunch of the lining facing the blasting source and that the PPV and peak tensile stress decrease as the surrounding rock grade increases. Second, a site test on blasting vibration is conducted to verify the simulation results. Regression analysis of the measured vibration data yields a method for calculating the maximum charge per delay, allowing blasting excavation to be optimized for different surrounding rock grades. Finally, based on the statistical relationship between crack alteration and PPV on the lining before and after blasting, the safety thresholds of PPV for different portions of the tunnel are determined. The recommended safety threshold of PPV is 10 cm/s for intact lining and for B-grade and V-grade linings of the surrounding rock tunnel. However, if the lining crack grade falls between A and B, then the recommended safety thresholds of PPV for the III-grade and IV-grade surrounding rock tunnel are 5 cm/s and 6 cm/s, respectively. The threshold PPV proposed in this study has been successfully applied to restrict blast-induced damage during new tunnel excavation of the Xinling tunnel project.
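The charge-calculation step can be sketched as follows. This is a minimal illustration assuming the commonly used Sadovsky attenuation law v = K(Q^(1/3)/R)^α; the site constants K and α below are placeholder values, not the fitted coefficients from the Xinling measurements.

```python
def max_charge_per_delay(ppv_limit, distance, k, alpha):
    """Invert Sadovsky's law v = k * (Q**(1/3) / R)**alpha to find the
    largest charge Q (kg) whose predicted PPV at stand-off distance R (m)
    stays below the safety threshold ppv_limit (cm/s)."""
    return distance ** 3 * (ppv_limit / k) ** (3.0 / alpha)

# Illustrative site constants (placeholders, not the paper's fitted values)
K, ALPHA = 150.0, 1.6

# Allowable charge 20 m from a cracked lining with a 5 cm/s PPV threshold
q_cracked = max_charge_per_delay(5.0, 20.0, K, ALPHA)
# The 10 cm/s threshold for an intact lining permits a larger charge
q_intact = max_charge_per_delay(10.0, 20.0, K, ALPHA)
```

Raising the PPV threshold or the stand-off distance increases the permissible charge, which mirrors how the study assigns different thresholds to intact and cracked linings.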
Data-intensive e-science applications often rely on third-party data found in public repositories, whose quality is largely unknown. Although scientists are aware that this uncertainty may lead to incorrect scientific conclusions, in the absence of a quantitative characterization of data quality properties they find it difficult to formulate precise data acceptability criteria. We present an Information Quality management workbench, called Qurator, that supports data experts in the specification of personal quality models, and lets them derive effective criteria for data acceptability. The demo of our working prototype will illustrate our approach on a real e-science workflow for a bioinformatics application.
Southern rice black-streaked dwarf virus (SRBSDV), which causes severe disease symptoms in rice (Oryza sativa L.), has been emerging over the last decade throughout northern Vietnam, southern Japan and southern, central and eastern China. Here we attempt to quantify the prevalence of SRBSDV in the Honghe Hani rice terraces system (HHRTS), a Chinese 1300-year-old traditional rice production system. We first confirm that genetically diverse rice varieties are still being cultivated in the HHRTS and categorize these varieties into three main genetic clusters: the modern hybrid varieties group (MH), the Hongyang improved modern variety group (HY) and the traditional indica landraces group (TIL). We also show over a 2-year period that SRBSDV remains prevalent in the HHRTS (20.1% prevalence) and that both the TIL (17.9% prevalence) and the MH varieties (5.1% prevalence) were less affected by SRBSDV than were the HY varieties (30.2% prevalence). Collectively we suggest that SRBSDV isolates are moving freely within the HHRTS and that the TIL, HY and MH rice genetic clusters are not being preferentially infected by particular SRBSDV lineages. Given that SRBSDV can cause 30–50% rice yield losses, our study emphasizes both the need to better monitor the disease in the HHRTS and the need to start considering ways to reduce its burden on rice production.
SUMMARY In this paper we outline a framework for managing information quality (IQ) in an e-Science context. In contrast to previous approaches that take a very abstract view of IQ properties, we allow scientists to define the quality characteristics that are of importance to them in their particular domain. For example, 'accuracy' may be defined in terms of the conformance of experimental data to a particular standard. User-scientists specify their IQ preferences against a formal ontology, so that the definitions are machine-manipulable, allowing the environment to classify and organize domain-specific quality characteristics within an overall quality management framework. As an illustration of our approach, we present an example Web service that computes IQ annotations for experiment datasets in transcriptomics.
Abstract. We outline a framework for managing information quality (IQ) in eScience, using ontologies, semantic annotation of resources, and data bindings. Scientists define the quality characteristics that are of importance in their particular domain by extending an OWL DL IQ ontology, which classifies and organises these domain-specific quality characteristics within an overall quality management framework. RDF is used to annotate data resources, with reference to IQ indicators defined in the ontology. Data bindings, again defined in RDF, are used to represent mappings between data elements (e.g. defined in XML Schemas) and the IQ ontology. As a practical illustration of our approach, we present a case study from the domain of proteomics.
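The annotation scheme can be illustrated with a small self-contained sketch. The graph below stores RDF-style triples as Python tuples; all `iq:`/`ex:` terms, including the `SequenceCoverage` indicator class, are hypothetical names invented for illustration rather than the actual terms of the IQ ontology described above.

```python
# RDF-style triples as (subject, predicate, object) tuples.
# All iq:/ex: terms are hypothetical stand-ins for the OWL DL IQ
# ontology and a proteomics dataset of the kind the paper describes.
DATASET = "ex:proteomics-run-42"

graph = {
    (DATASET, "rdf:type", "iq:AnnotatedResource"),
    (DATASET, "iq:hasIndicator", "ex:indicator-1"),
    ("ex:indicator-1", "rdf:type", "iq:SequenceCoverage"),  # domain-specific indicator
    ("ex:indicator-1", "iq:value", "0.87"),
    # A data binding mapping an XML Schema element to an ontology term
    ("ex:mzdata-element", "iq:boundTo", "iq:SequenceCoverage"),
}

def indicators_for(resource, triples):
    """Collect the IQ indicator nodes attached to a data resource."""
    return {o for s, p, o in triples if s == resource and p == "iq:hasIndicator"}

def indicator_values(resource, triples):
    """Resolve each indicator of a resource to its recorded value."""
    return {
        ind: next(o for s, p, o in triples if s == ind and p == "iq:value")
        for ind in indicators_for(resource, triples)
    }

print(indicator_values(DATASET, graph))  # {'ex:indicator-1': '0.87'}
```

A real implementation would use an RDF library and SPARQL queries rather than tuple sets; the sketch only shows how annotations link data resources, domain-specific indicator classes, and data bindings within one graph.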