This article describes XRel, a novel approach for storage and retrieval of XML documents using relational databases. In this approach, an XML document is decomposed into nodes on the basis of its tree structure and stored in relational tables according to the node type, with path information from the root to each node. XRel enables us to store XML documents using a fixed relational schema without any information about DTDs and also to utilize indices such as the B+-tree and the R-tree supported by database management systems. Thus, XRel does not need any extension of relational databases for storing XML documents. For processing XML queries, we present an algorithm for translating a core subset of XPath expressions into SQL queries. Finally, we demonstrate the effectiveness of this approach through several experiments using actual XML documents.
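The path-based decomposition described in the XRel abstract can be illustrated with a minimal sketch. It is not the XRel schema or its XPath-to-SQL algorithm: the table names (`path`, `element`, `attribute`, `text_node`), the preorder `node_id`, and the helpers `shred` and `query_text` are assumptions made for illustration, and only simple path patterns expressed as SQL `LIKE` strings are supported.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical fixed schema: one table per node type plus a table of distinct
# root-to-node paths. No DTD is consulted when creating it.
SCHEMA = """
CREATE TABLE IF NOT EXISTS path(path_id INTEGER PRIMARY KEY, pathexp TEXT UNIQUE);
CREATE TABLE IF NOT EXISTS element(doc_id INTEGER, path_id INTEGER, node_id INTEGER);
CREATE TABLE IF NOT EXISTS attribute(doc_id INTEGER, path_id INTEGER, node_id INTEGER, value TEXT);
CREATE TABLE IF NOT EXISTS text_node(doc_id INTEGER, path_id INTEGER, node_id INTEGER, value TEXT);
"""

def shred(xml_text, conn, doc_id=1):
    """Decompose one XML document into rows keyed by node type and path."""
    conn.executescript(SCHEMA)
    counter = [0]  # preorder counter used as a simple node identifier

    def path_id(pathexp):
        # Each distinct path expression is stored once and shared by all nodes.
        conn.execute("INSERT OR IGNORE INTO path(pathexp) VALUES (?)", (pathexp,))
        row = conn.execute("SELECT path_id FROM path WHERE pathexp = ?", (pathexp,))
        return row.fetchone()[0]

    def visit(elem, parent_path):
        counter[0] += 1
        node_id = counter[0]
        pathexp = parent_path + "/" + elem.tag
        pid = path_id(pathexp)
        conn.execute("INSERT INTO element VALUES (?, ?, ?)", (doc_id, pid, node_id))
        for name, value in elem.attrib.items():
            conn.execute("INSERT INTO attribute VALUES (?, ?, ?, ?)",
                         (doc_id, path_id(pathexp + "/@" + name), node_id, value))
        # Only leading text is stored; tail text of mixed content is ignored here.
        if elem.text and elem.text.strip():
            conn.execute("INSERT INTO text_node VALUES (?, ?, ?, ?)",
                         (doc_id, pid, node_id, elem.text.strip()))
        for child in elem:
            visit(child, pathexp)

    visit(ET.fromstring(xml_text), "")
    conn.commit()

def query_text(conn, pathexp_like):
    """Return text values whose path matches a LIKE pattern, in document order."""
    rows = conn.execute(
        "SELECT t.value FROM text_node t JOIN path p ON t.path_id = p.path_id "
        "WHERE p.pathexp LIKE ? ORDER BY t.node_id",
        (pathexp_like,))
    return [r[0] for r in rows]
```

With this layout, an absolute path such as `/lib/book/title` becomes an exact match on `pathexp`, and a descendant step such as `//title` can be approximated by the pattern `%/title`. Either form can use an ordinary index on the `path` table, which is the point of storing path information alongside each node.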
Predicting the contribution of media components to bacterial growth was first initiated by introducing machine learning to high-throughput growth assays. A total of 1336 temporal growth records corresponding to 225 different media, which were composed of 13 chemical components, were generated. The growth rate and saturated density of each growth curve were automatically calculated with the newly developed data processing program. To identify the decision-making factors related to growth among the 13 chemicals, big datasets linking the growth parameters to the chemical combinations were subjected to decision tree learning. The results showed that the only carbon source, glucose, determined bacterial growth, but it was not the first priority. Instead, the top decision-making chemicals in relation to the growth rate and saturated density were ammonium and ferric ions, respectively. Three chemical components (NH₄⁺, Mg²⁺ and glucose) commonly appeared in the decision trees of the growth rate and saturated density, but they exhibited different mechanisms. The concentration ranges for fast growth and high density overlapped for glucose but were distinct for NH₄⁺ and Mg²⁺. The results suggested that these chemicals were crucial in determining the growth speed and growth maximum in either a universal use or a trade-off manner. This differentiation might reflect the diversity in the resource allocation mechanisms for growth priority depending on the environmental restrictions. This study provides a representative example for clarifying the contribution of the environment to population dynamics through an innovative viewpoint of employing modern data science within traditional microbiology to obtain novel findings.
We present a new catalog of 9318 Lyα emitter (LAE) candidates at z = 2.2, 3.3, 4.9, 5.7, 6.6, and 7.0 that are photometrically selected by the SILVERRUSH program with a machine learning technique from large area (up to 25.0 deg²) imaging data with six narrowband filters taken by the Subaru Strategic Program with Hyper Suprime-Cam and a Subaru intensive program, Cosmic HydrOgen Reionization Unveiled with Subaru. We construct a convolutional neural network that distinguishes between real LAEs and contaminants with a completeness of 94% and a contamination rate of 1%, enabling us to efficiently remove contaminants from the photometrically selected LAE candidates. We confirm that our LAE catalogs include 177 LAEs that have been spectroscopically identified in our SILVERRUSH programs and previous studies, ensuring the validity of our machine learning selection. In addition, we find that the object-matching rates between our LAE catalogs and our previous results are ≃80%–100% at bright NB magnitudes of ≲24 mag. We also confirm that the surface number densities of our LAE candidates are consistent with previous results. Our LAE catalogs will be made public on our project webpage.
To estimate the functions of mitochondria of diverse eukaryotic nonmodel organisms in which the mitochondrial proteomes are not available, it is necessary to predict the protein sequence features of the mitochondrial proteins computationally. Various prediction methods that are trained using the proteins of model organisms belonging particularly to animals, plants, and fungi exist. However, such methods may not be suitable for predicting the proteins derived from nonmodel organisms because the sequence features of the mitochondrial proteins of diversified nonmodel organisms can differ from those of model organisms that are present only in restricted parts of the tree of eukaryotes. Here, we proposed NommPred, which predicts the mitochondrial proteins of nonmodel organisms that are widely distributed over eukaryotes. We used a gradient boosting machine to develop 2 predictors—one for predicting the proteins of mitochondria and the other for predicting the proteins of mitochondrion-related organelles that are highly reduced mitochondria. The performance of both predictors was found to be better than that of the best method available.
Recently, with the rapid spread of the XML format, it has become common for large-scale data, ranging in size from several hundred MB to several GB, to be described in XML. To provide fast and reliable storage and retrieval of such huge XML data, using an XML database is a reasonable choice. There are many ways to realize XML databases, but the relational XML database, in which XML data is mapped to relational tables and queries are processed as SQL queries, is one of the most popular. However, some researchers have pointed out that the performance of relational XML databases degrades when dealing with such huge XML data. In this study, we propose a scheme for parallel processing of XML data using PC clusters. First, we discuss how to decompose XML data so that XML queries can be processed in parallel. We define vertical and horizontal decomposition of XML data based on decomposition of the schema graph and of XML instances, respectively. To allocate the decomposed XML data to cluster nodes, we give a greedy-style algorithm that computes a pseudo-optimal assignment of XML fragments in light of the XML query workload. Finally, we experimentally evaluate the effectiveness of the proposed method.
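The abstract does not spell out the allocation algorithm, so the following is only a generic sketch of greedy fragment placement, not the authors' method. It assumes each XML fragment already has a single estimated cost derived from the query workload, and it uses the standard longest-processing-time heuristic: take fragments in descending cost order and place each on the currently least-loaded node. The function name `greedy_assign` and the cost dictionary are hypothetical.

```python
import heapq

def greedy_assign(fragment_costs, num_nodes):
    """Assign fragments to cluster nodes, heaviest first, to the least-loaded node.

    fragment_costs: dict mapping a fragment name to its estimated workload cost
                    (an assumed input; how costs are derived is not modeled here).
    num_nodes:      number of cluster nodes (must be at least 1).
    Returns a dict mapping node index to the list of fragments placed on it.
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    # Min-heap of (current load, node index); ties go to the lower node index.
    heap = [(0.0, node) for node in range(num_nodes)]
    heapq.heapify(heap)
    assignment = {node: [] for node in range(num_nodes)}
    # Sort by descending cost, then by name so the result is deterministic.
    ordered = sorted(fragment_costs.items(), key=lambda kv: (-kv[1], kv[0]))
    for fragment, cost in ordered:
        load, node = heapq.heappop(heap)
        assignment[node].append(fragment)
        heapq.heappush(heap, (load + cost, node))
    return assignment
```

For example, `greedy_assign({"a": 5, "b": 4, "c": 3, "d": 2}, 2)` places `a` and `d` on node 0 and `b` and `c` on node 1, giving each node a total cost of 7. A greedy pass like this runs in O(n log n) time and yields a balanced but not necessarily optimal placement, which matches the "pseudo-optimal" wording in the abstract.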