Private information retrieval (PIR) allows a user to retrieve a desired message from a set of databases without revealing the identity of the desired message. The replicated databases scenario was considered by Sun and Jafar in [1], where each of the N databases stores all K messages completely. A PIR scheme was developed to achieve the optimal download cost given by $1 + \frac{1}{N} + \frac{1}{N^{2}} + \cdots + \frac{1}{N^{K-1}}$.
In this work, we consider the problem of PIR from storage constrained databases. Each database has a storage capacity of µKL bits, where L is the size of each message in bits, and µ ∈ [1/N, 1] is the normalized storage. On one extreme, µ = 1 is the replicated databases case considered in [1]. On the other extreme, when µ = 1/N, the user has to download all the messages from the databases in order to retrieve a message privately, giving a download cost of K. We aim to characterize the optimal download cost versus storage trade-off for any storage capacity in the range µ ∈ [1/N, 1].
In the storage constrained PIR problem, there are two key challenges: a) construction of communication efficient schemes through storage content design at each database that allow download efficient PIR; and b) characterizing the optimal download cost via information-theoretic lower bounds. The novel aspect of this work is to characterize the optimum download cost of PIR from uncoded storage constrained databases for any value of storage. In particular, for any (N, K), we show that the optimal trade-off between storage, µ, and the download cost, D(µ), is given by the lower convex hull of the N pairs $\left(\frac{t}{N},\; 1 + \frac{1}{t} + \frac{1}{t^{2}} + \cdots + \frac{1}{t^{K-1}}\right)$ for t = 1, 2, ..., N.
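The storage versus download trade-off described above can be explored numerically. The following is a minimal sketch, not the authors' code. It assumes the corner points take the form (t/N, 1 + 1/t + ... + 1/t^(K-1)) for t = 1, ..., N, a form chosen because t = N recovers the replicated-database cost of [1] and t = 1 corresponds to downloading all K messages. The function names `corner_points`, `lower_convex_hull`, and `download_cost` are hypothetical.

```python
from fractions import Fraction


def corner_points(N, K):
    """Assumed corner points (mu, D) = (t/N, sum_{i=0}^{K-1} t^{-i}), t = 1..N."""
    return [
        (Fraction(t, N), sum(Fraction(1, t**i) for i in range(K)))
        for t in range(1, N + 1)
    ]


def cross(o, a, b):
    """Cross product of vectors o->a and o->b; positive means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def lower_convex_hull(points):
    """Lower hull via the monotone chain method, scanning left to right."""
    hull = []
    for p in sorted(points):
        # Drop the last hull point while it lies on or above the new segment.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull


def download_cost(mu, N, K):
    """Evaluate the piecewise-linear lower hull at normalized storage mu."""
    mu = Fraction(mu)
    hull = lower_convex_hull(corner_points(N, K))
    if not hull[0][0] <= mu <= hull[-1][0]:
        raise ValueError("mu must lie in [1/N, 1]")
    for (x0, y0), (x1, y1) in zip(hull, hull[1:]):
        if x0 <= mu <= x1:
            # Linear interpolation between adjacent hull vertices.
            return y0 + (y1 - y0) * (mu - x0) / (x1 - x0)
    return hull[0][1]  # reached only when N == 1 (single corner point)


if __name__ == "__main__":
    for mu in (Fraction(1, 3), Fraction(1, 2), Fraction(2, 3), Fraction(1)):
        print(mu, download_cost(mu, N=3, K=2))
```

Exact rational arithmetic (`Fraction`) is used so that the corner values are not blurred by floating-point rounding. Values of µ between two corner points are obtained by linear interpolation, which is simply what evaluating a lower convex hull means.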
Background: Xyloglucan (XyG) is a ubiquitous and fundamental polysaccharide of plant cell walls. Due to its structural complexity, XyG requires a combination of backbone-cleaving and sidechain-debranching enzymes for complete deconstruction into its component monosaccharides. The soil saprophyte Cellvibrio japonicus has emerged as a genetically tractable model system to study biomass saccharification, in part due to its innate capacity to utilize a wide range of plant polysaccharides for growth. Whereas the downstream debranching enzymes of the xyloglucan utilization system of C. japonicus have been functionally characterized, the requisite backbone-cleaving endo-xyloglucanases were unresolved.
Results: Combined bioinformatic and transcriptomic analyses implicated three glycoside hydrolase family 5 subfamily 4 (GH5_4) members, with distinct modular organization, as potential keystone endo-xyloglucanases in C. japonicus. Detailed biochemical and enzymatic characterization of the GH5_4 modules of all three recombinant proteins confirmed particularly high specificities for the XyG polysaccharide versus a panel of other cell wall glycans, including mixed-linkage beta-glucan and cellulose. Moreover, product analysis demonstrated that all three enzymes generated XyG oligosaccharides required for subsequent saccharification by known exo-glycosidases. Crystallographic analysis of GH5D, which was the only GH5_4 member specifically and highly upregulated during growth on XyG, in free, product-complex, and active-site affinity-labelled forms revealed the molecular basis for the exquisite XyG specificity among these GH5_4 enzymes. Strikingly, exhaustive reverse-genetic analysis of all three GH5_4 members and a previously biochemically characterized GH74 member failed to reveal a growth defect, thereby indicating functional compensation in vivo, both among members of this cohort and by other, yet unidentified, xyloglucanases in C. japonicus. Our systems-based analysis indicates distinct substrate-sensing (GH74, GH5E, GH5F) and attack-mounting (GH5D) functions for the endo-xyloglucanases characterized here.
Conclusions: Through a multi-faceted, molecular systems-based approach, this study provides new insight into the saccharification pathway of the xyloglucan utilization system of C. japonicus. The detailed structural–functional characterization of three distinct GH5_4 endo-xyloglucanases will inform future bioinformatic predictions across species, and provides new CAZymes with defined specificity that may be harnessed in industrial and other biotechnological applications.
Electronic supplementary material: The online version of this article (10.1186/s13068-018-1039-6) contains supplementary material, which is available to authorized users.
The heteropolysaccharide xyloglucan (XyG) comprises up to one‐quarter of the total carbohydrate content of terrestrial plant cell walls and, as such, represents a significant reservoir in the global carbon cycle. The complex composition of XyG requires a consortium of backbone‐cleaving endo‐xyloglucanases and side‐chain cleaving exo‐glycosidases for complete saccharification. The biochemical basis for XyG utilization by the model Gram‐negative soil saprophytic bacterium Cellvibrio japonicus is incompletely understood, despite the recent characterization of associated side‐chain cleaving exo‐glycosidases. We present a detailed functional and structural characterization of a multimodular enzyme encoded by gene locus CJA_2477. The CJA_2477 gene product comprises an N‐terminal glycoside hydrolase family 74 (GH74) endo‐xyloglucanase module in train with two carbohydrate‐binding modules (CBMs) from families 10 and 2 (CBM10 and CBM2). The GH74 catalytic domain generates Glc4‐based xylogluco‐oligosaccharide (XyGO) substrates for downstream enzymes through an endo‐dissociative mode of action. X‐ray crystallography of the GH74 module, alone and in complex with XyGO products spanning the entire active site, revealed a broad substrate‐binding cleft specifically adapted to XyG recognition, which is composed of two seven‐bladed propeller domains characteristic of the GH74 family. The appended CBM10 and CBM2 members notably did not bind XyG, nor other soluble polysaccharides, and instead were specific cellulose‐binding modules. Taken together, these data shed light on the first step of xyloglucan utilization by C. japonicus and expand the repertoire of GHs and CBMs for selective biomass analysis and utilization. 
Database: Structural data have been deposited in the RCSB protein database under the Protein Data Bank codes: http://www.rcsb.org/pdb/search/structidSearch.do?structureId=5FKR, http://www.rcsb.org/pdb/search/structidSearch.do?structureId=5FKS, http://www.rcsb.org/pdb/search/structidSearch.do?structureId=5FKT and http://www.rcsb.org/pdb/search/structidSearch.do?structureId=5FKQ.
Data shuffling is a fundamental building block of distributed learning algorithms, one that increases the statistical gain at each step of the learning process. In each iteration, different shuffled data points are assigned by a central node to a distributed set of workers to perform local computations, which leads to communication bottlenecks. The focus of this paper is on formalizing and understanding the fundamental information-theoretic tradeoff between storage (per worker) and the worst-case communication overhead for the data shuffling problem. We completely characterize the information-theoretic tradeoff for K = 2 and K = 3 workers, for any value of storage capacity, and show that increasing the storage across workers can reduce the communication overhead by leveraging coding. We propose a novel and systematic data delivery and storage update strategy for each data shuffle iteration, which preserves the structural properties of the storage across the workers, and aids in minimizing the communication overhead in subsequent data shuffling iterations.
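To illustrate how coding can turn stored data into communication savings, here is a toy sketch for K = 2 workers. It is not the scheme proposed in the paper. It assumes each worker still holds the data point it processed in the previous iteration and that the next shuffle swaps the two assignments, so a single XOR broadcast from the central node replaces two separate transmissions. The function names `xor_bytes`, `coded_broadcast`, and `decode` are hypothetical.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("data points must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def coded_broadcast(point_at_w1: bytes, point_at_w2: bytes) -> bytes:
    """Central node: one message that is useful to both workers at once."""
    return xor_bytes(point_at_w1, point_at_w2)


def decode(broadcast: bytes, stored_point: bytes) -> bytes:
    """Worker: cancel the locally stored point to recover the missing one."""
    return xor_bytes(broadcast, stored_point)


if __name__ == "__main__":
    a = b"epoch-data-A"  # held by worker 1, needed next by worker 2
    b = b"epoch-data-B"  # held by worker 2, needed next by worker 1

    msg = coded_broadcast(a, b)
    print("worker 1 recovers:", decode(msg, a))
    print("worker 2 recovers:", decode(msg, b))
    print("uncoded bytes sent:", len(a) + len(b))
    print("coded bytes sent:", len(msg))
```

The saving here comes entirely from the side information already stored at each worker. The general trade-off studied in the paper concerns how much such storage is needed for arbitrary shuffles, which this toy case does not address.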
Glycoside hydrolase family 74 (GH74) is a historically important family of endo-β-glucanases. On the basis of early reports of detectable activity on cellulose and soluble cellulose derivatives, GH74 was originally considered to be a "cellulase" family, although more recent studies have generally indicated a high specificity toward the ubiquitous plant cell wall matrix glycan xyloglucan. Previous studies have indicated that GH74 xyloglucanases differ in backbone cleavage regiospecificities and can adopt three distinct hydrolytic modes of action: exo, endo-dissociative, and endo-processive. To improve functional predictions within GH74, here we coupled in-depth biochemical characterization of 17 recombinant proteins with structural biology-based investigations in the context of a comprehensive molecular phylogeny, including all previously characterized family members. Elucidation of four new GH74 tertiary structures, as well as one distantly related dual seven-bladed propeller protein from a marine bacterium, highlighted key structure-function relationships along protein evolutionary trajectories. We could define five phylogenetic groups, which delineated the mode of action and the regiospecificity of GH74 members. At the extremes, a major group of enzymes diverged to hydrolyze the backbone of xyloglucan nonspecifically with a dissociative mode of action and relaxed backbone regiospecificity. In contrast, a sister group of GH74 enzymes has evolved a large hydrophobic platform comprising 10 subsites, which facilitates processivity. Overall, the findings of our study refine our understanding of catalysis in GH74, providing a framework for future experimentation as well as for bioinformatics predictions of sequences emerging from (meta)genomic studies.
Terrestrial plants harbor ~80% of the biomass on Earth, some 450 gigatons of carbon, in the form of lignocellulose (cell walls comprised of cellulose, matrix glycans, lignin, and other polymers) (1). Although terrestrial biomass represents an attractive renewable resource for the production of fuels, chemicals, and materials for human consumption, the controlled degradation of lignocellulose, whether (thermo)chemical or enzymatic, is hindered by its heterogeneous composition and complex organization (2). Hence, significant efforts have been made to identify enzymes able to efficiently modify and deconstruct this complex material. Xyloglucans (XyGs) comprise a prominent family of cell wall matrix glycans (hemicelluloses). XyGs are ubiquitous in land plants, in which they constitute up to 20% of the dry weight of cell walls (3, 4). Notably, XyGs are secreted by roots of diverse plant species and are therefore likely to actively influence rhizobiota (5). XyGs are also found as storage polysaccharides comprising ~50% of the mass of some seeds (e.g. tamarind and nasturtium) and therefore represent important agricultural byproducts with applications in the food, biomaterial, and medical sectors (6, 7). XyGs have a β-1,4-linked glucosyl backbone ("G" unit), some of which a...
Data shuffling across a distributed cluster of nodes is one of the critical steps in implementing large-scale learning algorithms. Randomly shuffling the data-set among a cluster of workers allows different nodes to obtain fresh data assignments at each learning epoch. This process has been shown to provide improvements in the learning process. However, the statistical benefits of distributed data shuffling come at the cost of extra communication overhead from the master node to worker nodes, which can act as one of the major bottlenecks in the overall time for computation. There has been significant recent interest in devising approaches to minimize this communication overhead. One approach is to provision for extra storage at the computing nodes. The other emerging approach is to leverage coded communication to minimize the overall communication overhead.
The focus of this work is to understand the fundamental trade-off between the amount of storage and the communication overhead for distributed data shuffling. In this work, we first present an information-theoretic formulation for the data shuffling problem, accounting for the underlying problem parameters (the number of workers, K, the number of data points, N, and the available storage per node, S). We then present an information-theoretic lower bound on the communication overhead for data shuffling as a function of these parameters. We next present a novel coded communication scheme and show that the resulting communication overhead of the proposed scheme is within a multiplicative factor of at most $\frac{K}{K-1}$ from the information-theoretic lower bound. Furthermore, we present the aligned coded shuffling scheme for some storage values, which achieves the optimal storage vs communication trade-off for K < 5, and further reduces the maximum multiplicative gap down to...
...presence of stragglers was proposed in [20] for synchronous gradient descent, and in [21-23] for linear computation tasks, e.g., matrix multiplication. The use of polynomial codes for high dimensional coded matrix multiplication was proposed in [24]. Coded computation over wireless networks was proposed in [25], where only one worker can transmit at a time. The use of codes to reduce the communication overhead due to data shuffling was considered in [26-35]. In [26-29], the authors considered the MapReduce setting, where in order to reduce the communication between the mappers and the reducers, coding opportunities are created with more redundant computations at the mappers, leading to a trade-off between communication and computation. [30,31] provided a unified coding framework for distributed computing, where the communication load due to shuffling can be alleviated by trading the computational complexity in the presence of straggling servers. The information-theoretic limits for data shuffling in the wired master-worker setting were considered in [19,32,33]. Coded data shuffling in the wireless setting was recently considered in [18,34,35] for both centralized and decentralized...