Synthetic DNA is durable and can encode digital data with high density, making it an attractive medium for data storage. However, recovering stored data at large scale currently requires all the DNA in a pool to be sequenced, even if only a subset of the information needs to be extracted. Here, we encode and store 35 distinct files (over 200 MB of data) in more than 13 million DNA oligonucleotides, and show that we can recover each file individually and with no errors, using a random access approach. We design and validate a large library of primers that enable individual recovery of all files stored within the DNA. We also develop an algorithm that greatly reduces the sequencing read coverage required for error-free decoding by maximizing information from all sequence reads. These advances demonstrate a viable, large-scale system for DNA data storage and retrieval.
Motivated by applications to distributed storage, Gopalan et al. recently introduced the interesting notion of information-symbol locality in a linear code. By this it is meant that each message symbol appears in a parity-check equation associated with small Hamming weight, thereby enabling recovery of the message symbol by examining a small number of other code symbols. This notion is expanded to the case when all code symbols, not just the message symbols, are covered by such "local" parity. In this paper, we extend the results of Gopalan et al. so as to permit recovery of an erased code symbol even in the presence of errors in local parity symbols. We present tight bounds on the minimum distance of such codes and exhibit codes that are optimal with respect to the local error-correction property. As a corollary, we obtain an upper bound on the minimum distance of a concatenated code.

A code symbol of an $[n, k, d]$ linear code $\mathcal{C}$ over the field $\mathbb{F}_q$ is said to have locality $r$ if this symbol can be recovered by accessing at most $r$ other code symbols of $\mathcal{C}$. Equivalently, for any coordinate $i$, there exists a row in the parity-check matrix of the code of Hamming weight at most $r + 1$, whose support includes $i$. An $(r, d)$ code was defined as a systematic linear code $\mathcal{C}$ having minimum distance $d$, where all $k$ message symbols have locality $r$. It was shown that the minimum distance of an $(r, d)$ code is upper bounded by
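The parity-check characterization of locality above can be illustrated with a small sketch. The example code (the binary [7, 4] Hamming code), the helper names `row_locality` and `distance_bound`, and the choice of Python are all assumptions made for illustration and do not come from the paper. The source text is cut off before the bound is stated; the formula used below, $d \le n - k - \lceil k/r \rceil + 2$, is the well-known bound from Gopalan et al. and is supplied here on the assumption that it is the one the sentence refers to. Note that the helper only inspects the rows of the given matrix, so it returns an upper estimate of the locality; a different parity-check matrix for the same code could contain lighter rows.

```python
from math import ceil

# Parity-check matrix of the binary [7, 4] Hamming code over GF(2).
# Illustrative choice only; it does not appear in the paper.
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def row_locality(H, i):
    """Locality of coordinate i as witnessed by the rows of H.

    A row of Hamming weight w whose support contains i lets symbol i be
    recovered from the other w - 1 symbols in that row, so the value
    returned is (smallest such w) - 1. Returns None if no row covers i.
    """
    weights = [sum(1 for x in row if x != 0) for row in H if row[i] != 0]
    return min(weights) - 1 if weights else None


def distance_bound(n, k, r):
    """Upper bound n - k - ceil(k / r) + 2 on the minimum distance
    (bound of Gopalan et al., assumed to be the one the text refers to)."""
    return n - k - ceil(k / r) + 2


localities = [row_locality(H, i) for i in range(7)]
print(localities)               # locality of each of the 7 coordinates
print(distance_bound(7, 4, 3))  # bound for n = 7, k = 4, r = 3
```

In this example every row has weight 4 and every coordinate is covered by some row, so each symbol has locality 3, and the bound evaluates to 3, which equals the minimum distance of the Hamming code.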
Regenerating codes and codes with locality are two schemes that have recently been proposed to ensure data collection and reliability in a distributed storage network. In a situation where one is attempting to repair a failed node, regenerating codes seek to minimize the amount of data downloaded for node repair, while codes with locality attempt to minimize the number of helper nodes accessed. In this paper, we provide several constructions for a class of vector codes with locality in which the local codes are regenerating codes, so that the resulting codes enjoy both advantages. We derive an upper bound on the minimum distance of this class of codes and show that the proposed constructions achieve this bound. The constructions include both the cases where the local regenerating codes correspond to the MSR as well as the MBR point on the storage-repair-bandwidth tradeoff curve of regenerating codes.
Current approaches to single-cell transcriptomic analysis are computationally intensive and require assay-specific modeling, which limits their scope and generality. We propose a novel method that compares and clusters cells based on their transcript-compatibility read counts rather than on the transcript or gene quantifications used in standard analysis pipelines. In the reanalysis of two landmark yet disparate single-cell RNA-seq datasets, we show that our method is up to two orders of magnitude faster than previous approaches, provides accurate and in some cases improved results, and is directly applicable to data from a wide variety of assays.

Electronic supplementary material: The online version of this article (doi:10.1186/s13059-016-0970-8) contains supplementary material, which is available to authorized users.
Long-read sequencing technologies have the potential to produce gold-standard de novo genome assemblies, but fully exploiting error-prone reads to resolve repeats remains a challenge. Aggressive approaches to repeat resolution often produce misassemblies, and conservative approaches lead to unnecessary fragmentation. We present HINGE, an assembler that seeks to achieve optimal repeat resolution by distinguishing repeats that can be resolved given the data from those that cannot. This is accomplished by adding "hinges" to reads for constructing an overlap graph where only unresolvable repeats are merged. As a result, HINGE combines the error resilience of overlap-based assemblers with repeat-resolution capabilities of de Bruijn graph assemblers. HINGE was evaluated on the long-read bacterial data sets from the NCTC project. HINGE produces more finished assemblies than Miniasm and the manual pipeline of NCTC based on the HGAP assembler and Circlator. HINGE also allows us to identify 40 data sets where unresolvable repeats prevent the reliable construction of a unique finished assembly. In these cases, HINGE outputs a visually interpretable assembly graph that encodes all possible finished assemblies consistent with the reads, while other approaches such as the NCTC pipeline and FALCON either fragment the assembly or resolve the ambiguity arbitrarily.
Node failures are inevitable in distributed storage systems (DSS). To enable efficient repair when faced with such failures, two main techniques are known: Regenerating codes, i.e., codes that minimize the total repair bandwidth; and codes with locality, which minimize the number of nodes participating in the repair process. This paper focuses on regenerating codes with locality, using pre-coding based on Gabidulin codes, and presents constructions that utilize minimum bandwidth regenerating (MBR) local codes. The constructions achieve maximum resilience (i.e., optimal minimum distance) and have maximum capacity (i.e., maximum rate). Finally, the same pre-coding mechanism can be combined with a subclass of fractional-repetition codes to enable maximum resilience and repair-by-transfer simultaneously.

I. BACKGROUND

A. Vector Codes

An $[n, K, d_{\min}, \alpha]$ vector code over a field $\mathbb{F}_q$ is a code $\mathcal{C}$ of block length $n$, having a symbol alphabet $\mathbb{F}_q^{\alpha}$ for some $\alpha > 1$, satisfying the additional property that given $c, c' \in \mathcal{C}$ and $a, b \in \mathbb{F}_q$, $ac + bc'$ also belongs to $\mathcal{C}$. As a vector space over $\mathbb{F}_q$, $\mathcal{C}$ has dimension $K$, termed the scalar dimension (equivalently, the file size) of the code, and as a code over the alphabet $\mathbb{F}_q^{\alpha}$, the code has minimum distance $d_{\min}$. Associated with the vector code $\mathcal{C}$ is an $\mathbb{F}_q$-linear scalar code $\mathcal{C}^{(s)}$ of length $N = n\alpha$, where $\mathcal{C}^{(s)}$ is obtained by expanding each vector symbol within a codeword into $\alpha$ scalar symbols (in some prescribed order). Given a generator matrix $G$ for the scalar code $\mathcal{C}^{(s)}$, the first code symbol in the vector code is naturally associated with the first $\alpha$ columns of $G$, etc. We will refer to the collection of $\alpha$ columns of $G$ associated with the $i$-th code symbol $c_i$ as the $i$-th thick column and, to avoid confusion, to the columns of $G$ themselves as thin columns.

B. Locality in Vector Codes

Let $\mathcal{C}$ be an $[n, K, d_{\min}, \alpha]$ vector code over a field $\mathbb{F}_q$, possessing a $(K \times n\alpha)$ generator matrix $G$.
The $i$-th code symbol, $c_i$, is said to have $(r, \delta)$ locality, $\delta \ge 2$, if there exists a punctured code $\mathcal{C}_i := \mathcal{C}|_{S_i}$ of $\mathcal{C}$ (called a local code) with support $S_i \subseteq \{1, 2, \cdots, n\}$ such that [...]

The code $\mathcal{C}$ is said to have $(r, \delta)$ information locality if there exist $l$ code symbols with $(r, \delta)$ locality and respective support sets [...]

The code $\mathcal{C}$ is said to have $(r, \delta)$ all-symbol locality if all code symbols have $(r, \delta)$ locality. A code with $(r, \delta)$ information (respectively, all-symbol) locality is said to have full $(r, \delta)$ information (respectively, all-symbol) locality if all local codes have parameters given by $|S_i| = r + \delta - 1$ and $d_{\min}(\mathcal{C}_i) = \delta$, for $i = 1, \cdots, l$.

The concept of locality for scalar codes, with $\delta = 2$, was introduced in [1] and extended in [2] and [3] to scalar codes with arbitrary $\delta$, and vector codes with $\delta = 2$, respectively. This was further extended to vector codes with arbitrary $\delta$ in [4] and [5], where, in addition to constructions of vector codes with locality, authors derive minimum distance upper bounds and also consider settings in which the local codes have regeneration properti...
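The vector-code definitions from the Background section above (thick columns of the generator matrix, and minimum distance measured over the alphabet $\mathbb{F}_q^{\alpha}$ rather than over scalar symbols) can be made concrete with a minimal sketch. Everything in it is an assumption chosen for illustration: the field is taken to be GF(2), the toy code has $n = 3$, $\alpha = 2$, $K = 4$ with the third vector symbol equal to the sum of the first two, indices are 0-based, and the helper names are hypothetical. The distance is found by brute force, which is only practical for very small $K$.

```python
from itertools import product

ALPHA = 2  # scalar symbols per vector symbol (assumed toy value)

# Generator matrix of the scalar code over GF(2), size K x (n * ALPHA).
# Vector symbols: c1 = (m1, m2), c2 = (m3, m4), c3 = c1 + c2.
G = [
    [1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1],
    [0, 0, 1, 0, 1, 0],
    [0, 0, 0, 1, 0, 1],
]


def thick_column(G, i, alpha):
    """Return the i-th thick column (0-based): alpha adjacent thin columns."""
    return [row[i * alpha:(i + 1) * alpha] for row in G]


def encode(G, msg):
    """Multiply the message row vector by G over GF(2)."""
    return [sum(m * g for m, g in zip(msg, col)) % 2 for col in zip(*G)]


def vector_weight(codeword, alpha):
    """Count vector symbols (blocks of alpha scalars) that are nonzero."""
    return sum(
        1 for j in range(0, len(codeword), alpha) if any(codeword[j:j + alpha])
    )


def vector_min_distance(G, alpha):
    """Brute-force minimum distance over the vector alphabet.

    Assumes G has full row rank, so every nonzero message gives a nonzero
    codeword.
    """
    K = len(G)
    return min(
        vector_weight(encode(G, msg), alpha)
        for msg in product([0, 1], repeat=K)
        if any(msg)
    )


print(thick_column(G, 2, ALPHA))      # thin columns 4 and 5 of G
print(vector_min_distance(G, ALPHA))  # distance counted in vector symbols
```

For this toy code, any two vector symbols determine the third, so a nonzero codeword has at least two nonzero vector symbols and the brute-force search returns 2.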
To systematically define molecular features in human tumour cells that determine their degree of sensitivity to human allogeneic natural killer (NK) cells, we quantified the NK cell responsiveness of hundreds of molecularly annotated "DNA-barcoded" solid tumour cell lines in multiplexed format. We also applied genome-scale CRISPR-based gene editing screens in several solid tumour cell lines to functionally interrogate which genes in tumour cells regulate the response to NK cells. In these orthogonal studies, NK-sensitive tumour cells tend to exhibit "mesenchymal-like" transcriptional programs, a high transcriptional signature for chromatin remodeling complexes, high levels of B7-H6 (NCR3LG1), and low levels of HLA-E/antigen presentation genes. Importantly, transcriptional signatures of NK cell-sensitive tumour cells correlate with immune checkpoint inhibitor (ICI) resistance in clinical samples. This study provides a comprehensive map of mechanisms regulating tumour cell responses to NK cells, with implications for future biomarker-driven applications of NK cell immunotherapies.