Botond Sipos scite author profile

Towards practical, high-capacity, low-maintenance information storage in synthesized DNA

The shift to digital systems for the creation, transmission and storage of information has led to increasing complexity in archiving, requiring active, ongoing maintenance of the digital media. DNA is an attractive target for information storage [1] because of its capacity for high-density information encoding, longevity under easily achieved conditions [2-4] and proven track record as an information bearer. Previous DNA-based information storage approaches have encoded only trivial amounts of information [5-7] or were not amenable to scaling up [8], and used no robust error correction and lacked examination of their cost-efficiency for large-scale information archival [9]. Here we describe a scalable method that can reliably store more information than has been handled before. We encoded computer files totalling 739 kB of hard disk storage and with an estimated Shannon information [10] of 5.2 × 10^6 bits into a DNA code, synthesised this DNA, sequenced it and reconstructed the original files with 100% accuracy. Theoretical analysis indicates that our DNA-storage scheme scales far beyond current global information volumes. These results demonstrate DNA-storage to be a realistic technology for large-scale digital archiving that may already be cost-effective for low access, multi-century-long archiving tasks. Within a decade, as costs fall rapidly under realistic scenarios for technological advances, it may be cost-effective for sub-50-year archival.

Although techniques for manipulating, storing and copying large amounts of DNA have been established for many years [11][12][13], these rely on the availability of initial copies of the DNA molecule to be processed, and one of the main challenges for practical information storage in DNA is the difficulty of synthesising long sequences of DNA de novo to an exactly-specified design.

Instead, we developed an in vitro approach that represents the information being stored as a hypothetical long DNA molecule and encodes this using shorter DNA fragments. A similar approach was proposed by Church et al. [9] in a report

* To whom correspondence should be addressed; goldman@ebi.ac.uk.

Supplementary Information is provided as a number of separate files accompanying this document.

Author Contributions: N.G. and E.B. conceived and planned the project and devised the information encoding methods. P.B. advised on NGS protocols, prepared the DNA library and managed the sequencing process. S.C. and E.M.L. provided custom oligonucleotides. N.G. wrote the software for encoding and decoding information into/from DNA and analysed the data. N.G., E.B., C.D. and B.S. modelled the scaling properties of DNA-storage. N.G. wrote the paper with discussions and contributions from all other authors. N.G. and C.D. produced the figures.

Author Information: Data are available online at http://www.ebi.ac.uk/goldman-srv/DNA-storage and in the Sequence Read Archive (SRA) with accession number ERP002040 (to be confirmed). Correspondence and requests for materials should be addressed to N.G. (goldman@ebi.ac.uk). Co...

Highly parallel direct RNA sequencing on an array of nanopores

Jachimowicz et al. 2018. Nat Methods

766

549

Sequencing the RNA in a biological sample can unlock a wealth of information, including the identity of bacteria and viruses, the nuances of alternative splicing or the transcriptional state of organisms. However, current methods have limitations due to short read lengths and reverse transcription or amplification biases. Here we demonstrate nanopore direct RNA-seq, a highly parallel, real-time, single-molecule method that circumvents reverse transcription or amplification steps. This method yields full-length, strand-specific RNA sequences and enables the direct detection of nucleotide analogs in RNA.

Systematic evaluation of spliced alignment programs for RNA-seq data

et al. 2013

High-throughput RNA sequencing is an increasingly accessible method for studying gene structure and activity on a genome-wide scale. A critical step in RNA-seq data analysis is the alignment of partial transcript reads to a reference genome sequence. To assess the performance of current mapping software, we invited developers of RNA-seq aligners to process four large human and mouse RNA-seq data sets. In total, we compared 26 mapping protocols based on 11 programs and pipelines and found major performance differences between methods on numerous benchmarks, including alignment yield, basewise accuracy, mismatch and gap placement, exon junction discovery and suitability of alignments for transcript reconstruction. We observed concordant results on real and simulated RNA-seq data, confirming the relevance of the metrics employed. Future developments in RNA-seq alignment methods would benefit from improved placement of multimapped reads, balanced utilization of existing gene annotation and a reduced false discovery rate for splice junctions.

Highly parallel direct RNA sequencing on an array of nanopores

Garalde, Snell, Jachimowicz et al. 2016. Preprint

203

248

Ribonucleic acid sequencing can allow us to monitor the RNAs present in a sample. This enables us to detect the presence and nucleotide sequence of viruses, or to build a picture of how active transcriptional processes are changing – information that is useful for understanding the status and function of a sample. Oxford Nanopore Technologies’ sequencing technology is capable of electronically analysing a sample’s DNA directly, and in real-time. In this manuscript we demonstrate the ability of an array of nanopores to sequence RNA directly, and we apply it to a range of biological situations. Nanopore technology is the only available sequencing technology that can sequence RNA directly, rather than depending on reverse transcription and PCR. There are several potential advantages of this approach over other RNA-seq strategies, including the absence of amplification and reverse transcription biases, the ability to detect nucleotide analogues and the ability to generate full-length, strand-specific RNA sequences. Direct RNA sequencing is a completely new way of analysing the sequence of RNA samples and it will improve the ease and speed of RNA analysis, while yielding richer biological information.

Sessile hemocytes as a hematopoietic compartment in Drosophila melanogaster

Márkus, Laurinyecz, Kurucz et al. 2009. Proc. Natl. Acad. Sci. U.S.A.

211

233

The blood cells, or hemocytes, in Drosophila participate in the immune response through the production of antimicrobial peptides, the phagocytosis of bacteria, and the encapsulation of larger foreign particles such as parasitic eggs; these immune reactions are mediated by phylogenetically conserved mechanisms. The encapsulation reaction is analogous to the formation of granuloma in vertebrates, and is mediated by large specialized cells, the lamellocytes. The origin of the lamellocytes has not been formally established, although it has been suggested that they are derived from the lymph gland, which is generally considered to be the main hematopoietic organ in the Drosophila larva. However, it was recently observed that a subepidermal population of sessile blood cells is released into the circulation in response to a parasitoid wasp infection. We set out to analyze this phenomenon systematically. As a result, we define the sessile hemocytes as a novel hematopoietic compartment, and the main source of lamellocytes.

Keywords: cellular immunity | lamellocytes | parasitoid wasp | plasmatocytes | niche

Phylogenetic Quantification of Intra-tumour Heterogeneity

et al. 2014

Intra-tumour genetic heterogeneity is the result of ongoing evolutionary change within each cancer. The expansion of genetically distinct sub-clonal populations may explain the emergence of drug resistance, and if so, would have prognostic and predictive utility. However, methods for objectively quantifying tumour heterogeneity have been missing and are particularly difficult to establish in cancers where predominant copy number variation prevents accurate phylogenetic reconstruction owing to horizontal dependencies caused by long and cascading genomic rearrangements. To address these challenges, we present MEDICC, a method for phylogenetic reconstruction and heterogeneity quantification based on a Minimum Event Distance for Intra-tumour Copy-number Comparisons. Using a transducer-based pairwise comparison function, we determine optimal phasing of major and minor alleles, as well as evolutionary distances between samples, and are able to reconstruct ancestral genomes. Rigorous simulations and an extensive clinical study show the power of our method, which outperforms state-of-the-art competitors in reconstruction accuracy, and additionally allows unbiased numerical quantification of tumour heterogeneity. Accurate quantification and evolutionary inference are essential to understand the functional consequences of tumour heterogeneity. The MEDICC algorithms are independent of the experimental techniques used and are applicable to both next-generation sequencing and array CGH data.

Exome sequencing identifies NBEAL2 as the causative gene for gray platelet syndrome

et al. 2011

Gray platelet syndrome (GPS) is a predominantly recessive platelet disorder characterized by a mild thrombocytopenia with large platelets and a paucity of α-granules; these abnormalities cause mostly moderate but in rare cases severe bleeding. We sequenced the exomes of four unrelated cases and identified as the causative gene NBEAL2, a gene with previously unknown function but a member of a gene family involved in granule development. Silencing of nbeal2 in zebrafish abrogated thrombocyte formation.

SMIM1 underlies the Vel blood group and influences red blood cell traits

et al. 2013

+ Correspondence should be addressed to CAA (c.albers@gen.umcn.nl), WHO (who1000@cam.ac.uk) or AC (as889@cam.ac.uk).

Author Contributions: AC performed zebrafish knock down, analysis of zebrafish gene sequence; LHW collected clinical cases with anti-Vel, performed confirmatory Sanger sequencing and phenotyping by flow cytometry and haemagglutination; JCS performed confirmatory Sanger sequencing and analyzed the genotyping data; MK and PB analyzed the RNA-Sequencing data; PAS performed SMIM1 transfection experiments, MF and SF performed isolation of precursor cells; BS, GJ, AT and NG performed the analysis of the evolutionary conservation of the SMIM genes; AAS performed genotyping; EA, erythroblast culture and transfection; EB performed zebrafish knock down experiment with input from DS; HS, HHWS, VGH, NV performed cell culture experiments and performed EMSAs and transfection experiments and Q-PCR for SMIM1; RSNF, JK, HJW and LF performed eQTL and gene ontology analysis; AG, MN, JP, JGS, HLJ, KR, MdH were responsible for identification of Vel-negative and Vel-weak individuals by typing >360,000 samples; HHDK performed RNA-Seq with supervisory input from HGS who leads and coordinates the BluePrint epigenome project; GK supervised exome-sequencing; AR analysed expression data from whole genome expression arrays and RNAseq; HS expression data and vectors; DS iron homeostasis and other relevant laboratory measurements; D.St. oversaw zebrafish experiments. NS provided pre-publication access to red blood cell GWAS meta-analysis; PH eQTL analysis, expression data, SMIM1 vectors, pre-publication access to red blood cell GWAS meta-analysis; EvdS and WHO designed the study, CAA performed exome sequence analysis, Sanger sequence analysis, genetic analysis and statistical analysis; AC, LHW, EvdS, WHO and CAA wrote the paper.

Botond Sipos

Towards practical, high-capacity, low-maintenance information storage in synthesized DNA

Highly parallel direct RNA sequencing on an array of nanopores

Systematic evaluation of spliced alignment programs for RNA-seq data

Highly parallel direct RNA sequencing on an array of nanopores

Sessile hemocytes as a hematopoietic compartment in Drosophila melanogaster

Phylogenetic Quantification of Intra-tumour Heterogeneity

Exome sequencing identifies NBEAL2 as the causative gene for gray platelet syndrome

SMIM1 underlies the Vel blood group and influences red blood cell traits