2022
DOI: 10.3390/biotech11030031

Bio-Strings: A Relational Database Data-Type for Dealing with Large Biosequences

Abstract: DNA sequencers output a large set of very long biological data strings that we should persist in databases rather than basic text file systems. Many different data models and database management systems (DBMS) may deal with both storage and efficiency issues regarding genomic datasets. Specifically, there is a need for handling strings with variable sizes while keeping their biological meaning. Relational database management systems (RDBMS) provide several data types that could be further explored for the geno…

Cited by 16 publications (14 citation statements)
References 21 publications (21 reference statements)
“…For GC content analysis, a custom R function, calculate-GC-Content, was developed using the ‘Biostrings’ package 89 . This function read the sequences from FASTA files and calculated the GC content by aggregating guanine and cytosine nucleotide counts across all sequences, measuring the genomic GC proportion of the bacteria.…”
Section: Methods (mentioning)
confidence: 99%
“… 25 https://doi.org/10.1093/bioinformatics/bty633 Biostrings, v2.66.0 Lifschitz et al. 26 https://doi.org/10.18129/B9.bioc.Biostrings GGally, v2.1.2 Schloerke et al. 27 https://ggobi.github.io/ggally/ ggseqlogo, v0.1 Wagih et al.…”
Section: Key Resources Table (mentioning)
confidence: 99%
“…Variant-transcript pairs with a PTC conforming to any of the above rules will be annotated to escape NMD, but results for all rules are reported individually by aenmd; this allows users to focus on subsets of rules, if desired. aenmd is implemented in the R programming language [45], making use of the VariantAnnotation [46] and vcfR [47] packages for importing and exporting variants from vcf files, and the Biostrings [48] and GenomicRanges [49] packages for calculating rules. An index containing all PTC-generating SNVs is pre-calculated for a given transcript set and stored in a trie data structure for lookup, using the triebeard package.…”
Section: Annotating Escape From NMD (mentioning)
confidence: 99%