Resolving Conflicting Predictions from Multimapping Reads

Canzar, Stefan; Elbassioni, Khaled; Jones, Mitchell; Mestre, Julián

doi:10.1089/cmb.2015.0164

Cited by 4 publications

(4 citation statements)

References 20 publications


“…Therefore, the number of reads in the sample remained the same. Multi-mapping reads present a problem for downstream analyses as they reduce sensitivity (Canzar et al, 2016). Although several strategies and specialized tools were proposed to count multi-mapping reads (Roberts et al, 2013; Zytnicki, 2017), for our purposes, specificity was the principal issue.…”

Section: Discussion (mentioning)

confidence: 99%

Identification and Validation of Reference Genes in Clostridium beijerinckii NRRL B-598 for RT-qPCR Using RNA-Seq Data

et al. 2021


Gene expression analysis through reverse transcription-quantitative real-time polymerase chain reaction (RT-qPCR) depends on correct data normalization by reference genes with stable expression. Although Clostridium beijerinckii NRRL B-598 is a promising Gram-positive bacterium for the industrial production of biobutanol, validated reference genes have not yet been reported. In this study, we selected 160 genes with stable expression based on an RNA sequencing (RNA-Seq) data analysis, and among them, seven genes (zmp, rpoB1, rsmB, greA, rpoB2, topB2, and rimO) were selected for experimental validation by RT-qPCR and gene ontology (GO) enrichment analysis. According to statistical analyses, zmp and greA were the most stable and suitable reference genes for RT-qPCR normalization. Furthermore, our methodology can be useful for selecting reference genes in other strains of C. beijerinckii. It also suggests that RNA-Seq data can be used for the initial selection of novel reference genes; however, their validation is required.


“…Importantly, the hardness of MFLP (and thus sequence alignment in this form) can be demonstrated by reduction from maximum coverage (Canzar et al, 2016). This reduction also preserves approximation bounds.…”

Section: A New Theory For Genomic Deletion (mentioning)

confidence: 95%

“…A practical limitation of theoretical modeling is highlighted by our analysis of the sequence alignment problem (Canzar et al, 2016). As the hardness result for this problem is at least 0.63, one might think all hope is lost.…”

Section: Impossibly Good Algorithms? (mentioning)

confidence: 99%

“…We have recently shown that sequence alignment can be modeled as an instance of the maximum facility location problem (MFLP) (Canzar et al, 2016). In brief, given a bipartite graph of putative mappings from the donor genome (set of clients, $C$) to the reference genome (set of facilities, $F$), the weight $w_{u,v}$ of an edge from a client, $u \in C$, to facility, $v \in F$, is equivalent to our confidence in that deletion.…”

Section: A New Theory For Genomic Deletion (mentioning)

confidence: 99%
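
As a rough illustration of the facility-location view described in the excerpt above, the following sketch scores a chosen set of open facilities on a small bipartite instance: each client contributes the weight of its best edge to an open facility, and each open facility incurs a cost. The opening costs, the function name `mflp_value`, and the toy data are assumptions made for illustration; the excerpt does not specify them, and this is not the cited authors' implementation.

```python
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str]  # (client u, facility v); hypothetical labels


def mflp_value(weights: Dict[Edge, float],
               open_facilities: Iterable[str],
               opening_cost: Dict[str, float]) -> float:
    """Objective of a generic maximum facility location instance.

    Each client is served by its highest-weight open neighbour;
    the opening costs are an assumed part of the model.
    """
    opened = set(open_facilities)
    best: Dict[str, float] = {}
    for (u, v), w in weights.items():
        if v in opened:
            best[u] = max(best.get(u, 0.0), w)
    return sum(best.values()) - sum(opening_cost.get(v, 0.0) for v in opened)


# Toy instance (made-up numbers): three clients, two candidate facilities.
weights = {
    ("r1", "A"): 0.9, ("r1", "B"): 0.4,
    ("r2", "B"): 0.7,
    ("r3", "A"): 0.2, ("r3", "B"): 0.6,
}
cost = {"A": 0.5, "B": 0.5}

for choice in (["A"], ["B"], ["A", "B"]):
    print(choice, round(mflp_value(weights, choice, cost), 3))
```

Enumerating every subset of facilities in this way is only feasible for tiny instances, which is where the hardness and approximation results quoted above become relevant.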


Cautionary Tales of Inapproximability

Budden

Jones

2017

Journal of Computational Biology

Self Cite


Modeling biology as classical problems in computer science allows researchers to leverage the wealth of theoretical advancements in this field. Despite countless studies presenting heuristics that report improvement on specific benchmarking data, there has been comparatively little focus on exploring the theoretical bounds on the performance of practical (polynomial-time) algorithms. Conversely, theoretical studies tend to overstate the generalizability of their conclusions to physical biological processes. In this article we provide a fresh perspective on the concepts of NP-hardness and inapproximability in the computational biology domain, using popular sequence assembly and alignment (mapping) algorithms as illustrative examples. These algorithms exemplify how computer science theory can both (a) lead to substantial improvement in practical performance and (b) highlight areas ripe for future innovation. Importantly, we discuss caveats that seemingly allow the performance of heuristics to exceed their provable bounds.

Keywords: algorithms, inapproximability, genomics, alignment.

SEQUENCE ASSEMBLY: WHERE THEORY MEETS PRACTICE

Given a set of $n$ strings, $S = \{s_1, s_2, \ldots, s_n\}$, the goal of the shortest common superstring problem (SCSP) is to find the minimum length string, $s$, such that each $s_i \in S$ is a substring of $s$. The SCSP over the nucleotide alphabet, $\Sigma = \{A, C, G, T\}$, thus provides a simple and convenient model for the sequence assembly problem, whereby we wish to determine the DNA sequence from which a set of reads (or k-mers) are derived. This is a classic example of how decades of research on approximation bounds of NP-hard problems can be applied to improve the practical performance of algorithms in the computational biology domain.

A detailed review on the development of approximation and hardness bounds for SCSP is provided by Golovnev et al. (2013). Despite these advancements, the power of theoretical computer science abstractions is limited by how closely they represent the true biological problem (as we discuss later), in this case reversing the DNA fragmentation process inherent to high-throughput sequencing experiments. SCSP and its sequence assembly derivatives (Sweedyk, 2000; Kaplan and Shafrir, 2005) have thus been criticized for their assumptions regarding parsimony and tandem repeats (Nagarajan and Pop, 2009), motivating the application of graph theoretic models that make more appropriate sets of assumptions.
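
To make the SCSP definition above concrete, here is a minimal sketch of the well-known greedy merge heuristic, which repeatedly joins the two strings with the largest suffix-prefix overlap. It is offered only as an illustration of the problem statement, not as the method of the cited article, and it does not guarantee a shortest superstring. The function names `overlap` and `greedy_superstring` and the example reads are hypothetical.

```python
from itertools import permutations
from typing import List


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(reads: List[str]) -> str:
    """Greedy heuristic: returns a common superstring, not necessarily the shortest."""
    # Drop exact duplicates, then reads already contained in another read.
    pool = list(dict.fromkeys(reads))
    pool = [s for s in pool if not any(s != t and s in t for t in pool)]
    while len(pool) > 1:
        # Pick the ordered pair (a, b) with the largest overlap.
        k, a, b = max(
            ((overlap(a, b), a, b) for a, b in permutations(pool, 2)),
            key=lambda item: item[0],
        )
        pool.remove(a)
        pool.remove(b)
        pool.append(a + b[k:])  # merge, writing the shared part only once
    return pool[0] if pool else ""


reads = ["AGGT", "GGTC", "GTCA"]
assembled = greedy_superstring(reads)
print(assembled, all(r in assembled for r in reads))
```

Every merge keeps both inputs as substrings of the result, so the output always contains each read; how close its length comes to the optimum is exactly what the approximation bounds discussed above address.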


Identification of Genomic Somatic Variants in Cancer

Fawcett

Eterovic

2017

Advances in Clinical Chemistry
