2019
DOI: 10.1093/nar/gkz750

Training-free measures based on algorithmic probability identify high nucleosome occupancy in DNA sequences

Abstract: We introduce and study a set of training-free methods of an information-theoretic and algorithmic complexity nature that we apply to DNA sequences to identify their potential to identify nucleosomal binding sites. We test the measures on well-studied genomic sequences of different sizes drawn from different sources. The measures reveal the known in vivo versus in vitro predictive discrepancies and uncover their potential to pinpoint high and low nucleosome occupancy. We explore different possible signals withi…

Cited by 16 publications (29 citation statements)
References 52 publications
“…Indeed, when applied to short strings, LZW and other statistical compression algorithms would fail, so they are found to be of very limited use in the short string regime. For example, in an approach that tested and compared compression in a problem of molecular biology, it was shown that CTM and BDM can be informative of nucleosome occupancy [41] when applied to short strings (a nucleosome of DNA is only around 146 base pairs long), a key feature in genetic regulation. The problem of nucleosome occupancy is one of the greatest challenges in molecular biology, believed to be second only to protein folding.…”
Section: Discussion (mentioning)
confidence: 99%
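The short-string limitation described in this statement can be illustrated with a small experiment. The sketch below is not taken from the cited work: it uses Python's `zlib` (DEFLATE) as a stand-in for LZW, and the helper names `compressed_length` and `random_dna` are hypothetical. It compares the compressed size of a periodic and a pseudo-random DNA string at a few lengths, including the roughly 146 bp scale mentioned above.

```python
import random
import zlib


def compressed_length(seq: str) -> int:
    """Number of bytes zlib (DEFLATE, level 9) needs for the sequence.

    Assumption: DEFLATE stands in for LZW here; both are statistical,
    dictionary-style compressors, but they are not the same algorithm.
    """
    return len(zlib.compress(seq.encode("ascii"), 9))


def random_dna(length: int, rng: random.Random) -> str:
    """Pseudo-random sequence drawn uniformly from the DNA alphabet."""
    return "".join(rng.choice("ACGT") for _ in range(length))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs are comparable
    print("length  periodic  random")
    for length in (12, 146, 5000):
        periodic = ("ACGT" * length)[:length]
        shuffled = random_dna(length, rng)
        print(length, compressed_length(periodic), compressed_length(shuffled))
```

At very short lengths the fixed container overhead (header and checksum) is a large share of the output, so the compressed size leaves little room to separate one short string from another; the gap between structured and random input is expected to widen only as the strings grow.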
“…CTM offered the first ever classification and ranking of short strings (see Figure 3). This regime of short strings is of no small interest; they are among the most relevant for real-world applications such as perturbation analysis [40], molecular biology and genetics [41].…”
Section: Alternatives To Lossless Compression (mentioning)
confidence: 99%
“…We have found that nucleosome organisation related features are likely to enhance model performance beyond the said five epigenetic markers, and therefore include here three base-pair resolved nucleosomal channels: W/S Score [15], Strong-Weak Nucleotide BDM (SW-BDM) [16], NuPoP (Occupancy) [17]. These scores showed the largest Spearman correlation 1 with cleavage activity when considering 13 distinct nucleosome organisation-related scores.…”
Section: Data Source (mentioning)
confidence: 99%
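Two ingredients named in this statement, a strong/weak recoding of nucleotides and a Spearman rank correlation, can be sketched in a few lines. This is an illustrative sketch only: it is not the W/S Score of [15] or the SW-BDM of [16], and the function names and the sliding-window strong-base fraction are assumptions made for the example. The recoding treats G and C as strong and A and T as weak.

```python
from statistics import mean


def strong_weak(seq: str) -> list[int]:
    """Recode DNA as 1 for strong bases (G, C) and 0 for anything else."""
    return [1 if base in "GC" else 0 for base in seq.upper()]


def windowed_strong_fraction(seq: str, window: int) -> list[float]:
    """Fraction of strong bases in each sliding window (hypothetical channel)."""
    bits = strong_weak(seq)
    return [mean(bits[i:i + window]) for i in range(len(bits) - window + 1)]


def ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

With a per-position score such as `windowed_strong_fraction(seq, 10)` and a measured signal of the same length, `spearman(score, signal)` gives the kind of rank correlation the statement uses to compare candidate features. The function is undefined when either input is constant, since the rank variance is then zero.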
“…The importance of nucleosome presence informed features for successful predictions of some of our best-performing models supports our emphasis on physically informed features. For a more detailed version of these features, we refer to [15,16,17].…”
Section: Data Source (mentioning)
confidence: 99%