2014
DOI: 10.1186/1471-2105-15-66

Information compression exploits patterns of genome composition to discriminate populations and highlight regions of evolutionary interest

Abstract: Background: Genomic information allows population relatedness to be inferred and selected genes to be identified. Single nucleotide polymorphism microarray (SNP-chip) data, a proxy for genome composition, contains patterns in allele order and proportion. These patterns can be quantified by compression efficiency (CE). In principle, the composition of an entire genome can be represented by a CE number quantifying allele representation and order. Results: We applied a compression algorithm (DEFLATE) to genome-wide hi…

Citations: cited by 14 publications (21 citation statements)
References: 77 publications
“…Reads were subsampled with probability 0.5, 0.25, 0.1, 0.05, 0.01, 0.005, 0.002 or 0.001; representing sequencing 2, 4, 10, 20, 100, 200, 500 or 1000 times the number of samples per lane, respectively. The set of sampled reads at a given simulated sequencing depth were then used to calculate repeatability, as above, or compression efficiency (29). Compression efficiency compares the size of a compressed file to its original size as (original – compressed)/original and is a measure of the non-redundant information present in the file.…”
Section: Methods (mentioning)
confidence: 99%
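The subsampling step quoted above can be sketched as follows. This is a minimal illustration, not the citing authors' code: the function names `subsample_reads` and `depth_multiplier` are hypothetical, the reads are assumed to be held in an in-memory list, and Python's `random` module stands in for whatever tool the authors used. Each read is kept independently with probability p, and 1/p gives the quoted multiplier on samples per lane.

```python
import random

# Subsampling probabilities quoted in the citation statement.
PROBABILITIES = [0.5, 0.25, 0.1, 0.05, 0.01, 0.005, 0.002, 0.001]


def subsample_reads(reads, p, seed=0):
    """Keep each read independently with probability p (hypothetical helper)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [read for read in reads if rng.random() < p]


def depth_multiplier(p):
    """Factor by which samples per lane could increase at probability p."""
    return round(1 / p)


if __name__ == "__main__":
    toy_reads = [f"read_{i}" for i in range(10_000)]  # placeholder data
    for p in PROBABILITIES:
        kept = subsample_reads(toy_reads, p)
        print(p, depth_multiplier(p), len(kept))
```

The number of reads kept varies from run to run unless the seed is fixed; only its expectation, p times the input size, is determined by p.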
“…Reads were subsampled with probability 0.5, 0.25, 0.1, 0.05, 0.01, 0.005, 0.002 or 0.001 using the sample function in R; representing sequencing 2, 4, 10, 20, 100, 200, 500 or 1000 times the number of samples per lane, respectively. The set of sampled reads at a given simulated sequencing depth were then used to calculate compression efficiency [31]. Compression efficiency compares the size of a compressed file to its original size as (original – compressed)/original and is a measure of the non-redundant information present in the file.…”
Section: Sensitivity To Sequencing Depth (mentioning)
confidence: 99%
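The compression efficiency formula quoted above, (original – compressed)/original, can be illustrated with a short sketch. It assumes the data are available as a byte string and uses Python's standard `zlib` module, which emits a DEFLATE stream wrapped in a small zlib header and checksum. The function name `compression_efficiency` and the compression level are assumptions for illustration; the source does not say which DEFLATE implementation or settings the authors used.

```python
import random
import zlib


def compression_efficiency(data: bytes, level: int = 9) -> float:
    """Return (original - compressed) / original for a byte string.

    Hypothetical helper: zlib's DEFLATE is assumed here as the compressor.
    """
    if not data:
        raise ValueError("cannot compute efficiency of empty input")
    original = len(data)
    compressed = len(zlib.compress(data, level))
    return (original - compressed) / original


if __name__ == "__main__":
    # Highly repetitive toy input: compresses very well.
    repetitive = b"AB" * 5000
    # Pseudo-random toy input: DEFLATE finds almost nothing to exploit.
    rng = random.Random(1)
    noisy = bytes(rng.getrandbits(8) for _ in range(10_000))
    print(compression_efficiency(repetitive))
    print(compression_efficiency(noisy))
```

Under this formula, values near 1 correspond to highly repetitive input, while values near 0 (or slightly below, because of the format overhead) correspond to input that DEFLATE cannot shrink.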
“…In current study, selection signatures were identified near several known color genes, including KITLG (near SNP BTA-74300-no-rs on BTA5) 38,54 , and LEF1 32,55 (here indicated by a peak in the XtX GWAS on BTA6). These genes and another candidate gene MCM6 (near ARS-BFGL-NGS-92772 on BTA2, also identified by Hudson et al 50 ) overlap with pigmentation QTL regions underlying UV-protection 56 . The environmental PC1 signal near IGFBP7 and the combined XtX-morphological signal near ADRA1D (Table 2) are close to the coat color genes KIT 56,57 and ATRN 56,58 , respectively.…”
Section: Discussion (mentioning)
confidence: 57%
“…In other studies, strong signals of selection in tropical cattle have been detected on BTA5 32,34,49,50 . Notably, Porto-Neto et al identified a 20 Mb region on BTA5 with effects on parasite resistance, yearling weight, body condition score, coat color and penile sheath score 32 .…”
Section: Discussion (mentioning)
confidence: 77%