2018
DOI: 10.1093/bioinformatics/bty125

AltHapAlignR: improved accuracy of RNA-seq analyses through the use of alternative haplotypes

Abstract: Motivation: Reliance on mapping to a single reference haplotype currently limits accurate estimation of allele or haplotype-specific expression using RNA-sequencing, notably in highly polymorphic regions such as the major histocompatibility complex. Results: We present AltHapAlignR, a method incorporating alternate reference haplotypes to generate gene- and haplotype-level estimates of transcript abundance for any genomic region where such information is available. We validate using simulated and experimental data …

Cited by 31 publications (48 citation statements)
References: 38 publications
“…This may be explained by different techniques (RNA-seq vs. qPCR), the samples used, or the fact that imputation of expression was used for heterozygous genotypes in previous studies [13,18]. Even in a comparison with [33], which also reanalyzed the GEUVADIS dataset with an HLA-personalized approach, we observe only a moderate concordance in the ordering of lineages by expression level. This is likely due to several differences between our approach and theirs, including the strategy for alignment ([33] use the 8 MHC region haplotypes to guide quantification, while we directly use the entire known HLA diversity for this task), and the treatment of reads mapping to multiple loci or alleles ([33] discard these reads, while we use a maximum likelihood approach to measure their contribution).…”
Section: HLA Allele-level Analysis (mentioning)
confidence: 62%
“…For both first and second steps of our pipeline a read can align to more than one allele or gene (due to their sequence level similarity). Instead of discarding such reads (as in [33]) or evenly splitting them among the compatible references (as in [30]), we use maximum likelihood estimates of expression obtained by an expectation-maximization (EM) algorithm, which is implemented within Salmon and kallisto. This procedure probabilistically assigns reads to each reference in the index, in a way that accounts for reads that align to more than one gene or allele [35,36].…”
Section: HLA Expression Quantification From RNA-seq Data (mentioning)
confidence: 99%
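
The probabilistic assignment described in the statement above can be illustrated with a toy expectation-maximization loop. This is only a sketch under simplifying assumptions, not the implementation in Salmon or kallisto: it ignores effective transcript length and bias models, and the function name `em_abundance` and the allele labels are hypothetical.

```python
from typing import Dict, Sequence


def em_abundance(
    compat: Sequence[Sequence[str]],
    refs: Sequence[str],
    n_iter: int = 200,
    tol: float = 1e-9,
) -> Dict[str, float]:
    """Toy EM: estimate relative abundances from per-read compatibility sets.

    compat[i] lists the references (alleles or genes) that read i aligns to.
    Simplification (an assumption of this sketch): all references have equal length.
    """
    theta = {r: 1.0 / len(refs) for r in refs}  # uniform starting point
    for _ in range(n_iter):
        # E-step: split each read across its compatible references
        # in proportion to the current abundance estimates.
        counts = {r: 0.0 for r in refs}
        for read in compat:
            total = sum(theta[r] for r in read)
            if total == 0.0:
                continue
            for r in read:
                counts[r] += theta[r] / total
        n = sum(counts.values())
        if n == 0.0:
            return theta
        # M-step: abundances are the normalized expected read counts.
        new_theta = {r: counts[r] / n for r in refs}
        delta = max(abs(new_theta[r] - theta[r]) for r in refs)
        theta = new_theta
        if delta < tol:
            break
    return theta


# Hypothetical data: 6 reads unique to A*01, 2 unique to A*02, 4 matching both.
reads = [["A*01"]] * 6 + [["A*02"]] * 2 + [["A*01", "A*02"]] * 4
theta = em_abundance(reads, ["A*01", "A*02"])
```

On this toy input the estimates converge to about 0.75 for `A*01` and 0.25 for `A*02`: the four ambiguous reads are shared 3:1, following the ratio of uniquely mapped reads, rather than being discarded or split evenly.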
“…There were 146 expressed genes in the MHC interval (TPM >2 in at least 10 samples), of which 24 were HLA genes, including six HLA class I genes, 3 HLA class I pseudogenes, 11 HLA class II genes, MICA, MICB, TAP1 and TAP2. For each individual, we estimated expression levels of the 24 HLA genes by aligning reads to cDNA sequences specific for the HLA types called by HLA-VBSeq (Nariai et al, 2015) to avoid alignment biases (Aguiar et al, 2019; Gensterblum-Miller et al, 2018; Lee et al, 2018; Panousis et al, 2014) (Supplementary File 4). The HLA class I genes tended to be expressed at high levels, consistent with their ubiquitous expression in all cell types, and the HLA class II genes tended to be expressed at lower levels, as expected due to their primary role in immune cells (Matzaraki et al, 2017).…”
Section: Eight-digit HLA Types Associated With Expression Of Cognate (mentioning)
confidence: 99%
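
The expression threshold quoted above (TPM >2 in at least 10 samples) amounts to a simple per-gene filter. The sketch below is illustrative only: the function name `expressed_genes`, the input layout, and the TPM values are hypothetical and not taken from the cited study.

```python
from typing import Dict, List


def expressed_genes(
    tpm: Dict[str, List[float]],
    min_tpm: float = 2.0,
    min_samples: int = 10,
) -> List[str]:
    """Keep genes whose TPM exceeds min_tpm in at least min_samples samples."""
    return [
        gene
        for gene, values in tpm.items()
        if sum(v > min_tpm for v in values) >= min_samples
    ]


# Made-up TPM values for 12 samples per gene.
tpm = {
    "HLA-A": [50.0] * 12,            # above threshold in all 12 samples
    "HLA-DQA2": [0.5] * 12,          # never above threshold
    "MICB": [3.0] * 9 + [1.0] * 3,   # above threshold in only 9 samples
}
kept = expressed_genes(tpm)
```

Here only `HLA-A` passes, because `MICB` exceeds the threshold in 9 samples, one short of the required 10.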
“…However, this approach is limited as there is no ground truth in real data to help with optimisation, and increasing the number of reads aligned will also result in an increase in the number of false positive reads. Another strategy for solving the false-negative non-alignment problem is by incorporating variation information during alignment, in the form of utilising alternate loci sequences within the reference genome [15] or integration of a single nucleotide polymorphism database to the reference [13], to help minimise the effect of divergence of the personal genome compared to the reference genome. This approach is also limited as it requires existing variation information, which may not be available in non-model organisms.…”
Section: Introduction (mentioning)
confidence: 99%