Background: Analyzing next-generation sequencing data is difficult because datasets are large, second-generation sequencing platforms have high error rates, and each position in the target genome (exome, transcriptome, etc.) is sequenced multiple times. Given these challenges, numerous bioinformatic algorithms have been developed to analyze these data. These algorithms aim to find an appropriate balance between data loss, errors, analysis time, and memory footprint. Typical analysis pipelines require multiple steps. If one or more of these steps is unnecessary, removing it would significantly decrease compute time and data manipulation. One step in many pipelines is PCR duplicate removal, where PCR duplicates arise from multiple PCR products from the same template molecule binding on the flowcell. These are often removed because there is concern they can lead to false positive variant calls. Picard (MarkDuplicates) and SAMTools (rmdup) are the two main software tools used for PCR duplicate removal. Results: Approximately 92% of the 17+ million variants called were called whether we removed duplicates with Picard or SAMTools, or left the PCR duplicates in the dataset. There were no significant differences between the unique variant sets when comparing the transition/transversion ratios (p = 1.0), percentage of novel variants (p = 0.99), average population frequencies (p = 0.99), and the percentage of protein-changing variants (p = 1.0). Results were similar for variants in the American College of Medical Genetics genes. Genotype concordance between NGS and SNP chips was above 99% for all genotype groups (e.g., homozygous reference). Conclusions: Our results suggest that PCR duplicate removal has minimal effect on the accuracy of subsequent variant calls.
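The two comparisons reported in this abstract, the fraction of variants called under every duplicate-handling strategy and the transition/transversion ratio of a call set, can be illustrated with a minimal Python sketch. The variant representation `(chrom, pos, ref, alt)` and the function names are assumptions made for illustration; they are not taken from the study's pipeline.

```python
# Hypothetical representation: each variant is a (chrom, pos, ref, alt) tuple.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}

def shared_fraction(*call_sets):
    """Fraction of all distinct variants that appear in every call set."""
    sets = [set(s) for s in call_sets]
    union = set().union(*sets)
    shared = set.intersection(*sets)
    return len(shared) / len(union) if union else 0.0

def ti_tv_ratio(variants):
    """Transition/transversion ratio over single-nucleotide variants."""
    ti = sum(1 for _, _, ref, alt in variants if (ref, alt) in TRANSITIONS)
    tv = len(variants) - ti
    return ti / tv if tv else float("inf")

# Toy call sets standing in for Picard, SAMTools, and no duplicate removal.
picard = {("1", 100, "A", "G"), ("1", 200, "C", "T"), ("1", 300, "A", "C")}
samtools = {("1", 100, "A", "G"), ("1", 200, "C", "T"), ("2", 50, "G", "T")}
no_removal = picard | samtools

print(shared_fraction(picard, samtools, no_removal))
print(ti_tv_ratio(picard))
```

The sketch assumes every variant is a single-nucleotide substitution; indels would need to be filtered out before computing the ratio.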
Background: Proper phosphate signaling is essential for robust growth of Escherichia coli and many other bacteria. The phosphate signal is mediated by a classic two-component signal system composed of PhoR and PhoB. The PhoR histidine kinase is responsible for phosphorylating/dephosphorylating the response regulator, PhoB, which controls the expression of genes that aid growth in low phosphate conditions. The mechanism by which PhoR receives a signal of environmental phosphate levels has remained elusive. A transporter complex composed of the PstS, PstC, PstA, and PstB proteins, as well as a negative regulator, PhoU, has been implicated in signaling environmental phosphate to PhoR. Results: This work confirms that PhoU and the PstSCAB complex are necessary for proper signaling of high environmental phosphate. We also identify, through genetic analysis, residues important in the PhoU/PhoR interaction. Using protein modeling and docking methods, we show an interaction model that points to a potential mechanism for PhoU-mediated signaling to PhoR to modify its activity. This model is tested with direct coupling analysis. Conclusions: These bioinformatics tools, in combination with genetic and biochemical analysis, help to identify and test a model for phosphate signaling and may be applicable to several other systems.
Background Plant chloroplasts and mitochondria utilize nuclear encoded proteins to replicate their DNA. These proteins are purposely built for replication in the organelle environment and are distinct from those involved in replication of the nuclear genome. These organelle-localized proteins have ancestral roots in bacterial and bacteriophage genes, supporting the endosymbiotic theory of their origin. We examined the interactions between three of these proteins from Arabidopsis thaliana: a DNA helicase-primase similar to bacteriophage T7 gp4 protein and animal mitochondrial Twinkle, and two DNA polymerases, Pol1A and Pol1B. We used a three-pronged approach to analyze the interactions, including yeast two-hybrid analysis, Direct Coupling Analysis (DCA), and thermophoresis. Results Yeast two-hybrid analysis reveals residues 120–295 of Twinkle as the minimal region that can still interact with Pol1A or Pol1B. This region is a part of the primase domain of the protein and slightly overlaps the zinc-finger and RNA polymerase subdomains located within. Additionally, we observed that Arabidopsis Twinkle interacts much more strongly with Pol1A than with Pol1B. Thermophoresis also confirms that the primase domain of Twinkle has higher binding affinity than any other region of the protein. Direct Coupling Analysis identified specific residues in Twinkle and the DNA polymerases critical to positive interaction between the two proteins. Conclusions The interaction of Twinkle with Pol1A or Pol1B mimics the minimal DNA replisomes of T7 phage and those present in mammalian mitochondria. However, while T7 and mammals absolutely require their homolog of Twinkle DNA helicase-primase, Arabidopsis Twinkle mutants are seemingly unaffected by this loss. This implies that while Arabidopsis mitochondria mimic minimal replisomes from T7 and mammalian mitochondria, there is an extra level of redundancy specific to loss of Twinkle function.
Electronic supplementary material: The online version of this article (10.1186/s12870-019-1854-3) contains supplementary material, which is available to authorized users.
Introduction Sporadic Alzheimer's disease (AD) is strongly correlated with impaired brain glucose metabolism, which may affect AD onset and progression. Ketolysis has been suggested as an alternative pathway to fuel the brain. Methods RNA‐seq profiles of post mortem AD brains were used to determine whether dysfunctional AD brain metabolism can be determined by impairments in glycolytic and ketolytic gene expression. Data were obtained from the Knight Alzheimer's Disease Research Center (62 cases; 13 controls), Mount Sinai Brain Bank (110 cases; 44 controls), and the Mayo Clinic Brain Bank (80 cases; 76 controls), and were normalized to cell type: astrocytes, microglia, neurons, oligodendrocytes. Results In oligodendrocytes, both glycolytic and ketolytic pathways were significantly impaired in AD brains. Ketolytic gene expression was not significantly altered in neurons, astrocytes, and microglia. Discussion Oligodendrocytes may contribute to brain hypometabolism observed in AD. These results are suggestive of a potential link between hypometabolism and dysmyelination in disease physiology. Additionally, ketones may be therapeutic in AD due to their ability to fuel neurons despite impaired glycolytic metabolism.
Background: Variable rate of cognitive decline among individuals with Alzheimer's disease (AD) is an important consideration for disease management, but risk factors for rapid cognitive decline (RCD) are without consensus. Objective: To investigate demographic, clinical, and pathological differences between RCD and normal rates of cognitive decline (NCD) in AD. Methods: Neuropsychology test and autopsy data were pulled from the National Alzheimer's Coordinating Center database from individuals with a clinical diagnosis of AD. Individuals with average decline of 3 or more points on the Mini-Mental Status Examination (MMSE) per year over 3 years were labeled RCD; all others were NCD. Results: Sixty individuals were identified as RCD; 230 as NCD. These neuropsychology tests differed at baseline (RCD versus NCD): WMS-LM Immediate Recall (4.35[3.39] versus 6.31[3.97], p < 0.001), Animal Naming (12.1[4.83] versus 13.9[4.83], p = 0.007), TMT Part B (187[86.1] versus 159[79.0], p = 0.02), WAIS-Digit Symbol (29.5[11.3] versus 29.5[11.3], p = 0.04), and the BNT (21.5[7.05] versus 23.6[5.09], p = 0.04). RCD had more thyroid disease (30% versus 16%, p = 0.01) and greater usage of AD medication at baseline (80% versus 62%, p = 0.01). RCD had more severe cerebral amyloid angiopathy (1.62[1.0] versus 1.13[1.0], p = 0.002), more neocortical Lewy bodies (20% versus 10%, p = 0.04), and more atrophy (1.54[0.92] versus 1.17[0.83], p = 0.04). A model combining select variables was significant above chance (χ² = 25.8, p = 0.002), but not to clinical utility (AUC < 0.70; 95% CI). Conclusion: Individuals with RCD have more severe pathology, more comorbidities, and lower baseline neuropsychology test scores of language and executive function.
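The RCD/NCD labeling rule stated in the Methods (an average decline of 3 or more MMSE points per year) can be sketched as follows. The abstract does not say how the average was computed, so this sketch assumes the simplest reading: total change from first to last visit divided by elapsed years. The function names and the visit format are hypothetical.

```python
def annual_mmse_decline(visits):
    """Average MMSE points lost per year between the first and last visit.

    `visits` is a list of (years_since_baseline, mmse_score) pairs.
    Assumption: decline is (first score - last score) / elapsed years.
    """
    ordered = sorted(visits)
    (t_first, s_first), (t_last, s_last) = ordered[0], ordered[-1]
    elapsed = t_last - t_first
    if elapsed <= 0:
        raise ValueError("need at least two visits at different times")
    return (s_first - s_last) / elapsed

def label_decline(visits, threshold=3.0):
    """Return 'RCD' when the average annual decline meets the threshold."""
    return "RCD" if annual_mmse_decline(visits) >= threshold else "NCD"

# Toy trajectories: 10 points lost over 3 years versus 6 points over 3 years.
print(label_decline([(0, 26), (1, 23), (2, 19), (3, 16)]))
print(label_decline([(0, 26), (3, 20)]))
```

A regression slope across all visits would be an equally plausible definition and could give different labels for irregular trajectories.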
Although many studies have documented codon usage bias in different species, the importance of codon usage in a phylogenetic framework remains largely unknown. We demonstrate that a phylogenetic signal is present in the codon usage and non‐usage biases of 17 717 orthologues evaluated across 72 tetrapod species using a simple parsimony analysis of a binary matrix of codon characters. Phylogenies estimated using stop codons were more congruent with previous hypotheses than phylogenies based on any other single codon or a combination of codons. Although each codon is present in every species, specific genes have different codon preferences and may or may not use every possible codon. This observation allowed us to map the pattern of codon usage and non‐usage across the topology. These results suggest that codon usage is phylogenetically conserved across shallow and deep levels within tetrapods.
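To make the "binary matrix of codon characters" concrete, here is a minimal Python sketch of one way such a matrix could be encoded: one character per (orthologue, codon) pair, scored 1 if the species uses that codon in that gene and 0 if it does not. The encoding, the `?` symbol for a missing orthologue, and all names are assumptions for illustration, not the authors' implementation.

```python
from itertools import product

# All 64 codons in a fixed order so columns line up across species.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]

def codons_used(cds):
    """Set of in-frame codons present in a coding sequence."""
    cds = cds.upper()
    return {cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)}

def binary_matrix(orthologues):
    """Build species -> row of 0/1 usage characters.

    `orthologues` maps species name -> {gene name -> coding sequence}.
    A gene missing from a species is scored '?' (assumed missing-data code).
    """
    genes = sorted({g for per_species in orthologues.values() for g in per_species})
    columns = [(gene, codon) for gene in genes for codon in CODONS]
    matrix = {}
    for species, per_species in orthologues.items():
        row = []
        for gene in genes:
            used = codons_used(per_species[gene]) if gene in per_species else None
            row.extend("?" if used is None else int(c in used) for c in CODONS)
        matrix[species] = row
    return columns, matrix

# Toy data: two species, one orthologue, different stop codons.
toy = {"sp1": {"g1": "ATGAAATAA"}, "sp2": {"g1": "ATGAAGTGA"}}
columns, matrix = binary_matrix(toy)
taa = columns.index(("g1", "TAA"))
print(matrix["sp1"][taa], matrix["sp2"][taa])
```

A matrix in this form could then be passed to any parsimony program that accepts binary characters.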
Different species, genes, and locations within genes use different codons to fine-tune gene expression. Within genes, the ramp sequence assists in ribosome spacing and decreases downstream collisions by incorporating slowly-translated codons at the beginning of a gene. Although ramp sequences have been reported in some species, no previous attempt at extracting the ramp sequence from specific genes has been published. We present ExtRamp, a software package that quickly extracts ramp sequences from any species using the tRNA adaptation index or relative codon adaptiveness. Different filters facilitate the analysis of codon efficiency and enable identification of genes with a ramp sequence. We validate the existence of a ramp sequence in most species by running ExtRamp on 229 742 339 genes across 23 428 species. We evaluate differences in reported ramp sequences when we use different parameters. Using the strictest ramp sequence cut-off, we show that across most taxonomic groups, ramp sequences are approximately 20–40 codons long and occur in about 10% of gene sequences. We also show that in Drosophila melanogaster as gene expression increases, a higher proportion of genes have ramp sequences. We provide a framework for performing this analysis on other species. ExtRamp is freely available at https://github.com/ridgelab/ExtRamp.
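As an illustration of relative codon adaptiveness, the sketch below computes each codon's weight as its count divided by the count of the most frequent synonymous codon, then averages those weights in sliding windows along a gene. This is not ExtRamp's code: the standard genetic code, the arithmetic window mean, and all function names are assumptions chosen for a compact example.

```python
from collections import Counter

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO))

def split_codons(cds):
    """In-frame codons of a coding sequence, upper-cased."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]

def relative_adaptiveness(cds_list):
    """Weight per codon: its count / count of the most used synonymous codon."""
    counts = Counter(c for cds in cds_list for c in split_codons(cds) if c in CODON_TO_AA)
    weights = {}
    for codon, aa in CODON_TO_AA.items():
        family_max = max(counts[c] for c, a in CODON_TO_AA.items() if a == aa)
        if family_max:  # skip amino acids never observed in the reference set
            weights[codon] = counts[codon] / family_max
    return weights

def window_efficiency(cds, weights, window=3):
    """Arithmetic mean weight in each sliding window of `window` codons."""
    values = [weights[c] for c in split_codons(cds) if c in weights]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]

# Toy reference set in which AAA is used twice as often as AAG.
weights = relative_adaptiveness(["ATGAAAAAAAAGTAA"])
print(weights["AAA"], weights["AAG"])
print(window_efficiency("AAGAAGAAAAAA", weights, window=2))
```

In this toy gene the windows rise from low to high efficiency, which is the pattern a ramp of slowly translated codons at the start of a gene would produce.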
It is well documented that codon usage biases affect gene translational efficiency; however, it is less well known whether viruses share their host's codon usage motifs. We determined that human-infecting viruses share codon usage biases with proteins that are expressed in the tissues the viruses infect. By performing 7,052,621 pairwise comparisons of genes from humans versus genes from 113 viruses that infect humans, we determined which codon usage motifs were most highly correlated. We found that 16 viruses averaged a significant correlation in codon usage with over 500 human genes per viral gene, 58 viruses were highly correlated with an average of at least 100 human genes per viral gene, and 37 viruses were significantly correlated with an average of at least one human gene per viral gene at an alpha level of 7.09 × 10⁻⁹ (0.05 alpha / 7,052,621 comparisons). Only two viruses were not highly correlated with an average of one human gene per viral gene. While relatively few of the interactions were previously documented, the high statistical correlations suggest that researchers may be able to determine which tissues a virus is most likely to infect by analyzing codon usage biases.
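The alpha level quoted in this abstract is a Bonferroni-style correction: the nominal 0.05 divided by the 7,052,621 pairwise comparisons. A short Python check of that arithmetic follows; the function name is hypothetical.

```python
def bonferroni_alpha(nominal_alpha, n_comparisons):
    """Per-comparison significance threshold under a Bonferroni correction."""
    return nominal_alpha / n_comparisons

# Values taken from the abstract: 0.05 alpha over 7,052,621 comparisons.
threshold = bonferroni_alpha(0.05, 7_052_621)
print(f"{threshold:.2e}")
```

A pairwise correlation would count as significant here only if its p-value fell below this threshold.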