Justin Miller scite author profile

Evaluating the necessity of PCR duplicate removal from next-generation sequencing data and a comparison of approaches

Background: Analyzing next-generation sequencing data is difficult because datasets are large, second-generation sequencing platforms have high error rates, and each position in the target genome (exome, transcriptome, etc.) is sequenced multiple times. Given these challenges, numerous bioinformatic algorithms have been developed to analyze these data. These algorithms aim to find an appropriate balance between data loss, errors, analysis time, and memory footprint. Typical analysis pipelines require multiple steps. If one or more of these steps is unnecessary, removing it would significantly decrease compute time and data manipulation. One step in many pipelines is PCR duplicate removal; PCR duplicates arise when multiple PCR products from the same template molecule bind on the flowcell. These are often removed because of concern that they can lead to false-positive variant calls. Picard (MarkDuplicates) and SAMTools (rmdup) are the two main software tools used for PCR duplicate removal. Results: Approximately 92% of the 17+ million variants called were called whether we removed duplicates with Picard or SAMTools, or left the PCR duplicates in the dataset. There were no significant differences between the unique variant sets when comparing the transition/transversion ratios (p = 1.0), percentage of novel variants (p = 0.99), average population frequencies (p = 0.99), and the percentage of protein-changing variants (p = 1.0). Results were similar for variants in the American College of Medical Genetics genes. Genotype concordance between NGS and SNP chips was above 99% for all genotype groups (e.g., homozygous reference). Conclusions: Our results suggest that PCR duplicate removal has minimal effect on the accuracy of subsequent variant calls.

Genetic analysis, structural modeling, and direct coupling analysis suggest a mechanism for phosphate signaling in Escherichia coli

Gardner, Miller, Dean, et al. 2015. BMC Genet

Background: Proper phosphate signaling is essential for robust growth of Escherichia coli and many other bacteria. The phosphate signal is mediated by a classic two-component signal system composed of PhoR and PhoB. The PhoR histidine kinase is responsible for phosphorylating/dephosphorylating the response regulator, PhoB, which controls the expression of genes that aid growth in low-phosphate conditions. The mechanism by which PhoR receives a signal of environmental phosphate levels has remained elusive. A transporter complex composed of the PstS, PstC, PstA, and PstB proteins, as well as a negative regulator, PhoU, has been implicated in signaling environmental phosphate to PhoR. Results: This work confirms that PhoU and the PstSCAB complex are necessary for proper signaling of high environmental phosphate. We also identify, through genetic analysis, residues important in the PhoU/PhoR interaction. Using protein modeling and docking methods, we show an interaction model that points to a potential mechanism for PhoU-mediated signaling to PhoR to modify its activity. This model is tested with direct coupling analysis. Conclusions: These bioinformatics tools, in combination with genetic and biochemical analysis, help to identify and test a model for phosphate signaling and may be applicable to several other systems.

Arabidopsis thaliana organelles mimic the T7 phage DNA replisome with specific interactions between Twinkle protein and DNA polymerases Pol1A and Pol1B

et al. 2019

Background: Plant chloroplasts and mitochondria utilize nuclear-encoded proteins to replicate their DNA. These proteins are purpose-built for replication in the organelle environment and are distinct from those involved in replication of the nuclear genome. These organelle-localized proteins have ancestral roots in bacterial and bacteriophage genes, supporting the endosymbiotic theory of their origin. We examined the interactions between three of these proteins from Arabidopsis thaliana: a DNA helicase-primase similar to bacteriophage T7 gp4 protein and animal mitochondrial Twinkle, and two DNA polymerases, Pol1A and Pol1B. We used a three-pronged approach to analyze the interactions, including yeast two-hybrid analysis, direct coupling analysis (DCA), and thermophoresis. Results: Yeast two-hybrid analysis reveals residues 120–295 of Twinkle as the minimal region that can still interact with Pol1A or Pol1B. This region is a part of the primase domain of the protein and slightly overlaps the zinc-finger and RNA polymerase subdomains located within. Additionally, we observed that Arabidopsis Twinkle interacts much more strongly with Pol1A than with Pol1B. Thermophoresis also confirms that the primase domain of Twinkle has higher binding affinity than any other region of the protein. Direct coupling analysis identified specific residues in Twinkle and the DNA polymerases critical to positive interaction between the two proteins. Conclusions: The interaction of Twinkle with Pol1A or Pol1B mimics the minimal DNA replisomes of T7 phage and those present in mammalian mitochondria. However, while T7 and mammals absolutely require their homolog of Twinkle DNA helicase-primase, Arabidopsis Twinkle mutants are seemingly unaffected by this loss. This implies that while Arabidopsis mitochondria mimic minimal replisomes from T7 and mammalian mitochondria, there is an extra level of redundancy specific to loss of Twinkle function.
Electronic supplementary material: The online version of this article (10.1186/s12870-019-1854-3) contains supplementary material, which is available to authorized users.

Alzheimer's disease alters oligodendrocytic glycolytic and ketolytic gene expression

Saito, Miller, Harari, et al. 2021. Alzheimer's & Dementia

Introduction: Sporadic Alzheimer's disease (AD) is strongly correlated with impaired brain glucose metabolism, which may affect AD onset and progression. Ketolysis has been suggested as an alternative pathway to fuel the brain. Methods: RNA-seq profiles of postmortem AD brains were used to assess whether dysfunctional AD brain metabolism can be determined by impairments in glycolytic and ketolytic gene expression. Data were obtained from the Knight Alzheimer's Disease Research Center (62 cases; 13 controls), Mount Sinai Brain Bank (110 cases; 44 controls), and the Mayo Clinic Brain Bank (80 cases; 76 controls), and were normalized to cell type: astrocytes, microglia, neurons, and oligodendrocytes. Results: In oligodendrocytes, both glycolytic and ketolytic pathways were significantly impaired in AD brains. Ketolytic gene expression was not significantly altered in neurons, astrocytes, or microglia. Discussion: Oligodendrocytes may contribute to the brain hypometabolism observed in AD. These results suggest a potential link between hypometabolism and dysmyelination in disease physiology. Additionally, ketones may be therapeutic in AD because of their ability to fuel neurons despite impaired glycolytic metabolism.

The Pathology of Rapid Cognitive Decline in Clinically Diagnosed Alzheimer’s Disease

Nance, Ritter, Miller, et al. 2019. JAD

Background: The variable rate of cognitive decline among individuals with Alzheimer's disease (AD) is an important consideration for disease management, but there is no consensus on risk factors for rapid cognitive decline (RCD). Objective: To investigate demographic, clinical, and pathological differences between RCD and normal rates of cognitive decline (NCD) in AD. Methods: Neuropsychology test and autopsy data were pulled from the National Alzheimer's Coordinating Center database for individuals with a clinical diagnosis of AD. Individuals with an average decline of 3 or more points on the Mini-Mental Status Examination (MMSE) per year over 3 years were labeled RCD; all others were NCD. Results: Sixty individuals were identified as RCD and 230 as NCD. These neuropsychology tests differed at baseline (RCD versus NCD): WMS-LM Immediate Recall (4.35[3.39] versus 6.31[3.97], p < 0.001), Animal Naming (12.1[4.83] versus 13.9[4.83], p = 0.007), TMT Part B (187[86.1] versus 159[79.0], p = 0.02), WAIS-Digit Symbol (29.5[11.3] versus 29.5[11.3], p = 0.04), and the BNT (21.5[7.05] versus 23.6[5.09], p = 0.04). RCD had more thyroid disease (30% versus 16%, p = 0.01) and greater usage of AD medication at baseline (80% versus 62%, p = 0.01). RCD had more severe cerebral amyloid angiopathy (1.62[1.0] versus 1.13[1.0], p = 0.002), more neocortical Lewy bodies (20% versus 10%, p = 0.04), and more atrophy (1.54[0.92] versus 1.17[0.83], p = 0.04). A model combining select variables was significant above chance (χ² = 25.8, p = 0.002), but not to clinical utility (AUC < 0.70; 95% CI). Conclusion: Individuals with RCD have more severe pathology, more comorbidities, and lower baseline neuropsychology test scores of language and executive function.
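
The RCD/NCD labeling rule in the Methods can be sketched as follows. This is only an illustration, assuming that "average decline per year" means the drop from the first to the last MMSE score divided by the elapsed years; the study may have used a different estimator. The function names and the example scores are hypothetical.

```python
from typing import Sequence

# Threshold stated in the abstract: 3 or more MMSE points lost per year.
RCD_POINTS_PER_YEAR = 3.0


def average_annual_decline(years: Sequence[float], scores: Sequence[float]) -> float:
    """Average MMSE points lost per year between the first and last visit.

    Assumption: first-to-last difference over elapsed time (not a regression slope).
    """
    if len(years) != len(scores) or len(years) < 2:
        raise ValueError("need at least two visits with matching years and scores")
    elapsed = years[-1] - years[0]
    if elapsed <= 0:
        raise ValueError("visits must span a positive amount of time")
    return (scores[0] - scores[-1]) / elapsed


def label_decline(years: Sequence[float], scores: Sequence[float]) -> str:
    """Return 'RCD' if the average annual decline meets the threshold, else 'NCD'."""
    decline = average_annual_decline(years, scores)
    return "RCD" if decline >= RCD_POINTS_PER_YEAR else "NCD"


# Hypothetical follow-up data: visit times in years and MMSE scores.
fast = label_decline([0, 1, 2, 3], [26, 23, 20, 17])  # 9 points over 3 years
slow = label_decline([0, 1, 2, 3], [26, 25, 24, 22])  # 4 points over 3 years
print(fast, slow)
```

The threshold is inclusive here because the abstract says "3 or more points".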

Missing something? Codon aversion as a new character system in phylogenetics

et al. 2017

Although many studies have documented codon usage bias in different species, the importance of codon usage in a phylogenetic framework remains largely unknown. We demonstrate that a phylogenetic signal is present in the codon usage and non‐usage biases of 17 717 orthologues evaluated across 72 tetrapod species using a simple parsimony analysis of a binary matrix of codon characters. Phylogenies estimated using stop codons were more congruent with previous hypotheses than phylogenies based on any other single codon or a combination of codons. Although each codon is present in every species, specific genes have different codon preferences and may or may not use every possible codon. This observation allowed us to map the pattern of codon usage and non‐usage across the topology. These results suggest that codon usage is phylogenetically conserved across shallow and deep levels within tetrapods.
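
The "binary matrix of codon characters" mentioned above can be pictured with a toy example. The sketch below assumes a single orthologue and scores each of the 64 codons as used (1) or not used (0) in each species' coding sequence; how the study actually assembled its matrix across 17 717 orthologues is not described in the abstract. The function names, species labels, and sequences are hypothetical.

```python
from itertools import product
from typing import Dict, List, Set

# All 64 codons in a fixed order, so every species row has the same columns.
CODONS: List[str] = ["".join(c) for c in product("ACGT", repeat=3)]


def codons_used(cds: str) -> Set[str]:
    """Set of in-frame codons in a coding sequence (trailing partial codon ignored)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return {cds[i:i + 3] for i in range(0, usable, 3)}


def binary_codon_matrix(orthologue: Dict[str, str]) -> Dict[str, List[int]]:
    """Map each species to a 0/1 row: 1 if the codon occurs in its sequence."""
    matrix: Dict[str, List[int]] = {}
    for species, cds in orthologue.items():
        used = codons_used(cds)
        matrix[species] = [1 if codon in used else 0 for codon in CODONS]
    return matrix


# Hypothetical sequences for one orthologue in two species.
toy = {"species_a": "ATGAAATAA", "species_b": "ATGGGGTAA"}
rows = binary_codon_matrix(toy)
aaa = CODONS.index("AAA")
print(rows["species_a"][aaa], rows["species_b"][aaa])
```

Rows of this kind are the sort of presence/absence characters a parsimony analysis can take as input.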

ExtRamp: a novel algorithm for extracting the ramp sequence based on the tRNA adaptation index or relative codon adaptiveness

Miller, Brase, Ridge. 2019

Different species, genes, and locations within genes use different codons to fine-tune gene expression. Within genes, the ramp sequence assists in ribosome spacing and decreases downstream collisions by incorporating slowly-translated codons at the beginning of a gene. Although previously reported as occurring in some species, no previous attempt at extracting the ramp sequence from specific genes has been published. We present ExtRamp, a software package that quickly extracts ramp sequences from any species using the tRNA adaptation index or relative codon adaptiveness. Different filters facilitate the analysis of codon efficiency and enable identification of genes with a ramp sequence. We validate the existence of a ramp sequence in most species by running ExtRamp on 229 742 339 genes across 23 428 species. We evaluate differences in reported ramp sequences when we use different parameters. Using the strictest ramp sequence cut-off, we show that across most taxonomic groups, ramp sequences are approximately 20–40 codons long and occur in about 10% of gene sequences. We also show that in Drosophila melanogaster as gene expression increases, a higher proportion of genes have ramp sequences. We provide a framework for performing this analysis on other species. ExtRamp is freely available at https://github.com/ridgelab/ExtRamp.

Human viruses have codon usage biases that match highly expressed proteins in the tissues they infect

Miller, Hippen, Wright, et al. 2017. Biomed Genet Genomics

It is well documented that codon usage biases affect gene translational efficiency; however, it is less well known whether viruses share their host's codon usage motifs. We determined that human-infecting viruses share codon usage biases with proteins that are expressed in the tissues the viruses infect. By performing 7,052,621 pairwise comparisons of genes from humans versus genes from 113 viruses that infect humans, we determined which codon usage motifs were most highly correlated. We found that 16 viruses averaged a significant correlation in codon usage with over 500 human genes per viral gene, 58 viruses were highly correlated with an average of at least 100 human genes per viral gene, and 37 viruses were significantly correlated with an average of at least one human gene per viral gene at an alpha level of 7.09 × 10⁻⁹ (0.05 alpha / 7,052,621 comparisons). Only two viruses were not highly correlated with an average of one human gene per viral gene. While relatively few of the interactions were previously documented, the high statistical correlations suggest that researchers may be able to determine which tissues a virus is most likely to infect by analyzing codon usage biases.
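
The alpha level in this abstract is given as 0.05 divided by 7,052,621 comparisons, which has the form of a Bonferroni correction. The short sketch below just reproduces that arithmetic; the function name is hypothetical and both constants are taken from the abstract.

```python
# Family-wise alpha and number of pairwise comparisons, both stated in the abstract.
FAMILY_WISE_ALPHA = 0.05
N_COMPARISONS = 7_052_621


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-comparison significance threshold: alpha divided by the number of tests."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


threshold = bonferroni_threshold(FAMILY_WISE_ALPHA, N_COMPARISONS)
print(f"{threshold:.2e}")  # prints 7.09e-09
```

A single comparison counts as significant under this rule only if its p-value falls below the printed threshold.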

Justin Miller

Evaluating the necessity of PCR duplicate removal from next-generation sequencing data and a comparison of approaches

Genetic analysis, structural modeling, and direct coupling analysis suggest a mechanism for phosphate signaling in Escherichia coli

Arabidopsis thaliana organelles mimic the T7 phage DNA replisome with specific interactions between Twinkle protein and DNA polymerases Pol1A and Pol1B

Alzheimer's disease alters oligodendrocytic glycolytic and ketolytic gene expression

The Pathology of Rapid Cognitive Decline in Clinically Diagnosed Alzheimer’s Disease

Missing something? Codon aversion as a new character system in phylogenetics

ExtRamp: a novel algorithm for extracting the ramp sequence based on the tRNA adaptation index or relative codon adaptiveness

Human viruses have codon usage biases that match highly expressed proteins in the tissues they infect
