Abstract: The amino acid content of the proteins encoded by a genome may predict the coding potential of that genome and may reflect lifestyle restrictions of the organism. Here, we calculated the Kullback–Leibler divergence from the mean amino acid content as a metric to compare amino acid composition across a large set of bacterial and phage genome sequences. Using these data, we demonstrate that (i) there is a significant difference between amino acid utilization in different phylogenetic groups of bacteria and phag…
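The abstract does not spell out how the divergence is computed. The following is a minimal Python sketch of the metric, D_KL(P ‖ Q) = Σ_i p_i log2(p_i / q_i), applied to amino acid composition. It rests on several assumptions. A genome's amino acid content is taken as the pooled residue frequencies of its encoded proteins. "The mean" is the unweighted average of those compositions. The divergence is measured as D_KL(genome ‖ mean) in bits. A pseudocount of 1 per residue keeps every frequency non-zero. The function names are hypothetical and are not the authors' code.

```python
from collections import Counter
import math

# The 20 standard amino acids; other symbols (X, *, etc.) are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(proteome: str, pseudocount: float = 1.0) -> list[float]:
    """Relative frequency of each standard residue in the concatenated proteome.

    The pseudocount (an assumption) keeps every frequency > 0 so KL stays finite.
    """
    counts = Counter(res for res in proteome.upper() if res in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return [(counts[aa] + pseudocount) / total for aa in AMINO_ACIDS]


def kl_divergence(p: list[float], q: list[float]) -> float:
    """D_KL(P || Q) in bits; assumes every q_i > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def kl_from_mean(proteomes: dict[str, str]) -> dict[str, float]:
    """Divergence of each genome's composition from the unweighted mean composition."""
    comps = {name: aa_composition(seq) for name, seq in proteomes.items()}
    n = len(comps)
    mean = [sum(c[i] for c in comps.values()) / n for i in range(len(AMINO_ACIDS))]
    return {name: kl_divergence(c, mean) for name, c in comps.items()}
```

Genomes whose composition matches the mean score 0. The score grows as a genome's residue usage departs from the average of the set.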
“…3mc(S)+3mc(N)+3mc(M)+3mc(E)) or applying other characterisation methods such as “protein-nv” (37), graphical representation (38), and Fourier power spectrum (39). Furthermore, previous methods for phylogenetic reconstruction have used non-Euclidean distances such as Wasserstein (40), Kullback–Leibler (41), Yau–Hausdorff (38), or Structural Similarity Index Measure (42). Thus, applying these to dimensionality reduction algorithms might generate better representations of the genetic landscape.…”
Since its emergence in late 2019, SARS-CoV-2 has diversified into a large number of lineages and caused multiple waves of infection globally. Novel lineages have the potential to spread rapidly and internationally if they have higher intrinsic transmissibility and/or can evade host immune responses, as has been seen with the Alpha, Delta, and Omicron variants of concern. They can also cause increased mortality and morbidity if they have increased virulence, as was seen for Alpha and Delta. Phylogenetic methods provide the “gold standard” for representing the global diversity of SARS-CoV-2 and for identifying newly emerging lineages. However, these methods are computationally expensive, struggle as datasets become very large, and require manual curation to designate new lineages. These challenges motivate the development of complementary methods that can incorporate all of the available genetic data, without down-sampling, and extract meaningful information rapidly and with minimal curation. In this paper, we demonstrate the utility of algorithmic approaches based on word statistics to represent whole sequences, bringing speed, scalability, and interpretability to the construction of genetic topologies. While not a substitute for current phylogenetic analyses, the proposed methods can serve as a complementary, fully automatable approach to identify and confirm newly emerging variants.
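A word-statistics representation can be sketched by counting every k-mer (word of length k) in a sequence and comparing the resulting frequency vectors. The minimal Python sketch below makes several assumptions. The input is nucleotide sequence, with k = 3. Counts are normalised to frequencies so that sequences of different lengths are comparable. Profiles are compared with a plain Euclidean distance. Reading "3mc" in the quoted snippet as "3-mer counts" is also an assumption. `kmer_profile` and `distance_matrix` are hypothetical names, not the authors' implementation. A non-Euclidean distance could replace `euclidean`, and the resulting matrix is the kind of input a dimensionality-reduction step would consume.

```python
from collections import Counter
from itertools import product
import math


def kmer_profile(seq: str, k: int = 3, alphabet: str = "ACGT") -> list[float]:
    """Relative frequency of every possible k-mer over `alphabet` (4**k entries).

    k-mers containing other symbols (e.g. the ambiguity code N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    vocabulary = ["".join(word) for word in product(alphabet, repeat=k)]
    observed = [counts[w] for w in vocabulary]
    total = sum(observed)
    if total == 0:
        raise ValueError("sequence contains no valid k-mers")
    return [c / total for c in observed]


def euclidean(u: list[float], v: list[float]) -> float:
    """Straight-line distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_matrix(seqs: list[str], k: int = 3) -> list[list[float]]:
    """All-against-all distances between the k-mer profiles of `seqs`."""
    profiles = [kmer_profile(s, k) for s in seqs]
    return [[euclidean(p, q) for q in profiles] for p in profiles]
```

Because each sequence is reduced to a fixed-length vector in a single linear pass, the cost grows linearly with the number of sequences for the profiling step. This is the scalability argument made in the abstract, although the all-against-all matrix itself is quadratic.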
Background: Phase–amplitude coupling (PAC) opposition between distinct neural oscillations is critical to understanding brain function. Available methods to assess phase-preference differences between conditions rely on the density of occurrences. Other methods, such as the Kullback–Leibler divergence (DKL), assess the distance between two conditions by transforming neurophysiological data into probability distributions of phase preference and measuring the distance between them. However, these methods have limitations such as susceptibility to noise and bias.
New Method: We propose the “Mean Opposition Vector Index” (MOVI), a parameter-free, data-driven algorithm for unbiased estimation of PAC opposition. MOVI establishes a unified framework that integrates the strength of PAC to account for reliable unimodal differences in phase-specific amplitude coupling between neurophysiological datasets.
Results: We found that MOVI accurately detected phase opposition, was resistant to noise, and gave consistent results with low or asymmetrical numbers of trials, that is, under conditions closer to those of experimental studies.
Comparison with existing methods: MOVI outperformed the Jensen–Shannon divergence (JSD), an adaptation of the DKL, in sensitivity, specificity, and accuracy when detecting phase opposition.
Conclusions: MOVI provides a novel and useful approach to the study of phase-preference opposition in neurophysiological datasets.
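The divergence-based baseline that MOVI is compared against can be made concrete with a short sketch. It assumes a binning scheme widely used for PAC: instantaneous phase is binned, the high-frequency amplitude is averaged in each bin, and the bin means are normalised into a probability distribution. Two conditions are then compared with the Jensen–Shannon divergence. Further assumptions are phases in radians, 18 bins, and base-2 logarithms, which keep the JSD in [0, 1]. This sketch illustrates the comparison method only, not MOVI, whose algorithm the abstract does not specify. The function names are hypothetical.

```python
import math


def phase_amplitude_distribution(phases, amplitudes, n_bins: int = 18) -> list[float]:
    """Mean amplitude per phase bin, normalised to sum to 1.

    Phases are in radians and may lie in any range; they are wrapped to [0, 2*pi).
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for ph, amp in zip(phases, amplitudes):
        idx = int((ph % (2 * math.pi)) / (2 * math.pi) * n_bins)
        idx = min(idx, n_bins - 1)  # guard against a wrapped phase rounding to 2*pi
        sums[idx] += amp
        counts[idx] += 1
    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    total = sum(means)
    if total == 0:
        raise ValueError("amplitudes must not all be zero")
    return [m / total for m in means]


def kl_bits(p: list[float], q: list[float]) -> float:
    """D_KL(P || Q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon(p: list[float], q: list[float]) -> float:
    """Symmetric, bounded JSD: average KL of P and Q to their midpoint M."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl_bits(p, m) + 0.5 * kl_bits(q, m)
```

For example, amplitudes that peak at phase 0 in one condition and at phase π in the other (phase opposition) give a clearly positive JSD, whereas identical distributions give 0. Note that this score reflects any difference between the two distributions, not specifically an opposition of preferred phases.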