Abnormalities in the topology of brain networks may be an important feature and etiological factor for psychogenic non-epileptic seizures (PNES). To explore this possibility, we applied a graph theoretical approach to functional networks based on resting state EEGs from 13 PNES patients and 13 age- and gender-matched controls. The networks were extracted from Laplacian-transformed time-series by a cross-correlation method. PNES patients showed close to normal local and global connectivity and small-world structure, estimated with clustering coefficient, modularity, global efficiency, and small-worldness (SW) metrics, respectively. Yet the number of PNES attacks per month correlated with a weakness of local connectedness and a skewed balance between local and global connectedness quantified with SW, all in EEG alpha band. In beta band, patients demonstrated above-normal resiliency, measured with assortativity coefficient, which also correlated with the frequency of PNES attacks. This interictal EEG phenotype may help improve differentiation between PNES and epilepsy. The results also suggest that local connectivity could be a target for therapeutic interventions in PNES. Selective modulation (strengthening) of local connectivity might improve the skewed balance between local and global connectivity and so prevent PNES events.
Functional connectivity in human brain can be represented as a network using electroencephalography (EEG) signals. These networks – whose nodes can vary from tens to hundreds – are characterized by neurobiologically meaningful graph theory metrics. This study investigates the degree to which various graph metrics depend upon the network size. To this end, EEGs from 32 normal subjects were recorded and functional networks of three different sizes were extracted. A state-space based method was used to calculate cross-correlation matrices between different brain regions. These correlation matrices were used to construct binary adjacency connectomes, which were assessed with regards to a number of graph metrics such as clustering coefficient, modularity, efficiency, economic efficiency, and assortativity. We showed that the estimates of these metrics significantly differ depending on the network size. Larger networks had higher efficiency, higher assortativity and lower modularity compared to those with smaller size and the same density. These findings indicate that the network size should be considered in any comparison of networks across studies.
Binary relation matrix simulationTo benchmark our compression techniques systematically, we generated three series of random binary matrices satisfying different properties. Given fixed matrix dimensions n × m, an expected column density d, and a uniqueness factor u, we define our generation schemes as follows:1. Random: generate m random columns of length n with expected density d 2. Uniform rows: generate m random columns of length n u , then duplicate each row u times 3. Uniform columns: generate m u columns of length n, then duplicate each column u times For each generated column, its indices are iterated through linearly and the values of the indices are set by drawing observations from a random variable X ∼ Bernoulli(d). For all experiments, values of n = 1, 000, 000, u = 5, and d = 0.01 were used. The values m ∈ {500, 1000, 3000} were used.
Nonlinear dimensionality reduction methods have demonstrated top-notch performance in many pattern recognition and image classification tasks. Despite their popularity, they suffer from highly expensive time and memory requirements, which render them inapplicable to large-scale datasets. To leverage such cases we propose a new method called "Path-Based Isomap". Similar to Isomap, we exploit geodesic paths to find the low-dimensional embedding. However, instead of preserving pairwise geodesic distances, the low-dimensional embedding is computed via a path-mapping algorithm. Due to the much fewer number of paths compared to number of data points, a significant improvement in time and memory complexity with a comparable performance is achieved. The method demonstrates state-of-the-art performance on well-known synthetic and real-world datasets, as well as in the presence of noise.
High-throughput DNA sequencing data is accumulating in public repositories, and efficient approaches for storing and indexing such data are in high demand. In recent research, several graph data structures have been proposed to represent large sets of sequencing data and to allow for efficient querying of sequences. In particular, the concept of labeled de Bruijn graphs has been explored by several groups. While there has been good progress towards representing the sequence graph in small space, methods for storing a set of labels on top of such graphs are still not sufficiently explored. It is also currently not clear how characteristics of the input data, such as the sparsity and correlations of labels, can help to inform the choice of method to compress the graph labeling. In this work, we present a new compression approach, Multi-BRWT, which is adaptive to different kinds of input data. We show an up to 29% improvement in compression performance over the basic BRWT method, and up to a 68% improvement over the current state-of-the-art for de Bruijn graph label compression. To put our results into perspective, we present a systematic analysis of five different state-of-the-art annotation compression schemes, evaluate key metrics on both artificial and real-world data and discuss how different data characteristics influence the compression performance. We show that the improvements of our new method can be robustly reproduced for different representative real-world datasets.
The sharp increase in next-generation sequencing technologies' capacity has created a demand for algorithms capable of quickly searching a large corpus of biological sequences. The complexity of biological variability and the magnitude of existing data sets have impeded finding algorithms with guaranteed accuracy that efficiently run in practice. Our main contribution is the Tensor Sketch method that efficiently and accurately estimates edit distances. In our experiments, Tensor Sketch had 0.88 Spearman's rank correlation with the exact edit distance, almost doubling the 0.466 correlation of the closest competitor while running 8.8 times faster. Finally, all sketches can be updated dynamically if the input is a sequence stream, making it appealing for large-scale applications where data cannot fit into memory. Conceptually, our approach has three steps: 1) represent sequences as tensors over their sub-sequences, 2) apply tensor sketching that preserves tensor inner products, 3) implicitly compute the sketch. The sub-sequences, which are not necessarily contiguous pieces of the sequence, allow us to outperform k-mer-based methods, such as min-hash sketching over a set of k-mers. Typically, the number of sub-sequences grows exponentially with the sub-sequence length, introducing both memory and time overheads. We directly address this problem in steps 2 and 3 of our method. While the sketching of rank-1 or super-symmetric tensors is known to admit efficient sketching, the sub-sequence tensor does not satisfy either of these properties. Hence, we propose a new sketching scheme that completely avoids the need for constructing the ambient space. Our tensor-sketching technique's main advantages are three-fold: 1) Tensor Sketch has higher accuracy than any of the other assessed sketching methods used in practice. 2) All sketches can be computed in a streaming fashion, leading to significant time and memory savings when there is overlap between input sequences. 3) It is straightforward to extend tensor sketching to different settings leading to efficient methods for related sequence analysis tasks. We view tensor sketching as a framework to tackle a wide range of relevant bioinformatics problems, and we are confident that it can bring significant improvements for applications based on edit distance estimation.
Sequence-to-graph alignment is an important step in applications such as variant genotyping, read error correction and genome assembly. When a query sequence requires a substantial number of edits to align, approximate alignment tools that follow the seed-and-extend approach require shorter seeds to get any matches. However, in large graphs with high variation, relying on a shorter seed length leads to an exponential increase in spurious matches. We propose a novel seeding approach relying on long inexact matches instead of short exact matches. We demonstrate experimentally that our approach achieves a better time-accuracy trade-off in settings with up to a 25% mutation rate. We achieve this by sketching a subset of graph nodes and storing them in a K-nearest neighbor index. While sketches are more robust to indels, finding the nearest neighbor of a sketch in a high-dimensional space is more computationally challenging than finding exact seeds. We demonstrate that if we store sketch vectors in a K-nearest neighbor index, we can circumvent the curse of dimensionality. Our long sketch-based seed scheme contrasts existing approaches and highlights the important role that tensor sketching can play in bioinformatics applications. Our proposed seeding method and implementation have several advantages: i) We empirically show that our method is efficient and scales to graphs with 1 billion nodes, with time and memory requirements for preprocessing growing linearly with graph size and query time growing quasi-logarithmically with query length. ii) For queries with an edit distance of 25% relative to their length, on the 1 billion node graph, longer sketch-based seeds yield a 4x increase in recall compared to exact seeds. iii) Conceptually, our seeder can be incorporated into other aligners, proposing a novel direction for sequence-to-graph alignment. The implementation is available at: https://github.com/ratschlab/tensor-sketch-alignment.
Sequence-to-graph alignment is crucial for applications such as variant genotyping, read error correction, and genome assembly. We propose a novel seeding approach that relies on long inexact matches rather than short exact matches, and demonstrate that it yields a better time-accuracy trade-off in settings with up to a 25% mutation rate. We use sketches of a subset of graph nodes, which are more robust to indels, and store them in ak-nearest neighbor index to avoid the curse of dimensionality. Our approach contrasts with existing methods and highlights the important role that sketching into vector space can play in bioinformatics applications. We show that our method scales to graphs with 1 billion nodes and has quasi-logarithmic query time for queries with an edit distance of 25%. For such queries, longer sketch-based seeds yield a 4× increase in recall compared to exact seeds. Our approach can be incorporated into other aligners, providing a novel direction for sequence-to-graph alignment.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
334 Leonard St
Brooklyn, NY 11211
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.