Given an input graph G and a node v ∈ G, homogeneous network embedding (HNE) maps the graph structure in the vicinity of v to a compact, fixed-dimensional feature vector. This paper focuses on HNE for massive graphs, e.g., with billions of edges. On this scale, most existing approaches fail, as they incur either prohibitively high costs or severely compromised result utility. Our proposed solution, called Node-Reweighted PageRank (NRP), is based on a classic idea of deriving embedding vectors from pairwise personalized PageRank (PPR) values. Our contributions are twofold: first, we design a simple and efficient baseline HNE method based on PPR that is capable of handling billion-edge graphs on commodity hardware; second and more importantly, we identify an inherent drawback of vanilla PPR, and address it in our main proposal NRP. Specifically, PPR was designed for a very different purpose, i.e., ranking nodes in G based on their relative importance from a source node's perspective. In contrast, HNE aims to build node embeddings considering the whole graph. Consequently, node embeddings derived directly from PPR are of suboptimal utility. The proposed NRP approach overcomes the above deficiency through an effective and efficient node reweighting algorithm, which augments PPR values with node degree information, and iteratively adjusts embedding vectors accordingly. Overall, NRP takes O(m log n) time and O(m) space to compute all node embeddings for a graph with m edges and n nodes. Our extensive experiments that compare NRP against 18 existing solutions over 6 real graphs demonstrate that NRP achieves higher result utility than all competing solutions for link prediction, graph reconstruction and node classification, while being up to orders of magnitude faster. In particular, on a billion-edge Twitter graph, NRP terminates within 4 hours, using a single CPU core.
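To make the PPR ingredient concrete, the following is a minimal sketch of single-source personalized PageRank computed by plain power iteration, assuming an adjacency-list graph and a restart probability `alpha` (both are illustrative choices, not taken from the paper). Stacking one such vector per source node yields the pairwise PPR values that the abstract refers to; the factorization into embeddings and NRP's degree-based node reweighting are not shown here, and this naive iteration is not the scalable algorithm the paper describes.

```python
from typing import List


def personalized_pagerank(
    adj: List[List[int]],
    source: int,
    alpha: float = 0.15,
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> List[float]:
    """Single-source PPR by power iteration (illustrative sketch only).

    Solves pi = alpha * e_source + (1 - alpha) * pi P, where P is the
    random-walk transition matrix of the adjacency list `adj`.
    """
    n = len(adj)
    pi = [0.0] * n
    pi[source] = 1.0  # all probability mass starts at the source node
    for _ in range(max_iter):
        nxt = [0.0] * n
        nxt[source] += alpha  # restart: return to the source with prob. alpha
        for u in range(n):
            if pi[u] == 0.0:
                continue
            out = adj[u]
            if out:
                # Spread the remaining mass uniformly over out-neighbours.
                share = (1.0 - alpha) * pi[u] / len(out)
                for v in out:
                    nxt[v] += share
            else:
                # Dangling node: send its walk mass back to the source.
                nxt[source] += (1.0 - alpha) * pi[u]
        diff = sum(abs(a - b) for a, b in zip(nxt, pi))
        pi = nxt
        if diff < tol:  # L1 change small enough: treat as converged
            break
    return pi
```

For example, on the undirected path 0 - 1 - 2 given as `[[1], [0, 2], [1]]`, the PPR vector from node 0 sums to 1 and ranks node 1 above node 2, i.e. it scores nodes only from the source's perspective, which is the property the abstract identifies as a limitation for whole-graph embeddings.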
Being involved in many important biological processes, miRNAs can regulate gene expression by targeting mRNAs to facilitate their degradation or translational inhibition. Many miRNA sequencing studies reveal that miRNA variations such as isomiRs and “arm switching” are biologically relevant. However, existing standalone tools usually do not provide comprehensive, detailed information on miRNA variations. To deepen our understanding of miRNA variability, we developed a new standalone tool called “mirPRo” to quantify known miRNAs and predict novel miRNAs. Compared with the most widely used standalone program, miRDeep2, mirPRo offers several new functions including read cataloging based on genome annotation, optional seed region check, miRNA family expression quantification, isomiR identification and categorization, and “arm switching” detection. Our comparative data analyses using three datasets from mouse, human and chicken demonstrate that mirPRo is more accurate than miRDeep2 by avoiding over-counting of sequence reads and by implementing different approaches in adapter trimming, mapping and quantification. mirPRo is an open-source standalone program (https://sourceforge.net/projects/mirpro/).
Spatial clustering deals with the unsupervised grouping of places into clusters and finds important applications in urban planning and marketing. Current spatial clustering models disregard information about the people who are related to the clustered places. In this paper, we show how the density-based clustering paradigm can be extended to apply to places that are visited by users of a geo-social network. Our model considers both spatial information and the social relationships between users who visit the clustered places. After formally defining the model and the distance measure it relies on, we present efficient algorithms for its implementation, based on spatial indexing. We evaluate the effectiveness of our model via a case study on real data; in addition, we design two quantitative measures, called social entropy and community score, to evaluate the quality of the discovered clusters. The results show that geo-social clusters have special properties and cannot be found by applying simple spatial clustering approaches. The efficiency of our index-based implementation is also evaluated experimentally.
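The density-based paradigm that this model extends can be sketched as classic DBSCAN with a pluggable distance function. This is a minimal illustration under stated assumptions: the `dist` argument stands in for the paper's geo-social distance measure, which the abstract does not define, so the usage below simply plugs in Euclidean distance. The neighbour search is a linear scan, not the spatial-index-based implementation the paper describes.

```python
import math
from collections import deque
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")
NOISE = -1


def dbscan(
    points: Sequence[T],
    dist: Callable[[T, T], float],
    eps: float,
    min_pts: int,
) -> List[int]:
    """Classic DBSCAN with a caller-supplied distance (sketch only).

    Returns one label per point: a cluster id >= 0, or NOISE (-1).
    """
    labels: List[Optional[int]] = [None] * len(points)

    def neighbors(i: int) -> List[int]:
        # Linear scan; an index (e.g. an R-tree) would replace this in practice.
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = deque(nbrs)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: join, but do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:
                queue.extend(j_nbrs)  # j is also core: keep expanding
        cluster += 1
    return [lab if lab is not None else NOISE for lab in labels]


def euclidean(p, q) -> float:
    return math.dist(p, q)
```

With the places `[(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]`, `eps=1.5` and `min_pts=3`, the call `dbscan(places, euclidean, 1.5, 3)` groups the first three and the next three places into two clusters and marks the isolated place as noise. Swapping `euclidean` for a distance that also accounts for the social ties of visitors is where a geo-social model would differ.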