Can Lu scite author profile

Reachability querying: an independent permutation labeling approach

Reachability querying is a fundamental graph operation that answers whether one vertex can reach another in a large directed graph G with n vertices and m edges, and it has been studied extensively. In the literature, all approaches compute a label for every vertex of G during offline index construction, and the quality of these labels affects the time needed to answer reachability queries online. The three main costs are the index construction time, the index size, and the query time. Some state-of-the-art approaches answer reachability queries efficiently but spend non-linear time constructing the index. Others construct the index in linear time and space but may need a depth-first search of G at run time, costing O(n + m). In this paper, we are the first to propose a randomized labeling approach for answering reachability queries, where the randomness comes from independent permutation. We conduct extensive experimental studies comparing our approach with state-of-the-art approaches on 19 large real datasets used in existing work and on synthetic datasets. The results confirm the efficiency of our approach.
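
The labeling idea in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a single random permutation and keeps only one value per vertex (the smallest permutation rank among the vertices reachable from it), and the names `build_min_out_labels` and `reaches` are hypothetical. If u reaches v, everything reachable from v is also reachable from u, so the label of u cannot exceed the label of v; a violation proves non-reachability without a search, and otherwise the sketch falls back to a pruned depth-first search.

```python
import random
from collections import defaultdict


def build_min_out_labels(n, edges, seed=0):
    """Label each vertex with the smallest random rank among the
    vertices it can reach (itself included). Runs in O(n log n + m)."""
    rng = random.Random(seed)
    rank = list(range(n))
    rng.shuffle(rank)  # rank[v] is the position of v in a random permutation

    succ, pred = defaultdict(list), defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)

    label = [None] * n
    # Visit vertices by increasing rank. A reverse search from w gives
    # rank[w] to every still-unlabeled vertex that can reach w.
    for w in sorted(range(n), key=lambda v: rank[v]):
        if label[w] is not None:
            continue
        label[w] = rank[w]
        stack = [w]
        while stack:
            x = stack.pop()
            for p in pred[x]:
                if label[p] is None:
                    label[p] = rank[w]
                    stack.append(p)
    return succ, label


def reaches(u, v, succ, label):
    """Answer whether u can reach v, using the labels as a filter."""
    if u == v:
        return True
    if label[u] > label[v]:
        return False  # the label test proves u cannot reach v
    # The labels are inconclusive: fall back to a depth-first search,
    # skipping vertices whose label already rules out reaching v.
    seen, stack = {u}, [u]
    while stack:
        x = stack.pop()
        for y in succ[x]:
            if y == v:
                return True
            if y not in seen and label[y] <= label[v]:
                seen.add(y)
                stack.append(y)
    return False


succ, label = build_min_out_labels(4, [(0, 1), (1, 2), (3, 2)])
print(reaches(0, 2, succ, label), reaches(2, 0, succ, label))
```

Keeping several small ranks per vertex, and a symmetric label over the vertices that can reach each vertex, would let the filter reject more queries before any search is needed.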

Finding the maximum clique in massive graphs

Wei

et al. 2017

Proc. VLDB Endow.

Cliques are subgraphs of an undirected graph in which the vertices are pairwise adjacent. The maximum clique problem, which asks for the clique with the most vertices in a given graph, has been studied extensively. Besides its theoretical value as an NP-hard problem, the maximum clique problem has direct applications in various fields, such as community search in social networks and social media, team formation in expert networks, gene expression and motif discovery in bioinformatics, and anomaly detection in complex networks, where it reveals the structure and function of networks. However, algorithms designed for the maximum clique problem are expensive to run on real-world networks. In this paper, we devise a randomized algorithm for the maximum clique problem. Unlike previous algorithms, which search from each vertex one after another, our approach RMC (for randomized maximum clique) employs a binary search while maintaining a lower bound $\underline{\omega_c}$ and an upper bound [EQUATION] of $\omega(G)$. In each iteration, RMC attempts to find an $\omega_t$-clique where [EQUATION]. As finding $\omega_t$ in each iteration is NP-complete, we extract a seed set $S$ such that the problem of finding an $\omega_t$-clique in $G$ is equivalent to finding an $\omega_t$-clique in $S$ with probability guarantees ($\geq 1 - n^{-c}$). We propose a novel iterative algorithm to determine the maximum clique by searching for a $k$-clique in $S$ starting from $k = \underline{\omega_c} + 1$ until $S$ becomes [EQUATION], when more iterations benefit marginally. As confirmed by the experiments, our approach is much more efficient and robust than previous solutions and can always find the exact maximum clique.
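
The binary search over the clique size described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not RMC itself: the seed-set sampling and its probability guarantee are not reproduced, the k-clique test is a plain exhaustive backtracking search, the upper bound is taken to be the maximum degree plus one, and the midpoint rule is an assumption because the abstract's formula is not available here. The names `build_adjacency`, `has_clique`, and `max_clique_size` are hypothetical.

```python
def build_adjacency(edges):
    """Adjacency sets of an undirected simple graph (no self-loops)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def has_clique(adj, k):
    """Exhaustive backtracking test for a clique with k vertices."""
    def extend(size, candidates):
        if size == k:
            return True
        if size + len(candidates) < k:
            return False  # too few candidates left to reach k vertices
        for v in list(candidates):
            # Try v as the next clique member, keeping only its neighbors.
            if extend(size + 1, candidates & adj[v]):
                return True
            candidates = candidates - {v}  # v is exhausted, drop it
        return False

    return extend(0, set(adj))


def max_clique_size(adj):
    """Binary search on the clique size between a lower and an upper bound."""
    if not adj:
        return 0
    lo = 1                                              # one vertex is a 1-clique
    hi = max(len(nbrs) for nbrs in adj.values()) + 1    # maximum degree + 1
    while lo < hi:
        mid = (lo + hi + 1) // 2                        # assumed midpoint rule
        if has_clique(adj, mid):
            lo = mid          # a mid-clique exists: raise the lower bound
        else:
            hi = mid - 1      # no mid-clique: lower the upper bound
    return lo


adj = build_adjacency([(0, 1), (1, 2), (0, 2), (2, 3)])
print(max_clique_size(adj))
```

The search is valid because clique existence is monotone: any clique with k vertices contains a clique with k - 1 vertices, so each test rules out half of the remaining size range.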

Exploring Hierarchies in Online Social Networks

et al. 2016

IEEE Trans. Knowl. Data Eng.

Social hierarchy (i.e., the pyramid structure of societies) is a fundamental concept in sociology and social network analysis. Social hierarchy matters in a social network because its topological structure is essential both in shaping the nature of social interactions between individuals and in unfolding the structure of the network. The social hierarchy found in a social network can be used to improve the accuracy of link prediction, provide better query results, rank web pages, and study information flow and spread in complex networks. In this paper, we model a social network as a directed graph G and consider the social hierarchy as a DAG (directed acyclic graph) of G, denoted G_D. In a DAG, all the vertices of G can be partitioned into different levels, the vertices at the same level represent a disjoint group in the social hierarchy, and all the edges follow one direction. The main issue we study in this paper is how to find the DAG G_D in G. Our approach is to find G_D by removing all possible cycles from G such that G = U(G) ∪ G_D, where U(G) is a maximum Eulerian subgraph that contains all possible cycles. We give the reasons for doing so, investigate the properties of the G_D found, and discuss the applications. In addition, we develop a novel two-phase algorithm, called Greedy-&-Refine, which greedily computes an Eulerian subgraph and then refines this greedy solution to find the maximum Eulerian subgraph. We give a bound between the greedy solution and the optimal. The quality of our greedy approach is high. We conduct comprehensive experimental studies over 14 real-world datasets. The results show that our algorithms are at least two orders of magnitude faster than the baseline algorithm.
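
The decomposition of G into an Eulerian subgraph and a DAG can be illustrated with a simple cycle-peeling sketch. It is not the Greedy-&-Refine algorithm and does not find a maximum Eulerian subgraph: it repeatedly finds any directed cycle and removes its edges until no cycle is left. The removed edges form edge-disjoint cycles, so every vertex has equal in-degree and out-degree among them, and the remaining edges form a DAG. The names `find_cycle` and `split_cycles_and_dag` are hypothetical, parallel edges are collapsed, and the recursive search is only suitable for small graphs.

```python
def find_cycle(succ):
    """Return the edges of some directed cycle, or None if the graph is acyclic."""
    state = {}   # vertex -> "active" (on the current DFS path) or "done"
    path = []

    def dfs(u):
        state[u] = "active"
        path.append(u)
        for v in succ[u]:
            if state.get(v) == "active":
                # Back edge found: the cycle is the path from v to u, plus (u, v).
                cycle = path[path.index(v):] + [v]
                return list(zip(cycle, cycle[1:]))
            if v not in state:
                found = dfs(v)
                if found:
                    return found
        state[u] = "done"
        path.pop()
        return None

    for s in list(succ):
        if s not in state:
            found = dfs(s)
            if found:
                return found
    return None


def split_cycles_and_dag(edges):
    """Greedily peel cycles; return (eulerian_edges, dag_edges)."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
        succ.setdefault(v, set())
    eulerian = []
    while True:
        cycle = find_cycle(succ)
        if cycle is None:
            break
        for u, v in cycle:
            succ[u].remove(v)   # remove the cycle's edges from the graph
        eulerian.extend(cycle)
    dag = [(u, v) for u in succ for v in succ[u]]
    return eulerian, dag


eulerian, dag = split_cycles_and_dag([(0, 1), (1, 2), (2, 0), (2, 3)])
print(sorted(eulerian), dag)
```

Because the order in which cycles are peeled affects how many edges end up in the Eulerian part, this greedy result can be smaller than the maximum, which is the gap a refinement phase would need to close.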

String Similarity Search: A Hash-Based Approach

Wei

2018

IEEE Trans. Knowl. Data Eng.


Can Lu

Speedup Graph Processing by Graph Ordering

Reachability querying: an independent permutation labeling approach

Finding the maximum clique in massive graphs

Exploring Hierarchies in Online Social Networks

String Similarity Search: A Hash-Based Approach
