Proceedings of the 28th ACM International Conference on Information and Knowledge Management 2019
DOI: 10.1145/3357384.3357879
Fast and Accurate Network Embeddings via Very Sparse Random Projection

Abstract: We present FastRP, a scalable and performant algorithm for learning distributed node representations in a graph. FastRP is over 4,000 times faster than state-of-the-art methods such as DeepWalk and node2vec, while achieving comparable or even better performance as evaluated on several real-world networks on various downstream tasks. We observe that most network embedding methods consist of two components: construct a node similarity matrix and then apply dimension reduction techniques to this matrix. We show t…
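The two-component pattern the abstract describes (build a node-similarity matrix, then apply dimension reduction) can be illustrated with a very sparse random projection in the FastRP style. This is a minimal sketch, not the authors' implementation; the function names `fastrp_like` and `very_sparse_projection` and the `weights` parameter are illustrative assumptions:

```python
import numpy as np

def very_sparse_projection(n, d, s=3, rng=None):
    # Very sparse random projection matrix (Li et al.): entries are
    # +sqrt(s), 0, -sqrt(s) with probabilities 1/(2s), 1-1/s, 1/(2s).
    rng = rng or np.random.default_rng(0)
    return rng.choice([np.sqrt(s), 0.0, -np.sqrt(s)], size=(n, d),
                      p=[1 / (2 * s), 1 - 1 / s, 1 / (2 * s)])

def fastrp_like(A, d=4, weights=(1.0, 1.0), rng=None):
    # Step 1: node-similarity matrices = powers of the transition matrix.
    # Step 2: dimension reduction via one shared sparse random projection.
    n = A.shape[0]
    S = A / np.maximum(A.sum(axis=1, keepdims=True), 1)  # row-normalize
    M = very_sparse_projection(n, d, rng=rng)
    out = np.zeros((n, d))
    for w in weights:          # accumulate weighted projected powers
        M = S @ M
        out += w * M
    return out

# Toy 4-node path graph.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
emb = fastrp_like(A)
print(emb.shape)  # (4, 4)
```

Because the projection matrix is applied to each power of the transition matrix before the next multiplication, the similarity matrix itself is never materialized, which is what makes this family of methods fast.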

Cited by 55 publications (22 citation statements).
References 30 publications (42 reference statements).
“…Recently, sketching-based techniques have been explored for generating node embeddings of large networks. Chen et al. [36] proposed an algorithm for computing node embeddings of large networks using very sparse random projections [37]. FREDE [38] generates linear-space node embeddings using deterministic matrix sketching [39], and InstantEmbedding [40] computes node embeddings using local PageRank computation.…”
Section: Related Work (mentioning)
confidence: 99%
“…It then obtains an updated embedding vector of $s$ via three steps: 1) it updates the estimate vectors $\boldsymbol{p}^t_s$ and $\boldsymbol{r}^t_s$ (Lines 2 to 9); 2) it calls the forward local push method to obtain the updated estimates $\boldsymbol{p}^t_s$; 3) it applies the hash kernel projection step to get an updated embedding. This projection step is from InstantEmbedding, where two universal hash functions are defined as $h_d : \mathbb{N} \to [d]$ and $h_{\mathrm{sgn}} : \mathbb{N} \to \{\pm 1\}$. The hash kernel based on these two hash functions is then defined as $H_{h_{\mathrm{sgn}}, h_d}(\boldsymbol{x}) : \mathbb{R}^n \to \mathbb{R}^d$, where entry $i$ is $\sum_{j \in h_d^{-1}(i)} x_j \, h_{\mathrm{sgn}}(j)$.…”
Section: Dynamic Graph Embedding for Single Batch (mentioning)
confidence: 99%
“…The hash kernel based on these two hash functions is defined as $H_{h_{\mathrm{sgn}}, h_d}(\boldsymbol{x}) : \mathbb{R}^n \to \mathbb{R}^d$, where entry $i$ is $\sum_{j \in h_d^{-1}(i)} x_j \, h_{\mathrm{sgn}}(j)$. Unlike the random projections used in RandNE [47] and FastRP [6], hash functions have an O(1) memory cost, whereas random-projection-based methods require O(dn) memory when a Gaussian matrix is used. Furthermore, the hash kernel yields an unbiased estimator of the inner product [42].…”
Section: Dynamic Graph Embedding for Single Batch (mentioning)
confidence: 99%
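The hash-kernel projection quoted above can be sketched in a few lines. This is a minimal illustration, assuming fixed random lookup tables stand in for the universal hash functions $h_d$ and $h_{\mathrm{sgn}}$; a real streaming implementation would use O(1)-memory hash functions rather than O(n) tables:

```python
import numpy as np

def hash_kernel(x, d, seed=0):
    # H(x): R^n -> R^d. Entry i of the output is the signed sum of the
    # x_j whose index j hashes to bucket i: sum_{h_d(j)=i} x_j * h_sgn(j).
    n = len(x)
    rng = np.random.default_rng(seed)
    h_d = rng.integers(0, d, size=n)            # bucket hash h_d: [n] -> [d]
    h_sgn = rng.choice([-1.0, 1.0], size=n)     # sign hash h_sgn: [n] -> {+-1}
    out = np.zeros(d)
    np.add.at(out, h_d, x * h_sgn)              # unbuffered scatter-add
    return out

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = hash_kernel(x, d=3)
print(y.shape)  # (3,)
```

The scatter-add makes the per-coordinate cost O(1), which is the memory advantage over dense random projection that the citation statement highlights.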
“…There are several variants of random projection techniques for manifold and network learning [14,20]. Recently, RandNE [34] and FastRP [3] have been proposed to capture high-order structural information of homogeneous networks via Gaussian random projection and sparse random projection, respectively. However, these methods ignore the heterogeneity and attributes of nodes and relations, and thus cannot capture the rich semantics of AMHENs.…”
Section: Related Work (mentioning)
confidence: 99%
“…The product attributes include the price, sales-rank, brand, category, etc. AMiner dataset 3 contains three types of nodes: author, paper and conference. The domain of papers is considered as the class label.…”
Section: Experiments, 5.1 Datasets (mentioning)
confidence: 99%