GNN学习笔记
GNN从入门到精通课程笔记
2.2 LINE (PaperReading)
- LINE: Large-scale Information Network Embedding (WWW ‘15)
- Abstract
- The method optimizes a carefully designed objective function that preserves both the local and global network structures.
- An edge-sampling algorithm is proposed that addresses the limitation of the classical stochastic gradient descent and improves both the effectiveness and the efficiency of the inference.
- (Practically, DeepWalk only applies to unweighted networks)
- Introduction
- The local structures are represented by the observed links in the networks, which capture the first-order proximity between the vertices.
- As a complement, we explore the second-order proximity between the vertices, which is not determined through the observed tie strength but through the shared neighborhood structures of the vertices.
- Nodes with shared neighbors are likely to be similar.
- The edge-sampling method, which improves both the effectiveness and efficiency of the inference.
- Problem definition
- First-order Proximity: For each pair of vertices linked by an edge (u, v), the weight on that edge, w_{uv}, indicates the first-order proximity between u and v. If no edge is observed between u and v, their first-order proximity is 0.
- The second-order proximity between a pair of vertices (u, v) in a network is the similarity between their neighborhood network structures. let p_u = (w_{u,1}, . . . , w_{u,|V |}) denote the first-order proximity of u with all the other vertices, then the second-order proximity between u and v is determined by the similarity between p_u and p_v. If no vertex is linked from/to both u and v, the second-order proximity between u and v is 0.
- Method
- Model Description
- 通过优化两个节点的之间某个特征(embedding之间的某个计算公式)和其经验概率(node之间的某个计算公式)的KL散度,使其和经验概率的距离不断接近,来优化node embedding.
- First-order proximity
- embedding特征:joint probability
- 节点的特征:边的权重占图中所有边总权重的比例
- Second-order Proximity
- embedding的特征:context, conditional distribution
- 节点的特征:边的权重/该节点的出度
- Concatenate the embeddings of first-order proximity and second-order proximity.
- Model Optimization
- Negative sampling
- Asynchronous stochastic gradient algorithm (ASGD)
- Edge Sampling: The alias table method
- Discussion
- Low-degree vertices: Add higher-order neighbors
- New vertices
- 问题:之前节点的embedding还是需要重新计算?
- Model Description