GNN学习笔记

GNN从入门到精通课程笔记

2.2 LINE (PaperReading)

LINE: Large-scale Information Network Embedding (WWW ‘15)
Abstract
- The method optimizes a carefully designed objective function that preserves both the local and global network structures.
- An edge-sampling algorithm is proposed that addresses the limitation of the classical stochastic gradient descent and improves both the effectiveness and the efficiency of the inference.
- (Practically, DeepWalk only applies to unweighted networks)
Introduction
- The local structures are represented by the observed links in the networks, which capture the first-order proximity between the vertices.
- As a complement, we explore the second-order proximity between the vertices, which is not determined through the observed tie strength but through the shared neighborhood structures of the vertices.
  - Nodes with shared neighbors are likely to be similar.
- The edge-sampling method, which improves both the effectiveness and efficiency of the inference.
Problem definition
- First-order Proximity: For each pair of vertices linked by an edge (u, v), the weight on that edge, w_{uv}, indicates the first-order proximity between u and v. If no edge is observed between u and v, their first-order proximity is 0.
- The second-order proximity between a pair of vertices (u, v) in a network is the similarity between their neighborhood network structures. let p_u = (w_{u,1}, . . . , w_{u,|V |}) denote the first-order proximity of u with all the other vertices, then the second-order proximity between u and v is determined by the similarity between p_u and p_v. If no vertex is linked from/to both u and v, the second-order proximity between u and v is 0.
Method
- Model Description
  - 通过优化两个节点的之间某个特征（embedding之间的某个计算公式）和其经验概率（node之间的某个计算公式）的KL散度，使其和经验概率的距离不断接近，来优化node embedding.
  - First-order proximity
    - embedding特征：joint probability
    - 节点的特征：边的权重占图中所有边总权重的比例
  - Second-order Proximity
    - embedding的特征：context， conditional distribution
    - 节点的特征：边的权重/该节点的出度
  - Concatenate the embeddings of first-order proximity and second-order proximity.
- Model Optimization
  - Negative sampling
  - Asynchronous stochastic gradient algorithm (ASGD)
  - Edge Sampling: The alias table method
- Discussion
  - Low-degree vertices: Add higher-order neighbors
  - New vertices
    - 问题：之前节点的embedding还是需要重新计算？