GNN学习笔记

GNN从入门到精通课程笔记

2.2 LINE (PaperReading)

  • LINE: Large-scale Information Network Embedding (WWW ‘15)
  • Abstract
    • The method optimizes a carefully designed objective function that preserves both the local and global network structures.
    • An edge-sampling algorithm is proposed that addresses the limitation of the classical stochastic gradient descent and improves both the effectiveness and the efficiency of the inference.
    • (Practically, DeepWalk only applies to unweighted networks)
  • Introduction
    • The local structures are represented by the observed links in the networks, which capture the first-order proximity between the vertices.
    • As a complement, we explore the second-order proximity between the vertices, which is not determined through the observed tie strength but through the shared neighborhood structures of the vertices.
      • Nodes with shared neighbors are likely to be similar.
    • The edge-sampling method, which improves both the effectiveness and efficiency of the inference.
  • Problem definition
    • First-order Proximity: For each pair of vertices linked by an edge (u, v), the weight on that edge, w_{uv}, indicates the first-order proximity between u and v. If no edge is observed between u and v, their first-order proximity is 0.
    • The second-order proximity between a pair of vertices (u, v) in a network is the similarity between their neighborhood network structures. let p_u = (w_{u,1}, . . . , w_{u,|V |}) denote the first-order proximity of u with all the other vertices, then the second-order proximity between u and v is determined by the similarity between p_u and p_v. If no vertex is linked from/to both u and v, the second-order proximity between u and v is 0.
  • Method
    • Model Description
      • 通过优化两个节点的之间某个特征(embedding之间的某个计算公式)和其经验概率(node之间的某个计算公式)的KL散度,使其和经验概率的距离不断接近,来优化node embedding.
      • First-order proximity
        first-order
        • embedding特征:joint probability
        • 节点的特征:边的权重占图中所有边总权重的比例
      • Second-order Proximity
        second-order
        • embedding的特征:context, conditional distribution
        • 节点的特征:边的权重/该节点的出度
      • Concatenate the embeddings of first-order proximity and second-order proximity.
    • Model Optimization
      • Negative sampling
      • Asynchronous stochastic gradient algorithm (ASGD)
      • Edge Sampling: The alias table method
    • Discussion
      • Low-degree vertices: Add higher-order neighbors
      • New vertices
        • 问题:之前节点的embedding还是需要重新计算?