CS224W: Machine Learning with Graphs - 03 Node Embeddings

Node Embeddings

1. Graph Representation Learning

Graph representation learning alleviates the need to do feature engineering every single time (the features are learned automatically)
Goal: efficient task-independent feature learning for machine learning with graphs
Why embedding?

  • Similarity of embeddings between nodes indicates their similarity in the network
  • Encode network information
  • Potentially used for many downstream predictions (node classification, link prediction, graph prediction, anomalous node detection, clustering…)

2. Node Embeddings: Encoder and Decoder

Goal: encode nodes so that similarity in the embedding space approximates similarity in the graph
a) Encoder ENC maps from nodes to embeddings (a low-dimensional vector)
b) Define a node similarity function (i.e., a measure of similarity in the original network)
c) Decoder DEC maps from embeddings to the similarity score
d) Optimize the parameters of the encoder so that $\text{similarity}(u, v) \approx z_v^T z_u$

1). “Shallow” Encoding

Simplest encoding approach: encoder is just an embedding-lookup so each node is assigned a unique embedding vector
$$\text{ENC}(v) = z_v = Z \cdot v$$
where $Z$ is a matrix whose columns are node embeddings and $v$ is an indicator vector with all zeroes except a one in the column indicating node $v$
Methods: DeepWalk, node2vec
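As a minimal illustration, a hypothetical sketch of shallow encoding (not the course's reference code): the encoder is nothing more than a column lookup into a trainable embedding matrix $Z$, and the decoder is a dot product. All sizes are made up:

```python
import numpy as np

num_nodes, embedding_dim = 100, 16   # illustrative sizes

# Z: one d-dimensional embedding per node (column v holds z_v);
# randomly initialized here, learned by the methods below in practice.
Z = np.random.randn(embedding_dim, num_nodes) * 0.01

def encode(v: int) -> np.ndarray:
    """Shallow encoder: ENC(v) = Z @ one_hot(v), i.e., just column v of Z."""
    return Z[:, v]

def similarity(u: int, v: int) -> float:
    """Decoder: similarity score as the dot product of two embeddings."""
    return float(encode(u) @ encode(v))
```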

3. Random Walk Approaches for Node Embeddings

  • Vector $z_u$ is the embedding of node $u$
  • Probability $P(v \mid z_u)$ is the (predicted) probability of visiting node $v$ on random walks starting from node $u$
  • Random walk: given a graph and a starting point, we select one of its neighbors at random and move to this neighbor; then we select a neighbor of this point at random and move to it, etc. The (random) sequence of points visited this way is a random walk on the graph.
a. Random-walk Embeddings

$$z_u^T z_v \approx \text{probability that } u \text{ and } v \text{ co-occur on a random walk over the graph}$$

  • Estimate the probability $P_R(v \mid u)$ of visiting node $v$ on a random walk starting from node $u$ under the random walk strategy $R$
  • Optimize embeddings to encode these random walk statistics

Why random walks?

  • Expressivity: flexible stochastic definition of node similarity that incorporates both local and higher-order neighborhood information (if a random walk starting from node $u$ visits $v$ with high probability, $u$ and $v$ are similar)
  • Efficiency: do not need to consider all node pairs when training; only need to consider pairs that co-occur on random walks
b. Unsupervised Feature Learning

Intuition: find an embedding of nodes in $d$-dimensional space that preserves similarity
Idea: learn node embeddings such that nearby nodes in the network are close together
$N_R(u)$: neighborhood of $u$ obtained by strategy $R$
Goal: learn a mapping $f: u \to \mathbb{R}^d$, $f(u) = z_u$
Log-likelihood objective:
$$\max_f \sum_{u\in V} \log P(N_R(u) \mid z_u)$$
Given node $u$, we want to learn feature representations that are predictive of the nodes in its random walk neighborhood $N_R(u)$

c. Random-walk Optimization

a) Run short fixed-length random walks starting from each node $u$ in the graph using the random walk strategy $R$
b) For each node $u$, collect $N_R(u)$, the multiset of nodes visited on random walks starting from $u$
c) Optimize embeddings
$$\max_f \sum_{u\in V} \log P(N_R(u) \mid z_u)$$
Parameterize $P(v \mid z_u)$ using the softmax:
$$P(v \mid z_u) = \frac{\exp(z_u^T z_v)}{\sum_{n\in V}\exp(z_u^T z_n)}$$
Let
$$L = \sum_{u\in V}\sum_{v\in N_R(u)} -\log P(v \mid z_u) = \sum_{u\in V}\sum_{v\in N_R(u)} -\log\left(\frac{\exp(z_u^T z_v)}{\sum_{n\in V}\exp(z_u^T z_n)}\right)$$
Optimizing random-walk embeddings = finding embeddings $z_u$ that minimize $L$
Time complexity: $O(|V|^2)$, since the softmax normalization sums over all nodes and is itself evaluated for every node
Solution: Negative sampling
$$\log\left(\frac{\exp(z_u^T z_v)}{\sum_{n\in V}\exp(z_u^T z_n)}\right) \approx \log\big(\sigma(z_u^T z_v)\big) - \sum_{i=1}^k \log\big(\sigma(z_u^T z_{n_i})\big), \quad n_i \sim P_V
$$
where $\sigma(\cdot)$ is the sigmoid function and the negative samples $n_i$ follow a random distribution $P_V$ over nodes
Instead of normalizing w.r.t. all nodes, just normalize against $k$ random "negative samples" $n_i$
Sample $k$ negative nodes, each with probability proportional to its degree ($k = 5\sim20$ in practice)
To minimize $L$, we can use gradient descent (GD) or stochastic gradient descent (SGD)
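To make the update concrete, here is a hypothetical sketch of one SGD step on the word2vec-style negative-sampling surrogate (a close variant of the formula above, using $\log\sigma(-z_u^T z_{n_i})$ for the negative term); the embedding layout and learning rate are assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgd_step(Z, u, v, neg_nodes, lr=0.025):
    """One SGD step on -[log sig(z_u.z_v) + sum_i log sig(-z_u.z_ni)].

    Z: (num_nodes, d) embedding array, one row per node.
    (u, v): a pair of nodes that co-occurred on a random walk.
    neg_nodes: k nodes sampled with probability proportional to degree.
    """
    zu = Z[u].copy()
    # Positive pair: pull z_u and z_v together
    g = 1.0 - sigmoid(zu @ Z[v])
    grad_u = -g * Z[v]
    Z[v] += lr * g * zu
    # Negative samples: push z_u away from each z_ni
    for n in neg_nodes:
        gn = sigmoid(zu @ Z[n])
        grad_u += gn * Z[n]
        Z[n] -= lr * gn * zu
    Z[u] -= lr * grad_u
```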
How to randomly walk
Simplest idea: just run fixed-length, unbiased random walks starting from each node
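A minimal sketch of generating such walks (illustrative code; the adjacency-list dict `adj` and the function names are assumptions, not from the lecture):

```python
import random

def random_walk(adj, start, length):
    """One unbiased fixed-length random walk starting at `start`.

    adj: dict mapping each node to a list of its neighbors.
    """
    walk = [start]
    for _ in range(length - 1):
        cur = walk[-1]
        if not adj[cur]:              # dead end: stop the walk early
            break
        walk.append(random.choice(adj[cur]))
    return walk

def walk_corpus(adj, walks_per_node=10, length=5):
    """DeepWalk-style corpus: several walks from every node."""
    return [random_walk(adj, u, length)
            for u in adj for _ in range(walks_per_node)]
```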

d. Overview of Node2vec
  • Goal: embed nodes with similar network neighborhoods close in the feature space
  • Key observation: a flexible notion of the network neighborhood $N_R(u)$ of node $u$ leads to rich node embeddings
  • Develop a biased $2^{\text{nd}}$-order random walk $R$ to trade off between local (BFS) and global (DFS) views of the network

Interpolating BFS and DFS
Two parameters:

  • Return parameter $p$: controls returning back to the previous node
  • In-out parameter $q$ (ratio of BFS vs. DFS): controls moving outwards (DFS) vs. inwards (BFS)

Biased Random Walks
Idea: remember where the walk came from

  • BFS-like walk: low value of $p$
  • DFS-like walk: low value of $q$

Node2vec Algorithm
a) Compute random walk transition probabilities
b) Simulate $r$ random walks of length $l$ starting from each node $u$
c) Optimize the node2vec objective using SGD
Time complexity: linear
Steps are individually parallelizable
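A sketch of the biased 2nd-order transition rule behind step a) (hypothetical code; real node2vec implementations precompute alias tables for O(1) sampling, omitted here for brevity):

```python
import random

def node2vec_step(adj, prev, cur, p, q):
    """Pick the next node of a walk that just traversed the edge prev -> cur.

    Unnormalized node2vec weights:
      1/p  if the candidate returns to `prev`           (back to distance 0)
      1    if the candidate is also a neighbor of prev  (stays at distance 1)
      1/q  if the candidate moves outwards              (distance 2 from prev)
    """
    neighbors = adj[cur]
    weights = []
    for x in neighbors:
        if x == prev:
            weights.append(1.0 / p)
        elif x in adj[prev]:
            weights.append(1.0)
        else:
            weights.append(1.0 / q)
    return random.choices(neighbors, weights=weights, k=1)[0]
```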

e. Other Random Walk Ideas
  • Different kinds of biased random walks: based on node attributes/learned weights
  • Alternative optimization schemes: directly optimize based on 1-hop and 2-hop random walk probabilities
  • Network preprocessing techniques: run random walks on modified versions of the original network

4. Embedding Entire Graphs

To be updated

1). Node-level features: Overview

Goal: characterize the structure and position of a node in the network

  • Node degree
  • Node centrality
  • Clustering coefficient
  • Graphlets
2). Node degree
  • The degree $k_v$ of node $v$ is the number of edges (neighboring nodes) the node has
  • Treats all neighboring nodes equally
3). Node centrality
  • Node degree counts the neighboring nodes without capturing their importance.
  • Node centrality $c_v$ takes the node's importance in a graph into account
  • Different ways to model importance: eigenvector centrality, betweenness centrality, closeness centrality and so on.
a. Eigenvector centrality
  • A node $v$ is important if it is surrounded by important neighboring nodes $u \in N(v)$.
  • $c_v = \frac{1}{\lambda} \sum_{u \in N(v)} c_u$, where $\lambda$ is some positive constant.
  • In matrix form: $\lambda c = Ac$, where $A$ is the adjacency matrix ($A_{uv} = 1$ if $u \in N(v)$) and $c$ is the centrality vector
  • The largest eigenvalue $\lambda_{max}$ is always positive and unique (by the Perron–Frobenius theorem)
  • The leading eigenvector $c_{max}$, associated with $\lambda_{max}$, is used for centrality
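A quick sketch of computing this by power iteration (illustrative code; `A` is assumed to be the dense adjacency matrix of a connected graph):

```python
import numpy as np

def eigenvector_centrality(A, iters=100):
    """Power iteration: repeatedly apply A and renormalize.

    Converges to the leading eigenvector of A, whose entries are the
    eigenvector centralities (up to scale).
    """
    c = np.ones(A.shape[0])
    for _ in range(iters):
        c = A @ c
        c = c / np.linalg.norm(c)
    return c
```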
b. Betweenness centrality
  • A node is important if it lies on many shortest paths between other nodes
    $$c_v = \sum_{s \neq v \neq t} \frac{\#(\text{shortest paths between } s \text{ and } t \text{ that contain } v)}{\#(\text{shortest paths between } s \text{ and } t)}$$
c. Closeness centrality
  • A node is important if it has small shortest path lengths to all other nodes
    $$c_v = \frac{1}{\sum_{u \neq v} \text{shortest path length between } u \text{ and } v}$$
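Both centralities are available off the shelf; a usage sketch with networkx (the toy edge list is made up):

```python
import networkx as nx

G = nx.Graph([(0, 1), (1, 2), (1, 3), (2, 3), (3, 4)])  # toy graph

# Fraction of all-pairs shortest paths passing through each node
betweenness = nx.betweenness_centrality(G)
# networkx normalizes closeness to (n-1) / sum of distances
closeness = nx.closeness_centrality(G)
print(betweenness, closeness)
```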
4). Node Features: Clustering Coefficient
  • Measures how connected the neighboring nodes of $v$ are:
    $$e_v = \frac{\#(\text{edges among neighboring nodes})}{\binom{k_v}{2}} \in [0, 1]$$
    where $\binom{k_v}{2}$ is the number of node pairs among the $k_v$ neighboring nodes
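A direct translation of the formula (illustrative sketch; `adj` is assumed to map each node to the set of its neighbors):

```python
def clustering_coefficient(adj, v):
    """#(edges among v's neighbors) divided by (k_v choose 2)."""
    neighbors = list(adj[v])
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if neighbors[j] in adj[neighbors[i]])
    return links / (k * (k - 1) / 2)
```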
5). Node Features: Graphlets

Observation: clustering coefficient counts the #(triangles) in the ego-network
Idea: we can generalize the above by counting #(pre-specified subgraphs, i.e., graphlets)

  • Graphlets: rooted connected non-isomorphic subgraphs.
  • Graphlet Degree Vector (GDV): Graphlet-based features for nodes

Compare with node degree and clustering coefficient
Degree counts #(edges) that a node touches
Clustering coefficient counts #(triangles) that a node touches
GDV counts #(graphlets) that a node touches

  • Graphlet Degree Vector (GDV): a count vector of graphlets rooted at a given node.
  • Considering graphlets on 2 to 5 nodes, we get a vector of 73 coordinates: a signature of a node that describes the topology of its neighborhood and captures its interconnectivities out to a distance of 4 hops
  • GDV provides a more detailed measure of a node's local network topology than node degree or the clustering coefficient

5. Link-level Tasks

The task is to predict new links based on existing links.
At test time, all node pairs (with no existing links) are ranked, and the top $K$ node pairs are predicted as new links.
The key is to design features for a pair of nodes.
Two formulations:

  • Links missing at random: remove a random set of links and then aim to predict them
  • Links over time: given $G[t_0, t_0']$, a graph on edges up to time $t_0'$, output a ranked list $L$ of links (not in $G[t_0, t_0']$) that are predicted to appear in $G[t_1, t_1']$

Methodology: compute a score $c(x, y)$ for each pair of nodes $(x, y)$, sort pairs by decreasing score $c(x, y)$, predict the top $n$ pairs as new links, and see which of these links actually appear in $G[t_1, t_1']$
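A sketch of that pipeline (hypothetical code; `score` stands in for any of the pair features described below):

```python
from itertools import combinations

def predict_links(adj, score, n):
    """Rank all non-adjacent node pairs by score; return the top n pairs."""
    candidates = [(u, v) for u, v in combinations(adj, 2)
                  if v not in adj[u]]
    candidates.sort(key=lambda pair: score(adj, *pair), reverse=True)
    return candidates[:n]
```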

1). Link-level Features: Overview
  • Distance-based feature
  • Local neighborhood overlap
  • Global neighborhood overlap
2). Distance-based feature

Shortest-path distance between two nodes. However, this does not capture the degree of neighborhood overlap.

3). Local neighborhood overlap

Captures the number of neighboring nodes shared between two nodes $v_1$ and $v_2$:

  • Common neighbors: $|N(v_1) \cap N(v_2)|$
  • Jaccard's coefficient: $\frac{|N(v_1) \cap N(v_2)|}{|N(v_1) \cup N(v_2)|}$
  • Adamic–Adar index: $\sum_{u \in N(v_1) \cap N(v_2)} \frac{1}{\log(k_u)}$
    Limitation: these metrics are always zero if the two nodes have no neighbors in common; however, the two nodes may still potentially be connected in the future.
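The three overlap scores in code (a minimal sketch; `adj` maps nodes to neighbor sets, and the guard on degree-1 neighbors avoids dividing by $\log 1 = 0$):

```python
import math

def common_neighbors(adj, v1, v2):
    return len(adj[v1] & adj[v2])

def jaccard(adj, v1, v2):
    union = adj[v1] | adj[v2]
    return len(adj[v1] & adj[v2]) / len(union) if union else 0.0

def adamic_adar(adj, v1, v2):
    # 1/log(degree) down-weights shared neighbors that are high-degree hubs
    return sum(1.0 / math.log(len(adj[u]))
               for u in adj[v1] & adj[v2] if len(adj[u]) > 1)
```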
4). Global neighborhood overlap
  • Power of the adjacency matrix: let $P_{uv}^{(K)} = \#(\text{paths of length } K \text{ between } u \text{ and } v)$; then $P^{(K)} = A^K$
  • Katz index: count the number of paths of all lengths between a given pair of nodes, discounting longer paths by a factor $0 < \beta < 1$:
    $$S_{uv} = \sum_{l=1}^{\infty} \beta^l A_{uv}^l$$
    In matrix form (by the geometric series of matrices):
    $$S = \sum_{l=1}^{\infty} \beta^l A^l = (I - \beta A)^{-1} - I$$
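The closed form in numpy (illustrative sketch; note the series only converges when $\beta < 1/\lambda_{max}(A)$, a condition the notes leave implicit):

```python
import numpy as np

def katz_scores(A, beta=0.05):
    """All-pairs Katz index: S = (I - beta*A)^(-1) - I."""
    I = np.eye(A.shape[0])
    return np.linalg.inv(I - beta * A) - I
```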

6. Graph-level Features

1). Kernel Methods

Idea: design kernels instead of feature vectors

  • Kernel $k(G, G') \in \mathbb{R}$ measures the similarity between two graphs
  • The kernel matrix $K = \big(k(G, G')\big)_{G,G'}$ must always be positive semidefinite (i.e., have non-negative eigenvalues)
  • There exists a feature representation $\phi$ such that $k(G, G') = \phi(G)^T \phi(G')$
  • Once the kernel is defined, off-the-shelf ML model, such as kernel SVM, can be used to make predictions.

Graph kernels
Goal: design a graph feature vector $\phi(G)$
Key idea: Bag-of-Words (BoW)* for a graph (regard nodes as words)
*: BoW simply uses the word counts as features for documents

  • Graphlet kernel
  • Weisfeiler-Lehman kernel
  • Random-walk kernel, shortest-path graph kernel, and many more
2). Graphlet Features

Key idea: count the number of graphlets in a graph
Note: Definition of graphlets here is different from node-level features.

  • Nodes in graphlets here do not need to be connected
  • The graphlets here are not rooted

Given a graph $G$ and a graphlet list $\mathcal{G}_k = (g_1, g_2, \ldots, g_n)$, define the graphlet count vector $f_G$ as
$$(f_G)_i = \#(g_i \subseteq G) \quad \text{for } i = 1, 2, \ldots, n$$
Given two graphs $G$ and $G'$, the graphlet kernel is
$$k(G, G') = h_G^T h_{G'}$$
where $h_G = \frac{f_G}{\text{sum}(f_G)}$ (normalizing the counts keeps graphs of different sizes comparable)
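In code, once the graphlet counts are available (a sketch; the counting itself, the expensive part discussed next, is assumed to be done elsewhere):

```python
import numpy as np

def graphlet_kernel(f_G, f_Gp):
    """k(G, G') = h_G . h_G' with h = f / sum(f) (size normalization)."""
    h_G = np.asarray(f_G, dtype=float)
    h_Gp = np.asarray(f_Gp, dtype=float)
    return float((h_G / h_G.sum()) @ (h_Gp / h_Gp.sum()))
```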
Limitations: counting graphlets is expensive.

  • Counting size-$k$ graphlets for a graph with size $n$ by enumeration takes $n^k$ time
  • This is unavoidable in the worst case, since the subgraph isomorphism test is NP-hard
  • If a graph's node degree is bounded by $d$, an $O(nd^{k-1})$ algorithm exists to count all the graphlets of size $k$
3). Weisfeiler-Lehman Kernel

Goal: design an efficient graph feature descriptor $\phi(G)$
Idea: use neighborhood structure to iteratively enrich the node vocabulary
Color refinement: given a graph $G$ with a set of nodes $V$, assign an initial color $c^{(0)}(v)$ to each node $v$. Then iteratively refine node colors by
$$c^{(k+1)}(v) = \text{HASH}\left(\left\{c^{(k)}(v), \left\{c^{(k)}(u)\right\}_{u \in N(v)}\right\}\right)$$
where HASH maps different inputs to different colors. After $K$ steps of color refinement, $c^{(K)}(v)$ summarizes the structure of the $K$-hop neighborhood.
WL is computationally efficient: the time complexity of color refinement at each step is linear in #(edges), since it only involves aggregating neighboring colors.
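A compact sketch of color refinement (illustrative; sorted tuples of neighbor colors stand in for the HASH inputs):

```python
def color_refinement(adj, steps):
    """Iteratively refine node colors from a uniform initial coloring.

    Each round, a node's new color is determined by its own color plus
    the multiset (here: a sorted tuple) of its neighbors' colors.
    """
    colors = {v: 0 for v in adj}          # initial color: all nodes the same
    for _ in range(steps):
        signatures = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            for v in adj
        }
        # Re-label distinct signatures with small integer "colors"
        palette = {s: i for i, s in enumerate(sorted(set(signatures.values())))}
        colors = {v: palette[signatures[v]] for v in adj}
    return colors
```

The WL kernel then represents each graph by the counts of each color reached over all refinement steps and takes the inner product of these count vectors.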
