Training Strategies and Augmentation Techniques for GNNs


1

Reasons and approaches of graph manipulation

  1. Feature level:

  • The input graph lacks features → feature augmentation

  • Certain structures are hard for a GNN to learn (e.g. cycle counts) → add them as features

  2. Structure level:

  • The graph is too sparse = inefficient message passing
    → adding virtual nodes/edges

  • The graph is too dense = message passing is too costly
    → random neighbor sampling

  • The graph is too large = cannot fit the computational graph into a GPU
    → subgraph sampling for embedding computing

  3. It is unlikely that the input graph happens to be the optimal computation graph for embeddings

2

Feature augmentation approaches

  • Assigning constant values to nodes

  • Assigning unique IDs to nodes as one-hot vectors

  • Adding certain features, e.g. centrality, clustering coefficient
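
A minimal sketch of these three options in plain PyTorch (all names are illustrative, not from any particular library; degree stands in for the structural features the card mentions):

```python
import torch

def augment_node_features(adj: torch.Tensor) -> dict:
    """Build augmented node features from an N x N adjacency matrix."""
    n = adj.size(0)
    constant = torch.ones(n, 1)             # option 1: the same constant value for every node
    one_hot_id = torch.eye(n)               # option 2: a unique one-hot ID per node
    degree = adj.sum(dim=1, keepdim=True)   # option 3: a structural feature (here node degree;
                                            # centrality or clustering coefficient work the same way)
    return {"constant": constant, "one_hot_id": one_hot_id, "degree": degree}
```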

3

Comparison of feature augmentation approaches

(figure not included)
4

Common approach for adding virtual edges

Connecting 2-hop neighbors via virtual edges
→ use A + A² as the adjacency of the computation graph

Use case: bipartite graphs (e.g. an author-to-paper graph, where 2-hop connections correspond to author-to-author collaborations)
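
A sketch of the A + A² trick on a dense adjacency matrix (illustrative only; a real pipeline would use sparse matrices):

```python
import torch

def add_two_hop_edges(adj: torch.Tensor) -> torch.Tensor:
    """Add virtual edges between 2-hop neighbors: use A + A^2 as the computation graph."""
    two_hop = adj @ adj                    # (A^2)[u, v] = number of length-2 paths from u to v
    aug = ((adj + two_hop) > 0).float()    # binarize: keep an edge if u, v are 1- or 2-hop neighbors
    aug.fill_diagonal_(0)                  # A^2 puts node degrees on the diagonal; drop those self-loops
    return aug
```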

5

Method and benefit of adding virtual nodes

  • The virtual node connects to all the nodes in the graph → any two nodes are at most distance 2 apart

  • Benefit: greatly improved message passing
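
A sketch of the construction with an edge-index representation (a 2 × num_edges tensor of directed edges); names are illustrative:

```python
import torch

def add_virtual_node(edge_index: torch.Tensor, num_nodes: int) -> torch.Tensor:
    """Append a virtual node (index = num_nodes) connected to every existing node in both directions."""
    v = num_nodes
    nodes = torch.arange(num_nodes)
    virtual = torch.full((num_nodes,), v, dtype=torch.long)
    to_virtual = torch.stack([nodes, virtual])
    from_virtual = torch.stack([virtual, nodes])
    return torch.cat([edge_index, to_virtual, from_virtual], dim=1)
```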

6

Node-level prediction head

(figure not included)
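
A node-level head is typically a linear layer applied to each node's final-layer embedding; a minimal sketch under that assumption:

```python
import torch.nn as nn

class NodeHead(nn.Module):
    """Node-level prediction head: map each d-dimensional node embedding to k outputs (k-way prediction)."""
    def __init__(self, d: int, k: int):
        super().__init__()
        self.linear = nn.Linear(d, k)

    def forward(self, h):         # h: [num_nodes, d] final-layer GNN embeddings
        return self.linear(h)     # [num_nodes, k] logits, one prediction per node
```
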
7

Edge-level prediction head

Make predictions using pairs of node embeddings.

Options:

  • Concatenation + Linear layer: concatenate the two d-dimensional embeddings and apply a linear layer mapping 2d dimensions to k (k-way prediction)

  • Dot product: a plain dot product gives 1-way prediction; for k-way prediction use k trainable weight matrices, y_uv^(k) = h_u^T W^(k) h_v (sketched below)

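A minimal sketch of both options (class names and shapes are illustrative):

```python
import torch
import torch.nn as nn

class EdgeHeadConcat(nn.Module):
    """Option 1: concatenate the two node embeddings, then a linear layer for k-way prediction."""
    def __init__(self, d: int, k: int):
        super().__init__()
        self.linear = nn.Linear(2 * d, k)

    def forward(self, h_u, h_v):                               # h_u, h_v: [num_edges, d]
        return self.linear(torch.cat([h_u, h_v], dim=-1))      # [num_edges, k]

class EdgeHeadDot(nn.Module):
    """Option 2: generalized dot product with one trainable W matrix per output dimension."""
    def __init__(self, d: int, k: int):
        super().__init__()
        self.W = nn.Parameter(torch.randn(k, d, d))

    def forward(self, h_u, h_v):
        # y_uv^(k) = h_u^T W^(k) h_v, computed for all k outputs at once
        return torch.einsum('ed,kdf,ef->ek', h_u, self.W, h_v)  # [num_edges, k]
```
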
8

Graph-level prediction head

Make predictions using all the node embeddings in the graph, e.g. via global mean/max/sum pooling (see the sketch below)

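A minimal sketch using global mean pooling (sum or max pooling are drop-in alternatives):

```python
import torch.nn as nn

class GraphHead(nn.Module):
    """Graph-level head: pool all node embeddings into one graph vector, then predict k outputs."""
    def __init__(self, d: int, k: int):
        super().__init__()
        self.linear = nn.Linear(d, k)

    def forward(self, h):               # h: [num_nodes, d] embeddings of a single graph
        h_graph = h.mean(dim=0)         # global mean pooling -> [d]
        return self.linear(h_graph)     # [k] logits for the whole graph
```
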
9

Unsupervised prediction task examples

  • Node-level: node statistics, e.g. clustering coefficient

  • Edge-level: link prediction

  • Graph-level: predicting whether two graphs are isomorphic

10

Classification loss

(figure not included)
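
For k-way classification the standard loss is cross-entropy over the prediction head's logits (presumably what the missing formula shows); a short sketch:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(8, 3)              # head outputs for 8 nodes/edges/graphs, 3 classes
labels = torch.randint(0, 3, (8,))      # ground-truth class indices
loss = F.cross_entropy(logits, labels)  # softmax + negative log-likelihood, averaged over examples
```
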
11

Regression loss

(figure not included)
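
For regression the usual choice is mean squared error (MSE); a short sketch:

```python
import torch
import torch.nn.functional as F

pred = torch.randn(8, 1)            # predicted real-valued targets
target = torch.randn(8, 1)          # ground-truth values
loss = F.mse_loss(pred, target)     # mean of (pred - target)^2
```
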
12

Fixed vs. random dataset split

  • Fixed: the dataset is split once and the same split is reused in every experiment

  • Random: split randomly and report the average performance over different random seeds

13

Challenge of graph splitting

The nodes (data points) are not independent; through message passing they affect each other's predictions

14

Transductive vs inductive splitting

(figure not included)
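
Assuming the missing figure shows the usual node-classification setup: a transductive split keeps the whole graph visible and splits only the labels, while an inductive split drops the edges between splits to obtain independent graphs. A minimal sketch:

```python
import torch

num_nodes = 10
perm = torch.randperm(num_nodes)
train_nodes, test_nodes = perm[:8], perm[8:]

# Transductive: one graph is used everywhere; only the supervision labels are split via masks.
train_mask = torch.zeros(num_nodes, dtype=torch.bool)
test_mask = torch.zeros(num_nodes, dtype=torch.bool)
train_mask[train_nodes] = True   # loss is computed on these nodes only
test_mask[test_nodes] = True     # evaluation nodes; message passing still sees the full graph

# Inductive: remove edges that cross the split, yielding independent subgraphs.
def keep_within(edge_index: torch.Tensor, nodes: torch.Tensor) -> torch.Tensor:
    mask = torch.isin(edge_index[0], nodes) & torch.isin(edge_index[1], nodes)
    return edge_index[:, mask]
```
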
15

Transductive link prediction

(figure not included)
16

Inductive link splitting

(figure not included)
17

Computational graph and its effect on node embeddings

Computational graphs are defined by each node's neighborhood = the rooted subtree structure around the node.
→ If two nodes have the same features and the same computational graph, the GNN cannot distinguish them and will produce identical embeddings, since it does not use node IDs.

18

Injective function

It maps different elements into different outputs
→ retains all the information about the input

19

Most expressive GNN

Maps rooted subtrees to node embeddings injectively.
→ uses an injective neighbor aggregation function, so all neighboring information is retained and different subtree structures can be distinguished

20

Expressive power, GCN and GraphSAGE case

The expressive power of a GNN is characterized by the expressive power of its neighbor aggregation function.

Neighbor aggregation is equivalent to a function over a multi-set (a set with repeated elements).

GCN:

  • element-wise mean pool

  • failure case: multi-sets with the same color proportions

GraphSAGE:

  • element-wise max pool

  • failure case: multi-sets with the same set of distinct colors (see the numeric check below)
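
A tiny numeric check of both failure cases, with node colors encoded as one-hot vectors (purely illustrative):

```python
import torch

yellow, blue = torch.tensor([1., 0.]), torch.tensor([0., 1.])

# Mean pooling (GCN) fails on multi-sets with the same color proportions:
a = torch.stack([yellow, blue])                  # {yellow, blue}
b = torch.stack([yellow, yellow, blue, blue])    # {yellow, yellow, blue, blue}
print(a.mean(0), b.mean(0))                      # both [0.5, 0.5] -> indistinguishable

# Max pooling (GraphSAGE) fails on multi-sets with the same set of distinct colors:
c = torch.stack([yellow, blue, blue])            # {yellow, blue, blue}
print(a.max(0).values, c.max(0).values)          # both [1., 1.] -> indistinguishable

# Sum pooling (GIN) distinguishes all three multi-sets:
print(a.sum(0), b.sum(0), c.sum(0))              # [1., 1.], [2., 2.], [1., 2.]
```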

21

Injective multi-set function

(figure not included)
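
The standard construction (from the GIN paper, and presumably what the missing figure shows) expresses an injective multi-set function as Φ(Σ_x f(x)), with Φ and f modeled by MLPs; a minimal sketch:

```python
import torch
import torch.nn as nn

class InjectiveMultiSetFn(nn.Module):
    """Model Phi( sum over x of f(x) ), with f and Phi as MLPs (justified by universal approximation)."""
    def __init__(self, d_in: int, d_hidden: int, d_out: int):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d_hidden))
        self.phi = nn.Sequential(nn.Linear(d_hidden, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d_out))

    def forward(self, x):                       # x: [set_size, d_in], one row per multi-set element
        return self.phi(self.f(x).sum(dim=0))   # sum pooling keeps the mapping injective
```
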
22

Universal approximation theorem

(figure not included)
23

Graph isomorphism network (GIN)

A differentiable neural network version of the WL graph kernel:

  • GIN uses a neural network to model the injective hash function

  • GIN’s update targets are the node embeddings (low-dim vectors), not the one-hot node colors

=> their expressive power is the same: both can distinguish most real-world graphs

Advantages:

  • Due to the low-dimensional node embeddings, they are able to capture fine-grained node similarity

  • Parameters of the update function can be learnt for the downstream tasks

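A minimal sketch of one GIN layer on a dense adjacency matrix, following the paper's update h_v ← MLP((1 + ε)·h_v + Σ_{u∈N(v)} h_u); the ε handling and MLP sizes here are simplified assumptions:

```python
import torch
import torch.nn as nn

class GINLayer(nn.Module):
    """One GIN layer: injective sum aggregation over neighbors followed by an MLP."""
    def __init__(self, d_in: int, d_out: int):
        super().__init__()
        self.eps = nn.Parameter(torch.zeros(1))   # learnable epsilon
        self.mlp = nn.Sequential(nn.Linear(d_in, d_out), nn.ReLU(), nn.Linear(d_out, d_out))

    def forward(self, h, adj):                    # h: [N, d_in], adj: [N, N] without self-loops
        neighbor_sum = adj @ h                    # sum of neighbor embeddings for each node
        return self.mlp((1 + self.eps) * h + neighbor_sum)
```
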
24

Injective function of GIN

Uses element-wise sum pooling instead of mean/max
