-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathnode2vec.py
129 lines (93 loc) · 4.63 KB
/
node2vec.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
# -*- coding: utf-8 -*-
"""node2vec.ipynb
Automatically generated by Colaboratory.
Original file is located at
https://colab.research.google.com/drive/1EhlcEiJ1M4ZlHEnxIjNX_B1ycXFFRPT_
# node2vec
### Setup
First of all, I will install the [graph2vec library](https://github.com/VHRanger/graph2vec) which offers a fast implementation of the node2vec method.
"""
!pip install nodevectors
"""I now import the library, and create a small wrapper class which will expose only the few hyperparameters I will need to tune in this Colab"""
import nodevectors
import networkx as nx
class Node2Vec(nodevectors.Node2Vec):
"""
Parameters
----------
p : float
p parameter of node2vec
q : float
q parameter of node2vec
d : int
dimensionality of the embedding vectors
"""
def __init__(self, p=1, q=1, d=32):
super().__init__(
n_components=d,
walklen=10,
epochs=50,
return_weight=1.0/p,
neighbor_weight=1.0/q,
threads=0,
w2vparams={'window': 4,
'negative': 5,
'iter': 10,
'batch_words': 128})
"""Lastly, let's import some of the common libraries needed for the task."""
# Commented out IPython magic to ensure Python compatibility.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# %matplotlib inline
# Load the Zachary's Karate Club as a NetworkX Graph object
KCG = nx.karate_club_graph()
# Fit embedding model to the Karate Club graph
n2v = Node2Vec(1, 1, 2)
n2v.fit(KCG)
embeddings = []
for node in KCG.nodes:
embedding = list(n2v.predict(node))
club = KCG.nodes[node]['club']
embeddings.append(embedding + [club])
# Construct a pandas dataframe with the 2D embeddings from node2vec,
# plus the club name that each node belongs to after the split
df = pd.DataFrame(embeddings, columns=['x', 'y', 'club'])
# Nodes who stayed with the Mr. Hi will be plotted in red, while nodes
# who moved with the Officer will be plotted in blue
colors = ['red' if x == 'Mr. Hi' else 'blue' for x in df.club]
df.plot.scatter(x='x', y='y', s=50, c=colors)
"""If my example trained correctly, We may notice a clear separation between the blue and red nodes. Solely from the graph structure, node2vec could predict how the Zachary's Karate Club split!
Tune the hyperparameters ```p``` and ```q```, and notice how they affect the resulting embeddings.
### Building node2vec
Now we will study the behavior of node2vec on [barbell graphs](https://en.wikipedia.org/wiki/Barbell_graph).
Below we can see a toy example of a barbell graph generated with NetworkX.
"""
toy_barbell = nx.barbell_graph(7, 2)
nx.draw_kamada_kawai(toy_barbell)
"""
```
# This is formatted as code
```
Generate a larger barbell graph, where each complete graph has exactly 1000 nodes, and the path length between the complete graphs is equal to 1 (i.e., all the nodes in the barbell graph belong to either one of the two complete graphs, and the connecting path does not have any internal node).
Then, learn node2vec embeddings on this graph, setting ```p = 1, q = 1``` and ```d = 10```."""
barbell = nx.barbell_graph(500, 0)
n2v_barbell = Node2Vec(1, 1, 10)
n2v_barbell.fit(barbell)
"""Now, I am going to write a function that takes as input a node id ```n``` in the graph (e.g., ```5```) and returns a list containining the cosine similarity between the node2vec vector of the input node ```n``` and all the nodes in the given barbell graph (including the similarity with ```n``` itself)."""
from numpy.linalg import norm
def get_similarity(n, g=barbell, embed=n2v_barbell):
cos = []
e = embed.predict(n)
for node in barbell.nodes:
e_n = embed.predict(node)
cos.append( (e @ e_n) / (norm(e) * norm(e_n)))
return cos
"""Now, I am going to generate another barbell graph, this time adding a path of length 51 between the two complete graphs. To find out how, refer to the NetworkX documentation: [https://networkx.github.io/documentation/stable/reference/generated/networkx.generators.classic.barbell_graph.html#networkx.generators.classic.barbell_graph](https://networkx.github.io/documentation/stable/reference/generated/networkx.generators.classic.barbell_graph.html#networkx.generators.classic.barbell_graph)
I am using the node2vec embeddings for the nodes of this new graph, using the same hyperparameters as before.
"""
barbell_51 = nx.barbell_graph(500, 51)
n2v_barbell_51 = Node2Vec(1, 1, 10)
n2v_barbell_51.fit(barbell_51)
plt.hist(get_similarity(5))
plt.hist(get_similarity(5, barbell_51, n2v_barbell_51))