Due to my current work, I need to review graphs from time to time. This post is a collection of methods and utilities that I find helpful for that.

t-SNE

Of course, I need to start with this one, because t-SNE is a very commonly used review technique - not only for graphs. t-SNE stands for t-distributed Stochastic Neighbor Embedding. The technique was introduced by Laurens van der Maaten and Geoffrey Hinton in 2008 as a way to visualize high-dimensional data - so chances are that every researcher dealing with machine learning has come across it at some point [1][2].

$$ \begin{equation} {\displaystyle p_{j\mid i}={\frac {\exp(-\lVert \mathbf {x} _{i}-\mathbf {x} _{j}\rVert ^{2}/2\sigma _{i}^{2})}{\sum _{k\neq i}\exp(-\lVert \mathbf {x} _{i}-\mathbf {x} _{k}\rVert ^{2}/2\sigma _{i}^{2})}}} \end{equation} $$

This technique embeds high-dimensional data points into a two-dimensional map while preserving the neighbourhood structure between the points as well as possible. A typical use case is therefore the review/detection of clusters in a given dataset.

The algorithm is built around pairwise similarities of data points. Equation 1 gives the conditional probability that data point $x_i$ would pick $x_j$ as its neighbour; this probability can be read as the similarity between the two points. $\sigma_{i}^{2}$ is the variance of the Gaussian centred on $x_i$ and normalizes the probabilities. The similarity is calculated for all pairs except $p_{i\mid i}$, because the similarity of a data point with itself is not of interest here and is therefore defined as $0$. Also, $p_{i\mid j}$ is generally not the same as $p_{j\mid i}$, because each data point has a different neighbourhood. To obtain a symmetric measure, we use this equation:

$$ \begin{equation} {\displaystyle p_{ij}={\frac {p_{j\mid i}+p_{i\mid j}}{2N}}} \end{equation} $$

...where $N$ is the number of data points. To build the low-dimensional map, we compute analogous similarities between the mapped points $y_i$ and $y_j$ with this function:

$$ \begin{equation} {\displaystyle q_{ij}={\frac {(1+\lVert \mathbf {y} _{i}-\mathbf {y} _{j}\rVert ^{2})^{-1}}{\sum _{k}\sum _{l\neq k}(1+\lVert \mathbf {y} _{k}-\mathbf {y} _{l}\rVert ^{2})^{-1}}}} \end{equation} $$

Equation 1 uses a Gaussian distribution, whereas equation 3 uses a Student t-distribution. The t-distribution has much heavier tails than the Gaussian, which helps to overcome the crowding problem. Essentially, $q_{ij}$ should be as close as possible to $p_{ij}$. The mismatch between the two distributions is measured with the Kullback–Leibler divergence, which quantifies the statistical difference between two probability distributions:

$$\begin{equation} {\displaystyle \mathrm {KL} \left(P\parallel Q\right)=\sum _{i\neq j}p_{ij}\log {\frac {p_{ij}}{q_{ij}}}} \end{equation}$$
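To make the four equations above a bit more concrete, here is a rough NumPy sketch of the quantities involved. It assumes a small data matrix X (one row per point), a 2-D map Y, and a fixed bandwidth per point stored in sigmas - the real algorithm tunes each $\sigma_i$ via a binary search on the perplexity, which is skipped here:

import numpy as np

def high_dim_affinities(X, sigmas):
    # squared Euclidean distances between all pairs of data points
    sq_dists = np.square(X[:, None, :] - X[None, :, :]).sum(axis=-1)
    n = X.shape[0]
    P_cond = np.zeros((n, n))
    for i in range(n):
        # Gaussian similarities of point i to all other points (equation 1)
        num = np.exp(-sq_dists[i] / (2 * sigmas[i] ** 2))
        num[i] = 0.0                    # p_{i|i} is defined as 0
        P_cond[i] = num / num.sum()     # normalize over all k != i
    # symmetrize the conditional probabilities (equation 2)
    return (P_cond + P_cond.T) / (2 * n)

def low_dim_affinities(Y):
    # Student-t similarities between the mapped points (equation 3)
    sq_dists = np.square(Y[:, None, :] - Y[None, :, :]).sum(axis=-1)
    inv = 1.0 / (1.0 + sq_dists)
    np.fill_diagonal(inv, 0.0)          # q_{ii} is excluded
    return inv / inv.sum()

def kl_divergence(P, Q, eps=1e-12):
    # the cost that t-SNE minimizes (equation 4); eps avoids log(0)
    return np.sum(P * np.log((P + eps) / (Q + eps)))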

We are almost there. We now optimize the map to minimize $\mathrm {KL} \left(P\parallel Q\right)$, typically using gradient descent. This post is only a very high-level explanation; there are also videos online that explain the concept more graphically.
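In practice you rarely implement t-SNE yourself. Here is a minimal sketch using scikit-learn's implementation, assuming embeddings is an (n_nodes, d) array of node embeddings (for example the hidden activations of a trained GNN) and node_labels holds one class per node - both names are placeholders for your own data:

import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# embeddings: (n_nodes, d) array of node features/embeddings (assumed to exist)
coords = TSNE(n_components=2, perplexity=30, random_state=42).fit_transform(embeddings)

# colour each projected point by its node label to make clusters visible
plt.scatter(coords[:, 0], coords[:, 1], c=node_labels, cmap='tab10', s=20)
plt.title("t-SNE of the node embeddings")
plt.show()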

References

  1. Laurens van der Maaten, t-SNE, https://lvdmaaten.github.io/tsne/
  2. Wikipedia, t-distributed stochastic neighbor embedding, https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding

UMAP

Coming up next

Celluloid: Record Graph developments

Sometimes it is handy to observe the plotted graph during training. You can either create multiple plots or record a video directly. With Celluloid, this takes only a few lines:

import matplotlib.pyplot as plt
import networkx as nx
from celluloid import Camera

# set up the figure that will be recorded
fig, ax = plt.subplots(figsize=(12, 7))
# start a recording of the figure
camera = Camera(fig)

# start training; train(), model, criterion, optimizer, karateclubgraph,
# coords_as_tuple and node_labels come from the surrounding GCN example
for epoch, loss, acc, pred in train(model=model, criterion=criterion, optimizer=optimizer, epochs=1000):
    # draw the current state of the graph and capture it as one frame
    nx.draw_networkx(karateclubgraph, pos=coords_as_tuple, cmap=plt.get_cmap('tab10'),
                     node_color=node_labels, node_size=75, linewidths=6, ax=ax, with_labels=False)
    camera.snap()

# stitch the captured frames into an animation and save it as a video
animation = camera.animate(blit=False, interval=150)
animation.save('./graph.mp4', fps=60)
Code excerpt from Thomas Kipf's GCN example
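Saving to mp4 requires ffmpeg to be installed. If that is not available, the same animation can - as far as I know - also be written as a GIF via matplotlib's pillow writer (assuming Pillow is installed):

# alternative export without ffmpeg
animation.save('./graph.gif', writer='pillow')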

The result might look like this:

Video: GCN of Kipf and Welling with Zachary's karate club