t-SNE in Python (part 2)
As promised, in this post I’ll talk about my implementation of the t-SNE dimensionality reduction algorithm in Python. I’ve made a version that explicitly calculates the gradient of the cost with respect to each vector in the reduced dataset $Y$, and another version that employs Theano’s grad function. The Numpy version was a bit tricky to implement. The Theano version, for once, was actually easier than the Numpy version, because all you do is just slam in T.grad. I’ll start with the Numpy version, and then move on to the Theano version. Before we jump into any code, let’s state the gradient of the cost function:
\[\frac{\partial Cost}{\partial y_{i}} = 4\sum_{j} (p_{ij} - q_{ij})(y_i - y_j)(1 + ||y_i - y_j||^2)^{-1}\]

Say you’ve stored your high-dimensional vectors in some variable X, and the initial guesses for the low-dimensional vectors in some variable Y.
We can calculate the low-dimensional joint probabilities qij as follows.
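Here’s a minimal sketch of how that could look (the helper name q_ij and the dense pairwise-distance matrix are my choices for illustration, not necessarily the original code):

```python
import numpy as np

def q_ij(Y):
    """Student-t joint probabilities q_ij for the low-dimensional map Y."""
    sum_Y = np.sum(Y ** 2, axis=1)
    # Squared Euclidean distances ||y_i - y_j||^2 for all pairs at once
    dists = sum_Y[:, None] + sum_Y[None, :] - 2.0 * np.dot(Y, Y.T)
    inv = 1.0 / (1.0 + dists)       # (1 + ||y_i - y_j||^2)^-1
    np.fill_diagonal(inv, 0.0)      # q_ii is defined to be zero
    return inv / np.sum(inv)        # normalize over all pairs
```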
The cost is pretty simple as well. Here I assume that you have a function that calculates pij given X. Now, you wouldn’t actually calculate pij every time you call this function, because you only have to do it once at the beginning, but this is just for illustration purposes…
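Something along these lines, assuming a p_ij(X) helper exists as described above and reusing the q_ij sketch:

```python
def cost(X, Y):
    """KL divergence between the high- and low-dimensional joint distributions."""
    # In practice P would be computed once, up front, not on every call.
    P = p_ij(X)          # assumed helper: high-dimensional joint probabilities
    Q = q_ij(Y)          # low-dimensional joint probabilities from above
    eps = 1e-12          # keep the log (and division) away from zero
    return np.sum(P * np.log((P + eps) / (Q + eps)))
```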
Now comes the gradient of the cost function. This is a little trickier.
To pull this off efficiently, you have to do some wizardry with Numpy; a sketch of that vectorized version comes first below. After that, watch the same thing, but in Theano, where I assume that pij is already stored in an array.
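A hedged reconstruction of what the vectorized Numpy gradient might look like (roughly the 11 lines mentioned below, though the original code may differ):

```python
def grad(P, Y):
    """Gradient of the cost with respect to every row y_i of Y."""
    sum_Y = np.sum(Y ** 2, axis=1)
    dists = sum_Y[:, None] + sum_Y[None, :] - 2.0 * np.dot(Y, Y.T)
    inv = 1.0 / (1.0 + dists)            # (1 + ||y_i - y_j||^2)^-1
    np.fill_diagonal(inv, 0.0)
    Q = inv / np.sum(inv)
    PQd = (P - Q) * inv                  # (p_ij - q_ij)(1 + ||y_i - y_j||^2)^-1
    # dY[i] = 4 * sum_j PQd[i, j] * (y_i - y_j), vectorized over all i at once
    dY = 4.0 * (np.dot(np.diag(PQd.sum(axis=1)), Y) - np.dot(PQd, Y))
    return dY
```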
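And a sketch of the Theano version. The two lines that do the real work are the T.grad call and the compiled function; the symbolic cost still has to be written out once, which I do here with a small eps for numerical safety (again, my reconstruction rather than the original listing):

```python
import theano
import theano.tensor as T

Y = T.dmatrix('Y')          # low-dimensional map (symbolic)
P = T.dmatrix('P')          # precomputed p_ij, passed in as a plain Numpy array

# The same cost as before, written with Theano ops so it stays symbolic
sum_Y = T.sum(T.sqr(Y), axis=1)
dists = sum_Y.dimshuffle(0, 'x') + sum_Y.dimshuffle('x', 0) - 2.0 * T.dot(Y, Y.T)
inv = (1.0 / (1.0 + dists)) * (1.0 - T.identity_like(dists))   # zero the diagonal
Q = inv / T.sum(inv)
cost = T.sum(P * T.log((P + 1e-12) / (Q + 1e-12)))

# The two lines that replace all the Numpy wizardry
dY = T.grad(cost, Y)
grad_fn = theano.function([P, Y], dY)
```

Calling grad_fn with plain Numpy arrays for P and Y then returns the gradient for every row of Y.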
Whaaaaaaa??? Take 11 lines of code and turn it into 2? Theano, you’ve stolen my heart! The crazy part about this is that it even runs a little faster than the Numpy version. As always, there is some overhead (around 3 seconds on my machine) to compile the function that computes the gradient. When ‘training’ the t-SNE algorithm, however, the Theano version is about 1.5x faster, so you quickly make back this time.