I've been taking a lot of rides since I got back from my trip. I think I've been cranking out 5 or 6 rides a week, which is more than I anticipated. I figured that after riding across the country, I'd want to be done riding for a while. Not true! I've been loving riding in the Iowa City area. The countryside is beautiful. Unlike riding in the Twin Cities, I can be out on some rural road within 15 minutes of my house. Contrary to popular belief, Iowa is not entirely flat. I ride a lot of rollers around here. Nothing crazy, but at least the terrain is varied. Nothing like west-central Kansas, which is flatter than a damn pancake. Today Tom and I went out to a little town called Oxford, about 20 miles north and west of where we live. Coming back, we had a monster tailwind, some nice rollers, and a beautiful tarmac road. We passed the occasional farmhouse, but corn and soybeans were our constant companions. In the wind, the soybean fields ripple, as if a wave is passing through them. It makes me feel like I'm riding on some causeway through a large body of water.



I did my first bike race last weekend. It was lots of fun. I was super nervous, as a couple of people had told me about crashing during the same race last year. I'm really scared of crashing. I've taken a couple of spills since I've been back, but mostly on my touring bike while riding the mountain bike trails at Sugarbottom. Crashing at 5 mph sucks, so I can only imagine how bad it could be at 20+ mph. As soon as I started the race, my fears of crashing evaporated. We were cruising! The race was only 32 miles, but we averaged a little more than 23 mph, which is damn fast. Riding fast in a group is amazing. Occasionally the group would settle into an easy pace at around 20 mph, and then it would surge ahead to 30 mph. It felt like everyone was antsy and full of testosterone. The last 200m were just a sprint. I finished 9th out of 18, but the top 10 or so people must have finished within a couple bike lengths of each other.

I've been doing some research on the data analysis techniques I'll be using for my senior thesis. While I'm not entirely sure what the exact nature of my project will be, I know that I'll be building an unsupervised neural net to analyze data from a dark matter detection experiment. I've been spending most of my time looking into supervised neural nets, just to get an idea of what the area is about. My only other experience with machine learning is doing principal component analysis (PCA) for face recognition. The term "eigenface" comes from this technique. Basically, the idea is to take the eigenvectors of the covariance matrix of all the images in your training set. Faces can then be reconstructed using some linear combination of these eigenvectors (making the technique potentially useful for image storage). The issue with this particular technique is that it doesn't deal with image translation well. PCA in general is a useful way to reduce the dimensionality of a problem. It can be computationally expensive, as you have to compute the eigenvectors of a matrix whose dimensions are $mn \times mn$, where $m$ and $n$ are the dimensions of the original images. Hit me up if you want to see some code, or a paper on "eigenfaces."
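In the meantime, here's a rough sketch of the idea in NumPy (just for illustration; the array name `images` and the number of components `k` are placeholders, not names from my actual code):

```python
import numpy as np

def eigenfaces(images, k=20):
    # images: array of shape (num_images, m, n) holding grayscale training faces
    num_images, m, n = images.shape
    # flatten each m x n face into a row of length m*n
    X = images.reshape(num_images, m * n).astype(float)
    mean_face = X.mean(axis=0)
    X_centered = X - mean_face
    # the pixel covariance matrix is (m*n) x (m*n) -- this is the expensive part
    cov = np.cov(X_centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # keep the k eigenvectors with the largest eigenvalues: the "eigenfaces"
    order = np.argsort(eigvals)[::-1][:k]
    return mean_face, eigvecs[:, order]
```

Any face in the training set can then be approximated as the mean face plus some linear combination of those k columns, which is the reconstruction idea mentioned above.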

Neural nets, like PCA, represent another way to reduce the dimensionality of a dataset. The approach is quite a bit different, however. A basic neural net is composed of some number of layers of "neurons." In practice, these neurons are just activation functions: each neuron takes an input and produces an output. I've been using the sigmoid function for this purpose. It looks as follows.
$$ \sigma(z) = \frac{1}{1+e^{-z}} $$
We can adjust the output of these activation functions by fiddling with the "weights" applied to the inputs, or with the "bias" for that layer.
$$ \vec{z} = (W\cdot \vec{x}) + \vec{b}$$
Note that I've introduced vectors here, to indicate that we'll be doing all the activations in a layer at once. The examples I've been using have to do with recognizing handwritten digits. Say, for example, that each image is 20 x 20. The number of inputs to the net will be 400. The number of outputs will be 10, each one corresponding to a digit. Say that I input an image that corresponds to the number 5. My fully trained neural net will output a vector whose 6th component (since we start counting from zero) is 1, while the rest are zero. The net could have multiple layers in between the input and the output, but I've been using one hidden layer, composed of anywhere from 30 to 200 neurons. Calculating the output of the network is a matter of feeding the input through the network, layer by layer. For instance, if I'm moving from the input layer to the middle layer (with 50 neurons), my input vector will be the image in question, flattened into a one-dimensional vector of 400 entries. Applying the weight matrix will result in a vector with 50 entries. This means the weight matrix will be a 50 x 400 matrix. The bias vector will have 50 entries.
$$ \begin{bmatrix} z_1 \\ z_2 \\  \vdots \\ z_{49} \\ z_{50} \end{bmatrix} =
\begin{bmatrix} w_{1,1} & w_{1,2} & \dots & w_{1,400} \\
w_{2,1} & w_{2,2} & \dots & w_{2,400} \\
\vdots & \vdots & \ddots & \vdots \\
w_{50,1} & w_{50,2} & \dots & w_{50,400}
 \end{bmatrix}
\cdot
\begin{bmatrix} x_1 \\ x_2 \\  \vdots \\ x_{399} \\ x_{400} \end{bmatrix}
+
\begin{bmatrix} b_1 \\ b_2 \\  \vdots \\ b_{49} \\ b_{50} \end{bmatrix}
$$
We then plug the $\vec{z}$ vector into the activation (sigmoid) function. Getting the output of the network is just a matter of taking the output of the hidden layer and using it as the input for the last layer! In Python, we might do this as follows:
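(A minimal sketch, not the exact code: the class and method names are made up, and I'm assuming the input is a minibatch of flattened images with one image per row, so these weight matrices are the transposes of the ones written out above.)

```python
import theano.tensor as T

class Network(object):
    # self.W is a list of weight matrices and self.b a list of bias vectors,
    # one pair per layer.

    def feedforward(self, x):
        # hidden layer: weighted sum of the inputs, squashed by the sigmoid
        hidden = T.nnet.sigmoid(T.dot(x, self.W[0]) + self.b[0])
        # output layer: softmax turns the ten outputs into values that sum to one
        return T.nnet.softmax(T.dot(hidden, self.W[1]) + self.b[1])
```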
Note that the 'self.W' variable is a list containing the weight matrices. The same goes for the 'self.b' variable. The softmax function is normally used as the activation function for the final layer of the neural net. For a single value $z_i$, a component of a vector $\vec{z}$ of length $K$, it looks as follows (from Wikipedia):
$$ \sigma(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}
$$
You might be asking yourself, "Dean, what is this 'theano' business?" Theano is a Python library for doing wicked fast math, kind of like NumPy. It has a bunch of useful stuff that's there just for making neural nets, however. Notice, for example, the built-in "sigmoid" and "softmax" functions.
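To give a flavor of how it works (a toy example, not taken from my project): you declare symbolic variables, describe a computation, and then compile it into a regular Python function.

```python
import numpy as np
import theano
import theano.tensor as T

z = T.dvector('z')                                 # a symbolic vector of doubles
squash = theano.function([z], T.nnet.sigmoid(z))   # compile the expression

print(squash(np.array([-1.0, 0.0, 1.0])))          # roughly [0.27, 0.5, 0.73]
```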

The hard part of making a neural net is optimizing it. I need to find weights and biases such that my net can correctly identify all the images in a training set. With these optimized weights and biases, I can then correctly identify images in a set that the net has never seen before. In my code, as in the Theano tutorial, I use a method called stochastic gradient descent. Stochastic because we train the net with a random slice of the images in the training set (instead of training on them one by one). This reduces the amount of computation we have to do, because we're taking advantage of the powerful linear algebra tools built into Theano and NumPy. Gradient because we're using derivatives to find a local minimum of a cost function, which relates the output of the neural net to the correct classification. Descent because we're always moving down toward that local minimum by iteratively subtracting gradient terms from our weights and biases. While I spent some time the other week writing code to calculate gradients by hand, I started using Theano because it does this for me. Instead of writing tens of lines of code to calculate derivatives, I just use Theano's built-in "T.grad" function. Wow. Check out the full code here. Note that I use the $\tanh$ function as my activation function in my code. Additionally, I employ L1 and L2 regularization to keep the weights small and help the net generalize.
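For a sense of how little code that takes, here's a sketch of building the descent updates with T.grad (the names `cost`, `params`, and `learning_rate` are placeholders, not names from my actual code):

```python
import theano.tensor as T

def sgd_updates(cost, params, learning_rate=0.1):
    # cost: the symbolic cost expression; params: the list of shared
    # variables holding the weights and biases
    grads = T.grad(cost, params)   # Theano derives every gradient symbolically
    # move each parameter a small step against its gradient
    return [(param, param - learning_rate * grad)
            for param, grad in zip(params, grads)]
```

The resulting (parameter, new value) pairs get passed to theano.function through its updates argument, so each call to the compiled training function takes one descent step on one random minibatch.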