Introductory Example of a Neural Network

Project Overview

  Now that we have an idea of the theory behind training a neural network, we need to focus on how to implement it. On the previous page (linked to in the last sentence) I gave an example of some inputs and outputs and laid out the game plan for finding their relationship. If you haven't already, go back to that page, play with the interactive neural network, and try to find a set of weights that gives a small error for ALL of the inputs.

  It's tough, almost impossible, to find the correct weights manually (without treating this as a system of linear equations). But by using the gradient descent algorithm and our error function that we derived earlier, we will do this the right way and find these weights with code.

Finding the gradient of our function

  Right now, this is our neural network from input to error. Remember, for learning's sake we are ignoring the bias. We will add it in our next project.

[Figure: review of the neural network from input to error]

  We want to find the derivative of the error with respect to each weight (how much a small change in that weight changes the error), then move that weight a tiny amount in the direction of the negated derivative (the direction of steepest descent). We can do this using the chain rule, as shown below:

[Figure: chain rule]
The chain rule looks scary, but it works like multiplying fractions. Notice that you can cancel the ∂Z's just as if you were dealing with fractions (you basically are). Another way of thinking about it: if you want to find how much a change in the weight affects the error, you first need to find how much it affects Z, then how much Z affects the error. Also, that Predator-style three-dot symbol (∴) means "therefore".
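
Written out, the chain rule described above looks roughly like this. It is a reconstruction under stated assumptions, not a copy of the figure: E stands for the squared error of a single example, y for the expected output, and Z for the network output with no bias. The symbols E, y, and the index j are my own labels.

```latex
\begin{align*}
Z &= \sum_j w_j x_j, \qquad E = (y - Z)^2 \\
\frac{\partial E}{\partial w_j}
  &= \frac{\partial E}{\partial Z} \cdot \frac{\partial Z}{\partial w_j} \\
  &= -2\,(y - Z) \cdot x_j \\
\therefore\quad \nabla_w E &= -2\,(y - Z)\,x
\end{align*}
```

The last line is the same expression that appears in the code as -2*(y_true[i]-z)*x[i].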

  That's it, we found our gradient! Now we just need to iteratively apply the gradient descent update to make our weights converge. Here is the code to do this:

[Figure: reminder of gradient descent; the gradient is the direction of steepest ascent]
import numpy as np

# x holds all of our inputs (one row per example); y_true holds our expected outputs.
x = np.array([[3, 4, 7], [6, 2, 9], [8, 10, 12], [9, 7, 4], [7, 6, 3]])
y_true = np.array([13.5, 15.6, 27.4, 15.6, 12.5])

# My guess at the weights. You can guess absolutely anything, just make sure
# they are floats: the in-place update below cannot write float results
# into an integer array.
w0 = np.array([200.0, -14.0, 3456.0])

# We update our weights 1000 times. On each update we grab the gradient for
# each input, average all the gradients, then update the weights
# (-= makes sure we move in the direction of -gradient).
for n in range(1000):
    gradients = []
    for i in range(5):
        z = np.dot(w0, x[i])
        gradients.append(-2 * (y_true[i] - z) * x[i])

    average_gradient = np.average(gradients, axis=0)
    w0 -= 0.001 * average_gradient

print(w0)
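
One way to convince yourself that the gradient formula in the loop is right is to compare it against a numerical estimate. The sketch below is my own addition rather than part of the project: the helper names `error`, `analytic_gradient`, and `numerical_gradient` are hypothetical, and it assumes the error is the mean of the squared differences (y_true - z)^2, which is what the -2*(y_true[i]-z)*x[i] expression implies.

```python
import numpy as np

x = np.array([[3, 4, 7], [6, 2, 9], [8, 10, 12], [9, 7, 4], [7, 6, 3]], dtype=float)
y_true = np.array([13.5, 15.6, 27.4, 15.6, 12.5])


def error(w):
    """Mean squared error over all examples (assumed error function)."""
    z = x @ w
    return np.mean((y_true - z) ** 2)


def analytic_gradient(w):
    """Average of the per-example gradients -2 * (y_true - z) * x."""
    z = x @ w
    return np.mean(-2 * (y_true - z)[:, None] * x, axis=0)


def numerical_gradient(w, eps=1e-4):
    """Central finite differences: nudge one weight at a time."""
    grad = np.zeros_like(w)
    for j in range(len(w)):
        step = np.zeros_like(w)
        step[j] = eps
        grad[j] = (error(w + step) - error(w - step)) / (2 * eps)
    return grad


w = np.array([1.0, -2.0, 0.5])  # arbitrary test point, not a trained weight
print(analytic_gradient(w))
print(numerical_gradient(w))
```

If the two printed vectors agree to several decimal places, the chain-rule result and the "small change in each weight" definition of the derivative are telling the same story.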

  Updating the weights 1000 times is often enough, but if your guess was far from the true weights, it may take more. Take a look at the weights you ended up with after running this code. Did you get something close to [.5,.9,1.2]? Those are the exact weights I used to generate this data. If your result is still noticeably off, raise the number of updates and run it again.
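
How many updates count as "enough" depends on the starting guess and the learning rate. Instead of fixing the count, one option is to keep updating until the error is tiny. The following is a sketch of that idea, not the original code: the function name `train_until_converged`, the tolerance, and the cap on updates are assumptions picked for illustration.

```python
import numpy as np

x = np.array([[3, 4, 7], [6, 2, 9], [8, 10, 12], [9, 7, 4], [7, 6, 3]], dtype=float)
y_true = np.array([13.5, 15.6, 27.4, 15.6, 12.5])


def train_until_converged(w_start, learning_rate=0.001, tolerance=1e-10,
                          max_updates=100_000):
    """Run averaged gradient descent until the mean squared error drops
    below `tolerance` or `max_updates` is reached.

    Returns (weights, number_of_updates_performed).
    """
    w = np.array(w_start, dtype=float)  # copy the guess and force floats
    for update in range(max_updates):
        residual = y_true - x @ w
        if np.mean(residual ** 2) < tolerance:
            return w, update
        # Same gradient as the loop above, computed for all inputs at once.
        average_gradient = np.mean(-2 * residual[:, None] * x, axis=0)
        w -= learning_rate * average_gradient
    return w, max_updates


weights, updates_used = train_until_converged([200.0, -14.0, 3456.0])
print(weights, updates_used)
```

Trying a few different starting guesses and comparing `updates_used` gives a feel for how much the initial weights matter.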

  Isn't this incredible?! By using calculus and data we can approximate the relationship between any two related things! You might say, "Well, this is an easy example, I could have solved this in 2 minutes by treating it as a system of linear equations." Well, you're exactly right. But what if there were 5 or 6 layers, each with 1000+ neurons that each used the sigmoid activation function? Solving by hand in any way is out the window. What we need is a systematic algorithm that can find the gradient for all the weights in our network.
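
For comparison, the "system of linear equations" shortcut mentioned above can be sketched with NumPy's least-squares solver. This is an aside I am adding, not part of the original project, and it only works here because the network is a single linear layer with no activation function.

```python
import numpy as np

x = np.array([[3, 4, 7], [6, 2, 9], [8, 10, 12], [9, 7, 4], [7, 6, 3]], dtype=float)
y_true = np.array([13.5, 15.6, 27.4, 15.6, 12.5])

# Solve x @ w ≈ y_true in the least-squares sense (5 equations, 3 unknowns).
w, residuals, rank, singular_values = np.linalg.lstsq(x, y_true, rcond=None)
print(w)
```

The printed weights should land very close to the ones gradient descent is working toward. Once layers and sigmoid activations enter the picture, no such direct solve is available, which is why the gradient-based approach is the one that scales.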