Neural Networks

Theory

  No advanced math is needed for this page, but when we get to training our neural network, we will need calculus and linear algebra. If you are rusty on these subjects, I highly recommend checking out 3Blue1Brown on YouTube. He explains these concepts in an intuitive manner with incredible visuals.

  I often find that it is very beneficial to first identify exactly what it is you want to do, without even the slightest thought of how you will do it. This lets you define the problem without your subconscious being overwhelmed by the thought of how. Then, once you know what you want to do, or where you are going, you can focus on the how. Keep this in mind as you learn this new concept of neural networks.

The tale of the investor, the mathematician, and the neuroscientist

  Say there is a real estate investor who has data on thousands of properties around the world. This data includes each property's location, local crime rate, square footage, and population, along with the price each property was sold for. He wants to find the relationship between a property's data and the price it sold for. If he finds this relationship, he can apply it to a new house that isn't in his data and predict the price it will sell at. Then he can easily find houses that are going for less than he can resell them for.

  Okay, so those are the aspirations of a real estate investor. Unfortunately for him, he has never learned calculus. So, with riches on his mind, he finds a mathematician to solve his problem. Let's look at this problem from the mathematician's perspective.

  To the mathematician, we have a series of inputs and outputs of some function. The inputs are the property's location, local crime rate, square footage, and population. The output is the price the property sold at. This function is the "relationship" the investor was referring to, but what the mathematician sees is:

Price = f(location, crime rate, square footage, population)
Price is some function of the inputs. This function may be simple or incredibly complicated; we just don't know!

  As the mathematician works tirelessly to find this function, the investor grows impatient. After a couple of days he loses his patience and becomes business partners with another, more experienced investor instead of waiting on the mathematician. This more experienced investor is so good at what he does that he can show up to a property and give a great estimate of what it will sell for without even using grade school math.

  The mathematician, embarrassed, gives up on finding this function by hand. Defeated, he asks himself, "How is this problem so difficult to solve with mathematics, yet so intuitive to the experienced investor?" Unable to answer this question, he finds a neuroscientist in hopes of understanding how the investor's brain solves the problem.

  The neuroscientist, unable to explain exactly how the investor estimates prices so well, teaches the mathematician how the neurons in the brain work. The mathematician then models these neurons mathematically. After years and years of hard work, he finally figures out how to find the relationship between any two related things, including the details of a property and the price it sells at. He then uses this discovery to make a hundred million dollars and lives happily ever after.

Modeling the biological neuron

  As we have seen from the tale above, if we want to model things that are so intuitive to humans but incredibly complicated to model mathematically, we must look at how the human brain does it.

  The human brain is made of billions of neurons. A simplification of a neuron is this: many signals of varying magnitude enter the neuron, and if the total incoming signal is strong enough (greater than some threshold particular to that neuron), the neuron fires and outputs a signal of its own. This output signal becomes the input to another neuron, and the process continues. See the figure below to visualize this:

[Figure: a simplified neuron, with many input signals entering and one output signal leaving]
I omitted all the stuff us engineers don't care about.

  So how can we model this mathematically? The neuron receives many inputs of varying magnitude, and the total signal received can be modeled as:

total input = W1·X1 + W2·X2 + ... + Wn·Xn
These W variables are called weights. They add "weight" to the input signals, changing their magnitude.

  Okay, now if the total incoming signal is greater than this neurons threshold the neuron fires. Or mathematically:

output = 1 if W1·X1 + W2·X2 + ... + Wn·Xn > B, otherwise 0
This threshold is called a bias and is labeled B. If the sum of the weighted inputs is greater than B, the neuron outputs a 1. If it is less than B, the neuron outputs a 0.
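
  To make this concrete, here is a minimal sketch of that thresholded neuron in Python. The function name and the example numbers are mine, for illustration only.

    def step_neuron(inputs, weights, bias):
        """Fires (outputs 1) only if the weighted sum of the
        incoming signals exceeds the neuron's threshold B."""
        total = sum(w * x for w, x in zip(weights, inputs))
        return 1 if total > bias else 0

    # Three input signals, three weights, and a bias of 1.0:
    # 0.5*1.0 + 0.2*(-0.5) + 0.9*0.8 = 1.12 > 1.0, so the neuron fires.
    print(step_neuron([0.5, 0.2, 0.9], [1.0, -0.5, 0.8], bias=1.0))  # 1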

  Before we call this a complete model of the neuron, we must address one thing: it is beneficial to get rid of the "if" statement in the equation above. We do this by passing the weighted sum (minus the bias) into what is called the sigmoid function. Let's look at a graph of what we currently have versus the graph of a sigmoid function.

[Figure: the step function we have now, graphed next to the sigmoid function]
The sigmoid function is an example of an activation function. The ReLU function is currently the more popular choice, but the sigmoid function makes it easy to understand how the idea came about in the first place, due to its similarity to the step function.

  As you can see, the sigmoid function behaves almost exactly like the step function we had before. The only difference is that we don't have to deal with that ugly if statement. The function is also differentiable now, which is useful for reasons I will explain shortly. Note that we can use any differentiable function here; some just work better than others. Soon, we will use the identity function here (so, effectively, no function at all) in an example.
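
  Here is a quick sketch of the sigmoid in Python, just to see the numbers behave the way the graph suggests:

    import math

    def sigmoid(z):
        """A smooth, differentiable stand-in for the step function."""
        return 1.0 / (1.0 + math.exp(-z))

    # Far from zero it behaves like the step function...
    print(sigmoid(-6))  # ~0.0025, close to 0
    print(sigmoid(6))   # ~0.9975, close to 1
    # ...but near zero it rises smoothly instead of jumping.
    print(sigmoid(0))   # 0.5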

  Below is our final mathematical representation of a neuron.

Z = W1·X1 + W2·X2 + ... + Wn·Xn - B,   output = σ(Z)
Z is the weighted sum of all inputs minus the neuron's bias. We then pass Z through the sigmoid function to get our output.

  We can now update our picture of the neuron to fit our mathematical model:

[Figure: the neuron redrawn to match the model: weighted inputs summed, bias subtracted, and the result passed through f(z)]
f(z) is any activation function, but we will be using the sigmoid function.
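
  Putting it all together, here is a sketch of the complete neuron in Python. The names are mine; the math is exactly the Z and σ(Z) above.

    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    def neuron(inputs, weights, bias, activation=sigmoid):
        """Z = (weighted sum of inputs) - bias, passed through
        the activation function f(z)."""
        z = sum(w * x for w, x in zip(weights, inputs)) - bias
        return activation(z)

    # Same signals as before: z = 1.12 - 1.0 = 0.12, and sigmoid(0.12) ~ 0.53
    print(neuron([0.5, 0.2, 0.9], [1.0, -0.5, 0.8], bias=1.0))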

Combining multiple neurons

  Now we understand the workings of one neuron, and we have modeled it mathematically. As I said before, the human brain consists of billions of neurons. These billions of neurons all send and receive signals to and from one another, forming a neural network.

  Let's zoom back in on one neuron. This neuron receives a bunch of inputs, then sends a signal to one or more other neurons. Those neurons all do the same: receive a bunch of inputs from neurons, then send a signal to more neurons. A brain can be thought of as a big, complicated, tangled-up (gross) web of connected neurons.

  When we model a neural network mathematically, we bring some order to this web. Most often, we organize the neurons into layers. Each layer receives input from the previous layer and sends output to the next layer. We make the last layer (the output layer) have a number of neurons equal to the number of outputs we want. Below is an example of an artificial neural network (a mathematical model of a neural network) with 3 inputs and 2 outputs:

[Figure: an artificial neural network with 3 inputs and 2 outputs, neurons organized into layers]
The weights can be seen as a matrix, as we will see in a bit.
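
  As the caption hints, a whole layer can be computed at once by stacking each neuron's weights as the rows of a matrix. Here is a minimal NumPy sketch of one layer; the sizes match the figure, and everything else is illustrative.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def layer(x, W, b):
        """One layer of neurons: row i of W holds neuron i's weights,
        b[i] is its bias. W @ x computes every weighted sum at once."""
        return sigmoid(W @ x - b)

    x = np.array([0.5, 0.2, 0.9])   # 3 inputs
    W = np.random.randn(2, 3)       # a layer of 2 neurons, 3 weights each
    b = np.random.randn(2)          # one bias per neuron
    print(layer(x, W, b))           # 2 outputs, one per neuron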

  Going back to our real estate investor's problem, the data he has on a property would be the input, and the price of the property would be the output. Recall that he had 4 total inputs and wants those inputs to tell him the price. Now, ask yourself what the number of neurons should be in the first and last layers.

  Your answer should be 4 in the first layer and 1 in the last layer. But how many neurons should we put in between? And how many layers should we have? This is where the exact mathematics stops and trial and error comes in. We call these values hyperparameters. The optimal number of layers and the number of neurons in each layer are different for every problem; they are just something you have to test. Generally, increasing the number of layers and the number of neurons in each layer will make the neural network more expressive, able to capture more complicated relationships.
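
  To make the architecture question concrete, here is a hedged sketch of a forward pass for the investor's network: 4 inputs, 1 output, and a hidden layer of 5 neurons in between. The 5 is an arbitrary hyperparameter choice of mine, not something fixed by the problem.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def forward(x, layers):
        """Feed the input through each (W, b) layer in turn."""
        for W, b in layers:
            x = sigmoid(W @ x - b)
        return x

    rng = np.random.default_rng(0)
    layers = [
        (rng.standard_normal((5, 4)), rng.standard_normal(5)),  # 4 inputs -> 5 hidden neurons
        (rng.standard_normal((1, 5)), rng.standard_normal(1)),  # 5 hidden -> 1 output (the price)
    ]

    # [location, crime rate, square footage, population], already scaled to numbers
    x = np.array([0.3, 0.1, 0.7, 0.5])
    print(forward(x, layers))  # one number, but meaningless until the weights are trained

  With random weights the output is of course garbage; finding the right weights is the whole game, which is exactly what the rest of this page is about.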

Fundamental idea behind artificial neural networks

  So, we have a mathematical model of a neural network. How do we actually make this thing work? How do we make this network actually give us the output we are looking for, given the inputs? Do me a favor and put your brain in sponge mode for just a second and think about the very bold statement below.

  If given enough information in the inputs and given a robust enough architecture, there exists some set of weights that will make the neural network give us the output we want. Our goal is to find these weights.

  We can say that for any two related things, there is some complicated function that models their relationship. This is the function the mathematician was trying to find by hand, the two related things being all the data on a property and the price the property sold at. It turns out that if we have the correct set of weights in our neural network, we can replicate this relationship. Give it a go: try to find a set of weights that gives the correct output for each input below (I promise they exist). You can tell how wrong the current output is by looking at the error; the higher the error, the worse your guess at the weights is.

For simplicity's sake, I used the identity activation function below. The error function is explained in much more detail on the next page. We are trying to map the inputs in the first column of the table to the outputs in the second column.

[Interactive demo: fields for the weights W1, W2, and W3, a table of inputs and expected outputs, and live readouts of the current Output, the Expected value, and the Error.]

Finding a set of weights that works for one input is easy; we need to find a set of weights that works for all inputs.
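
  If you want to tinker outside the widget, here is a stand-in sketch in Python. The activation is the identity, as in the demo; the three data rows are made-up placeholders (the original table isn't reproduced here), and the sum-of-squared-differences error is just one plausible choice, since the real error function is covered on the next page.

    def neuron_identity(inputs, weights):
        """Identity activation: the output is just the weighted sum."""
        return sum(w * x for w, x in zip(weights, inputs))

    # Placeholder rows standing in for the widget's table:
    # ([input1, input2, input3], expected output)
    data = [
        ([1.0, 2.0, 3.0], 14.0),
        ([0.0, 1.0, 2.0],  8.0),
        ([2.0, 0.0, 1.0],  5.0),
    ]

    def total_error(weights):
        """Sum of squared differences over every row; remember, one
        set of weights has to work for ALL the inputs at once."""
        return sum((neuron_identity(x, weights) - y) ** 2 for x, y in data)

    print(total_error([0.0, 0.0, 0.0]))  # 285.0: a terrible guess
    print(total_error([1.0, 2.0, 3.0]))  # 0.0: the weights this toy data was built from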

  Give up? That's okay; finding the weights by hand is the neanderthal way of doing things. This is 2020 and we have technology! With some calculus and code, we can find these weights in the blink of an eye. The process of finding these weights is where biological neural networks and artificial neural networks take their separate paths, however. I'm not sure if it's known exactly how real neural networks learn, but using calculus and data, we can make our artificial neural networks find the weights that will map an input to an output, if possible. So, if you want to learn more, join me in learning how we can train a neural network.