Deep Learning is the use of “deep” Neural Networks as a tool in Machine Learning. (If you are unsure of what ML is, please see my previous post.)
This approach, as the name implies, is loosely modeled on human neurology. Let’s start with a problem that is simple for humans but, until a few years ago, was nearly impossible for a computer to solve with any reasonable accuracy.
Recognizing handwritten digits may seem simple at first, but writing code to tell the difference between a 3 and 5 isn’t as simple as: if(x!=y)
Neural networks are a set of algorithms, modeled loosely after the human brain, that are designed to recognize patterns.
**In our case, the pattern is whatever all handwritten 3s have in common.**
There are three main parts to this neural network: the input layer, the hidden layer(s), and the output layer.
The input layer is where the original image is fed into the network. In this network, two hidden layers then help process the image, and the final (output) layer makes the decision.
The first layer takes in the images as a vector of pixels: see the animation below.
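To make "a vector of pixels" concrete, here is a minimal sketch in Python with NumPy. The 28x28 image size and the random stand-in image are my assumptions for illustration (the size is common for handwritten-digit datasets), not something fixed by the network described here.

```python
import numpy as np

# Assumption: a 28x28 grayscale image with pixel values from 0 to 255.
# A random array stands in for a real handwritten digit.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(28, 28))

# Scale the pixels to the range [0, 1], then flatten the grid row by row
# into a single long vector that the input layer can take in.
input_vector = (image / 255.0).reshape(-1)

print(input_vector.shape)  # (784,)
```

Each of the 784 entries would feed one neuron in the input layer.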
Then (in theory) the next layer looks for edges by weighting the pixel values so that the result is higher if an edge exists and lower if it doesn’t.
This is done by giving w a negative value for every pixel that should be blank if the edge were there, and a positive value for every pixel that should be filled in if the edge were there. The weighted sum is then positive if the edge is present and negative if no edge is detected.
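The weighting idea can be sketched on a tiny example. The 3x3 patch, the specific weight values, and the helper name `weighted_sum` are all hypothetical and chosen by hand for illustration; a real network learns its weights during training.

```python
import numpy as np

# Hypothetical weights for detecting a horizontal stroke in the middle row:
# positive where ink should be, negative where the patch should be blank.
weights = np.array([
    [-1.0, -1.0, -1.0],
    [ 1.0,  1.0,  1.0],
    [-1.0, -1.0, -1.0],
])

# A patch that contains the stroke (1.0 = ink, 0.0 = blank).
edge_patch = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0],
    [0.0, 0.0, 0.0],
])

# A patch with ink everywhere except where the stroke should be.
no_edge_patch = np.array([
    [1.0, 1.0, 1.0],
    [0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0],
])

def weighted_sum(pixels, w):
    # Multiply each pixel by its weight and add everything up.
    return float(np.sum(pixels * w))

print(weighted_sum(edge_patch, weights))     # 3.0
print(weighted_sum(no_edge_patch, weights))  # -6.0
```

The patch with the stroke scores positive and the patch without it scores negative, which is exactly the behavior described above.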
This weighted sum could fall anywhere on the number line, but we would like it to be a value between 0 and 1. To make sure that the presence or absence of an edge is the only thing affecting the next layer, we pass the weighted sum through an Activation Function.
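One common activation function that squashes any number into the range between 0 and 1 is the sigmoid. I am assuming the sigmoid here purely as an example; the idea works the same way with other activation functions.

```python
import numpy as np

def sigmoid(z):
    # Maps any real number to a value strictly between 0 and 1.
    return 1.0 / (1.0 + np.exp(-z))

# Example weighted sums: strongly negative, neutral, and positive.
sums = np.array([-6.0, 0.0, 3.0])
activations = sigmoid(sums)

print(activations)
```

A large negative sum lands close to 0 (no edge), a large positive sum lands close to 1 (edge), and a sum of exactly 0 lands at 0.5.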
A similar process is repeated for the following layers, until the final layer, where we normalize the values into a probability for each digit from zero through nine.
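A standard way to do this final normalization is the softmax function, which I am assuming here as an illustration. The ten scores below are made up; in a real network they would come out of the last layer.

```python
import numpy as np

def softmax(scores):
    # Subtracting the max does not change the result, but it keeps
    # np.exp from overflowing when the scores are large.
    shifted = scores - np.max(scores)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

# Made-up output scores, one per digit from 0 through 9.
scores = np.array([0.1, -1.2, 0.3, 4.0, 0.0, 2.5, -0.7, 0.2, 1.1, -0.3])

probabilities = softmax(scores)
predicted_digit = int(np.argmax(probabilities))

print(predicted_digit)  # 3
```

The probabilities add up to 1, and the network's answer is simply the digit with the highest probability.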