Neural networks power some of the most innovative applications of our day. Amazon Alexa, Google Translate, and other “intelligent” technologies (like self-driving cars) use neural networks that are loosely modeled on the human brain. These networks are a finely balanced combination of mathematics, programming, and design thinking that works behind the scenes of these applications. It should come as no surprise that the architecture used to build these networks is crucial. In this post, we will explain how to get started with neural network architectures as well as how to optimize them.
A neural network is made up of neurons that are organized in layers. There are three types of layers: an input layer, an output layer, and a hidden layer.
In most cases, there will be multiple hidden layers in a neural network. Networks that don’t have any hidden layers are called single-layer perceptrons.
The neurons in the input layer receive the input objects. For example, if the input is an image, then the input objects might be pixels, each converted to a number based on its grayscale intensity. When a neuron is activated, it “fires,” which in turn activates the neurons in the next layer. Every neuron in one layer passes an output to the neurons in the next layer. How that output affects the next neuron is determined by two factors: weight and bias.
The “weight” defines how important a particular input is to the next neuron, and it also directs the flow of values from input to output. (In the diagram above, the black lines represent weights.) On the other hand, “bias” is an added constant value that defines how easy it is for a neuron to fire.
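To make the roles of weight and bias concrete, here is a minimal sketch of a single neuron. The sigmoid activation, the function names, and the sample numbers are assumptions chosen for illustration; the post does not prescribe a particular activation function.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus the bias, passed through an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

# Three grayscale pixel values scaled to [0, 1] (illustrative numbers).
pixels = [0.0, 0.5, 1.0]
weights = [0.4, -0.2, 0.1]

# Same inputs and weights, different bias: the higher bias makes the
# neuron fire more easily, so its activation is larger.
low_bias = neuron_output(pixels, weights, bias=-2.0)
high_bias = neuron_output(pixels, weights, bias=2.0)
print(low_bias, high_bias)
```

Each weight scales how much its input counts toward the sum, while the bias shifts the whole sum up or down before the activation is applied.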
The process of sending data from one layer to the next is called propagation. There are two types of propagation: forward propagation and backward propagation. In forward propagation, the data moves from the input layer through the hidden layers to the output layer. It ends in a prediction based on the input, which can be accurate or inaccurate.
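Forward propagation can be sketched as a loop that feeds the outputs of one layer into the next. The network shape (2 inputs, 3 hidden neurons, 1 output), the sigmoid activation, and all weight values below are hypothetical and only serve the illustration.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases):
    """Compute the outputs of one layer; weights[j] holds the weights of neuron j."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]

def forward(inputs, layers):
    """Pass the data through each (weights, biases) pair in order."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations

# Hypothetical network: 2 inputs -> 3 hidden neurons -> 1 output neuron.
hidden = ([[0.5, -0.5], [0.3, 0.8], [-0.6, 0.1]], [0.0, 0.1, -0.1])
output = ([[1.0, -1.0, 0.5]], [0.2])

prediction = forward([1.0, 0.0], [hidden, output])
print(prediction)  # a list with one value between 0 and 1
```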
In backward propagation, the error between the prediction and the expected output is traced back from the output layer toward the input layer, which shows how much each weight and bias contributed to that error. This information is then used to modify the weights and biases of each neuron, giving the ones that contributed more to the error a greater adjustment. Repeating this adjustment over many examples is what minimizes errors and improves accuracy.
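As a rough sketch of how backward propagation adjusts weights and biases, the example below trains the smallest possible chain: one input, one hidden neuron, and one output neuron. The squared-error loss, the sigmoid activation, the starting values, and the learning rate of 0.5 are all assumptions made for this illustration.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_step(params, x, target, learning_rate):
    """Run one forward pass, one backward pass, and one update.

    Returns the updated parameters and the loss measured before the update.
    """
    w1, b1, w2, b2 = params

    # Forward propagation: input -> hidden neuron -> output neuron.
    h = sigmoid(w1 * x + b1)
    y = sigmoid(w2 * h + b2)
    loss = 0.5 * (y - target) ** 2

    # Backward propagation: apply the chain rule from the output back to the input.
    delta_out = (y - target) * y * (1.0 - y)
    delta_hidden = delta_out * w2 * h * (1.0 - h)
    grads = [delta_hidden * x, delta_hidden, delta_out * h, delta_out]

    # Parameters with larger gradients receive larger adjustments.
    new_params = [p - learning_rate * g for p, g in zip(params, grads)]
    return new_params, loss

params = [0.5, 0.0, 0.5, 0.0]  # w1, b1, w2, b2 (arbitrary starting values)
losses = []
for _ in range(200):
    params, loss = train_step(params, x=1.0, target=1.0, learning_rate=0.5)
    losses.append(loss)

print(losses[0], losses[-1])  # the loss shrinks as the updates accumulate
```

Real networks apply the same chain-rule idea across many neurons per layer, usually with matrix operations provided by a framework.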
So how do you find the right architecture for your neural networks, and how many layers does your architecture need?
To be clear, there are numerous types of architectures in neural networks. In fact, new designs tailored to certain use cases are proposed every few months. Andrew Tch has compiled an extensive list of architectures, but even this isn’t all of them.
Since every prediction problem is different, there is no one-size-fits-all architecture solution. It’s best to start simple: estimate how many layers you might need (maybe just one hidden layer), and then, for the sake of simplicity, keep the layers the same size as you expand.
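The “start simple” advice can be sketched as a small helper that describes an architecture as a list of layer sizes. The helper names, the hidden size of 16, and the 784-input, 10-output shape (a 28 x 28 grayscale image classified into 10 categories) are hypothetical choices for illustration only.

```python
def layer_sizes(n_inputs, n_outputs, hidden_layers=1, hidden_size=16):
    """Describe a network as [inputs, hidden..., outputs] with equal-size hidden layers."""
    return [n_inputs] + [hidden_size] * hidden_layers + [n_outputs]

def count_parameters(sizes):
    """Count weights plus biases, assuming every layer is fully connected."""
    return sum(a * b + b for a, b in zip(sizes, sizes[1:]))

baseline = layer_sizes(784, 10)                 # [784, 16, 10]
deeper = layer_sizes(784, 10, hidden_layers=3)  # [784, 16, 16, 16, 10]

print(baseline, count_parameters(baseline))
print(deeper, count_parameters(deeper))
```

Keeping every hidden layer the same size means only two numbers change between experiments, which makes it easier to see what each expansion costs in parameters.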
Different types of layers are better suited to different types of tasks. For example, dense (fully connected) layers connect every neuron in one layer to every neuron in the next, convolutional layers are used for processing image data, and recurrent layers are well suited to sequential data such as time series and audio (for example, speech recognition). You can find a comprehensive list of the various types of layers in a neural network here.
The key is to get started quickly and then adjust weights to optimize for more accurate outputs. The weights are typically updated at the end of each batch. You can tune how large each of those updates is (known as the learning rate), and you can also change how much influence the updates from previous batches have on the current one (which is called the momentum).
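Here is a minimal sketch of a weight update that uses both a learning rate and momentum. The toy loss f(w) = (w - 3)^2, the function name, and the values 0.1 and 0.9 are assumptions for illustration; in a real network the gradient would come from backward propagation over a batch.

```python
def momentum_step(weight, velocity, gradient, learning_rate=0.1, momentum=0.9):
    """Update one weight using the current gradient and the previous update.

    learning_rate scales the size of the new step; momentum controls how much
    of the previous step (the velocity) carries over into this one.
    """
    velocity = momentum * velocity - learning_rate * gradient
    return weight + velocity, velocity

# Minimize the toy loss f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
weight, velocity = 0.0, 0.0
for _ in range(200):
    gradient = 2.0 * (weight - 3.0)
    weight, velocity = momentum_step(weight, velocity, gradient)

print(weight)  # approaches the minimum at w = 3
```

Setting `momentum=0.0` reduces this to plain gradient descent, where each update depends only on the current gradient.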
In addition, you can choose among different optimization algorithms to train your neural networks, including gradient descent or stochastic gradient descent, as well as Adagrad, Adam, and more. You can read Sebastian Ruder’s useful overview of optimization algorithms here.
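To show how one of these algorithms differs from plain gradient descent, here is a minimal sketch of the Adagrad update for a single weight. Adagrad divides each step by the square root of the accumulated squared gradients, so the effective step size shrinks for weights that have already received large gradients. The toy loss f(w) = (w - 3)^2, the function name, and the default values are assumptions for illustration.

```python
import math

def adagrad_step(weight, grad_sq_sum, gradient, learning_rate=1.0, eps=1e-8):
    """Apply one Adagrad update to a single weight.

    grad_sq_sum accumulates the squared gradients seen so far; eps avoids
    division by zero on the first step.
    """
    grad_sq_sum += gradient ** 2
    weight -= learning_rate * gradient / (math.sqrt(grad_sq_sum) + eps)
    return weight, grad_sq_sum

# Minimize the toy loss f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
weight, grad_sq_sum = 0.0, 0.0
for _ in range(200):
    gradient = 2.0 * (weight - 3.0)
    weight, grad_sq_sum = adagrad_step(weight, grad_sq_sum, gradient)

print(weight)  # approaches the minimum at w = 3
```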
One frequently used method of improving neural networks is called dropout, which is a form of regularization. It reduces the problem of over-fitting, where a neural network that is too large for a small data set learns the statistical noise in the training data instead of the underlying pattern. Dropout works by randomly “dropping out” certain outputs from a layer during training, which makes that layer look like it has fewer neurons to the layer that follows. This keeps the network from relying too heavily on any single neuron and tends to improve accuracy on data it has not seen. The exact amount of dropout you need will vary based on the dataset and the architecture of the neural network.
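Dropout can be sketched in a few lines. This version uses the common “inverted dropout” variant, which scales the surviving outputs up so their expected total stays the same; that scaling, the function signature, and the rate of 0.5 are assumptions for illustration and are not taken from this post.

```python
import random

def dropout(activations, rate, training=True, rng=random):
    """Randomly zero a fraction `rate` of a layer's outputs during training.

    Surviving outputs are divided by the keep probability (inverted dropout).
    Outside of training, the activations pass through unchanged.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in the range [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

layer_outputs = [1.0] * 10
rng = random.Random(0)  # fixed seed so the example is repeatable

print(dropout(layer_outputs, rate=0.5, rng=rng))         # a mix of 0.0 and 2.0
print(dropout(layer_outputs, rate=0.5, training=False))  # unchanged
```

Because a different random subset is dropped on every call, the next layer cannot depend on any one output always being present.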
Neural networks are at the center of the current AI revolution. While they present a huge opportunity, designing accurate and efficient neural networks is quite a challenge. In this post, we discussed central aspects of neural networks, including propagation, architecture, layers, and optimization. While there is much more to learn, this guide will give you a framework for thinking about designing and scaling your own neural network.