
Artificial Neuron

The artificial neuron is the basic building block of feedforward, convolutional, and other popular neural networks, and forms the basis of most of modern deep learning.

Modeling the Artificial Neuron

Historically, the artificial neuron was modelled on the biological neuron. Let’s try to visualise that.
Pull up a couple of images of a biological neuron:
bioNeuron=WebImageSearch["Neuron","Thumbnails","MaxItems"->2]
We can see the main parts of this structure seem to be: dendrite, axon and axon terminal.
Now let’s pull up a couple of images of an artificial neuron:
artificialNeuron=WebImageSearch["Artificial neuron","Thumbnails","MaxItems"->2]
The first image of the biological neuron and the second image of the artificial neuron look similar, so let’s compare them.
{bioNeuron[[1]],artificialNeuron[[2]]}
So the association seems to be:
Dendrite → Input
Axon → Net Input
Axon terminal → Activation (Output)

Nuts and Bolts of the Neuron

Now let’s try to visualise what happens at each of the 3 parts identified above.

Dendrite

The dendrites collect the inputs, which the neuron combines with its weights via a dot product. Let’s visualise vectors and their products:
x=RandomInteger[10,{5}]
{3,6,9,10,3}
w=RandomInteger[10,{5}]
{8,7,9,7,0}
x.w
217
Let’s visualise the two vectors and their product.
GraphicsRow[{MatrixPlot[{x},PlotLabel->"x"],MatrixPlot[{w},PlotLabel->"w"],MatrixPlot[{{x.w}},PlotLabel->"x.w"]}]
In the analogy of the biological neuron, the value of x.w tells us how intense the inputs collected from the dendrites are once they have entered the neuron, with each input weighted by the corresponding entry of the weight vector w. The higher the value, the more intense the input.

Axon

Looking back at the picture of the artificial neuron, we see that this part basically sums up a lot of vector products.
x1=RandomInteger[10,{5}]
{10,10,6,1,8}
w1=RandomInteger[10,{5}]
{2,1,10,1,7}
x2=RandomInteger[10,{5}]
{0,9,3,5,3}
w2=RandomInteger[10,{5}]
{2,8,8,0,3}
net=x1.w1+x2.w2
252
ImageCollage[{MatrixPlot[{x1},PlotLabel->Subscript["x",1]],MatrixPlot[{x2},PlotLabel->Subscript["x",2]],MatrixPlot[{w1},PlotLabel->Subscript["w",1]],MatrixPlot[{w2},PlotLabel->Subscript["w",2]],MatrixPlot[{{net}},PlotLabel->"net"]}]
The value of the net input tells us how intense a combined signal the neuron has collected; once the activation function (next section) is applied, this determines how strong a signal the neuron sends to the subsequent neurons in the network.

Axon Terminal

This part uses something called an activation function, which introduces non-linearity into the neuron’s output. Let’s look at some activation functions.
Plot the sigmoid function:
Plot[LogisticSigmoid[x],{x,-10,10}]
Plot the tanh function:
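Plot[Tanh[x],{x,-10,10}]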
The tanh function seems to be steeper than the sigmoid, and is hence closer to the discontinuous function plotted below.
The above functions are basically differentiable approximations of the following function, the unit step:
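Plot[UnitStep[x],{x,-10,10}]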
The above function is useful for binary classification (0 being one class, 1 being the other). We use the smooth functions instead because they are differentiable and allow us to use our good friend calculus.
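Putting the pieces together: applying an activation function to the net input gives the neuron’s output. A quick sketch using the sigmoid and the net value computed earlier (for a net input as large as 252, the sigmoid saturates at 1):
N[LogisticSigmoid[net]]
1.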
We’ll look at some other popular activation functions:
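For instance, the rectified linear unit (ReLU), which the Wolfram Language provides as Ramp:
Plot[Ramp[x],{x,-10,10}]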

Applications

You know what’s cooler and more powerful than a single neuron? A network of neurons! Also known as a neural network.

Feedforward Neural Network

Let’s visualise a typical neural network called a feedforward neural network:
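The original visualisation isn’t preserved here, so the following is a minimal sketch that draws a small layered graph; the 3-4-2 layer sizes are purely illustrative:
layers={3,4,2};(*illustrative layer sizes: input, hidden, output*)
offsets=Prepend[Accumulate[Most[layers]],0];(*index of the first vertex in each layer*)
edges=Flatten@Table[offsets[[l]]+i->offsets[[l+1]]+j,{l,Length[layers]-1},{i,layers[[l]]},{j,layers[[l+1]]}];(*connect every neuron to every neuron in the next layer*)
Graph[edges,GraphLayout->"LayeredDigraphEmbedding",VertexSize->Medium]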
From the above graph, we can easily see how each neuron takes inputs at its dendrites, accumulates them into its axon, and outputs activations at its terminal. An edge between two neurons represents the flow of information between them.

Convolutional Neural Network

Now we’ll look at a convolutional neural network (CNN), which is very popular for image processing.

First let’s see what a convolution is:
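For two functions f and g, their convolution is defined as:
(f ∗ g)(t) = ∫ f(τ) g(t − τ) dτ, with the integral running over all τ from −∞ to ∞.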
That looks intimidating! Let’s look at a visualisation of that:
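The original interactive cell isn’t preserved, so here is a minimal sketch assuming two unit-box pulses as the functions:
f[x_]:=UnitBox[x];(*blue: first function*)
g[x_]:=UnitBox[x];(*yellow: second function, flipped and shifted by t*)
conv[t_]=Convolve[f[x],g[x],x,t];(*red: their convolution, a triangle*)
Manipulate[Show[Plot[{f[x],g[t-x],conv[x]},{x,-3,3},PlotStyle->{Blue,Darker[Yellow],Red},PlotRange->{-0.1,1.2}],Graphics[{Red,PointSize[Large],Point[{t,conv[t]}]}]],{t,-2,2}]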
The output in red is the convolution of the two functions shown in blue and yellow. Convolution slides one function across the other and, at each offset, integrates their product. Feel free to play around with the slider and understand what is going on.
Now let’s see how convolution is used inside a CNN. Let’s make a random matrix first:
I am initializing the convolution matrix randomly here for purposes of visualization. See the example below for details.
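A minimal sketch of the computation, assuming an illustrative 7×7 input and a 3×3 kernel:
input=RandomInteger[{0,9},{7,7}];(*illustrative input matrix*)
kernel=RandomReal[{-1,1},{3,3}];(*randomly initialised 3x3 convolution kernel*)
output=ListConvolve[kernel,input];(*slide the kernel over the input; the result is 5x5*)
GraphicsRow[{MatrixPlot[input,PlotLabel->"input"],MatrixPlot[kernel,PlotLabel->"kernel"],MatrixPlot[output,PlotLabel->"output"]}]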
In reality, the convolution is calculated by running a tiny kernel (a 3×3 window here) over the input matrix, and mapping the set of numbers under the kernel to a single scalar entry of the output.
One of the most important features of a CNN is that as the kernel moves around over the input, it always uses the same weights to map each region of the input to the output.
Apart from the convolution, a CNN has basically the same structure as a feedforward neural network visualised above.

Further Explorations

Backpropagation
Training a neural network
Recurrent neural networks, Long Short-Term Memory (LSTM)

Authorship Information

Rohan Saxena
23rd June, 2017
saxenarohan97@gmail.com