
How to start with Neural Networks

Updated: Aug 5, 2021

A neural network is a mathematical model, inspired by the human brain, for learning from data. Just as the brain uses a system of neurons to recognize the underlying relationships in a set of data, a neural network is a computing system of interconnected nodes that recognizes hidden patterns and correlations in raw data and then clusters and classifies them. It learns and improves over the course of training.

Neural networks are used in machine learning to analyze and learn from data layer by layer. A trained network produces results close to the best possible and can adapt to changing input. Neural networks are rapidly gaining popularity in the field of Artificial Intelligence. The idea is to simulate many densely interconnected brain cells inside a computer so that it can learn things, recognize patterns, solve complex problems, and make decisions much as a living being does.

They do have limitations: a neural network is only a machine and cannot replace a living human brain. Neural networks are software simulations built with ordinary programming, in which many simple units work in parallel.


Even a simple neural network can represent a wide variety of complex functions, so it helps to understand the purpose of each layer of neurons. A network mainly consists of three kinds of layers:

  • Input Layer: This layer receives the input, such as text, images, or other data, and passes it on to the rest of the network.

  • Hidden Layer

  • Output Layer: This layer provides the network's result for the given input.

Glimpse of the Python code:

Each layer learns a representation of the input data that helps the network produce accurate results. These abstract representations may be hard for humans to interpret, but they help the algorithm classify data better. Here we use basic Python and Keras, a high-level deep learning API, because it is easy to use and open source.

Keras is a powerful, free, open-source Python library for developing and evaluating deep learning models. Deep learning is a part of machine learning that uses neural networks with many layers, loosely modeled on the human brain. It can learn from unstructured data such as images, audio, and text, with or without supervision. Object recognition, speech recognition, and language translation are some examples of tasks that use deep learning.

Step 1: Import the Keras dataset

import keras
from keras.datasets import mnist
from keras.utils import to_categorical

# Load MNIST as NumPy arrays: 60,000 training and 10,000 test images.
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

First we import Keras and then the MNIST dataset to work on; any other dataset could be loaded instead. MNIST contains a training set of 60,000 images and a test set of 10,000 images, each a 28×28-pixel grayscale handwritten digit from 0 to 9. The training data is used to fit the model, and the test data is used to measure the accuracy of the trained model on inputs it has not seen.

All the images should be in a consistent pixel range throughout the process, so we preprocess them before feeding them to the network.

# Flatten each 28x28 image into a 784-value vector and scale pixels to [0, 1].
train_images = train_images.reshape((60000, 28 * 28)).astype('float32') / 255
test_images = test_images.reshape((10000, 28 * 28)).astype('float32') / 255

# One-hot encode the labels, as categorical_crossentropy expects.
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

Step 2: Introducing neural layers

from keras import models
from keras import layers

# Stack three fully connected (dense) layers into a sequential model.
network = models.Sequential()
network.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))
network.add(layers.Dense(256, activation='relu'))
network.add(layers.Dense(10, activation='softmax'))

Our network uses three fully connected (dense) layers. The first layer has 512 neurons and the second, a hidden layer, has 256; both use the ReLU activation function. The last is a 10-way softmax layer that returns 10 scores, each the probability that the current image belongs to one of the 10 digit classes.

Step 3: Calculation of the cost function

network.compile(optimizer='rmsprop',
                loss='categorical_crossentropy',
                metrics=['accuracy'])

Here we choose the optimizer, the loss (cost) function to minimize, and the metric to report. The loss and accuracy of the model are then calculated during training and evaluation.

Step 4: Training

network.fit(train_images, train_labels, epochs=8, batch_size=64)

In Keras, the fit() method trains the model, and epochs is the number of complete passes over the training dataset.

The batch size is the number of training examples used in one iteration.

Step 5: Evaluation of the network

test_loss, test_acc = network.evaluate(test_images, test_labels)
print('Test accuracy:', test_acc)

Here we evaluate the model by measuring its accuracy on the test set, which it did not see during training.
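As a quick check of what the training settings mean in practice, the arithmetic below works out how many iterations (weight updates) the training step performs. It assumes the 60,000 MNIST training images, the batch size of 64, and the 8 epochs mentioned in the text; the variable names are only illustrative.

```python
import math

num_train_images = 60000  # size of the MNIST training set
batch_size = 64           # training examples per iteration
epochs = 8                # complete passes over the training set

# The last batch of an epoch may be smaller than the rest, so round up.
iterations_per_epoch = math.ceil(num_train_images / batch_size)
total_iterations = iterations_per_epoch * epochs

print(iterations_per_epoch)  # 938
print(total_iterations)      # 7504
```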


Neural networks come in handy for solving complex real-life problems. By learning and modeling relationships through multiple hidden layers, they can capture complex, non-linear relationships, generalize and draw inferences, make predictions, model highly volatile data, and help predict rare events. As a result, neural networks can provide good, accurate results when making predictions from data. Neural networks are an example of machine learning and are used in areas such as:

  • Scam detection

  • Optimization of logistics data networks

  • Natural language processing

  • Medical diagnosis

  • Trading & marketing

  • Financial predictions for stock prices

  • Robotic control systems

  • Process and quality control

  • Chemical compound identification

  • Computer vision to interpret raw photos and videos


The layers that extract patterns from the data hold values called weights, which encode the features that move the network toward the closest result. The value of each weight matters because it determines how much the corresponding input contributes. Whenever the network is prepared for training on a training set, it is initialized with a set of weights.

The strength of a connection from an input can be expressed as a real number. Each input arrives over an interconnect that has a weight attached, and the processing element receives the weighted input. Weights are often initialized to small values, for example between 0 and 1, and keep being updated during training. The value produced by a processing element can be expressed as an excitation level, which makes its output either excitatory (ON) or inhibitory (OFF).

Computation of Weights:

A neuron calculates the weighted sum of its inputs.

Let the inputs be:

$x_1, x_2, \dots, x_n$

and the weights be:

$w_1, w_2, \dots, w_n$

A bias $b$ (a constant) is added to the weighted sum:

$z = \sum_{i=1}^{n} w_i x_i + b$

Finally, the computed value is fed into the activation function $f$, which produces the output $y = f(z)$.
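The computation described above (weighted sum, plus bias, then activation) can be sketched in a few lines of plain Python. This is a minimal illustration, not a library implementation: the function names, the sample numbers, and the choice of a sigmoid activation are all assumptions made for the example.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs, plus the bias, passed through the activation."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(weighted_sum + bias)

inputs = [0.5, -1.0, 2.0]   # illustrative input values
weights = [0.4, 0.3, -0.2]  # illustrative weights, one per input
bias = 0.1                  # illustrative bias

# Weighted sum is 0.2 - 0.3 - 0.4 = -0.5; adding the bias gives -0.4.
output = neuron_output(inputs, weights, bias)
print(output)
```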

Characteristics of Weights:

  • Weights control the steepness of the activation function's response.

  • How quickly the activation function triggers depends on the weights.

  • The relationship between a feature and the target value can be read from the weights.

  • Weights are also responsible for changing the orientation of the boundary that separates the classes of data points.

  • Weights indicate the importance of a feature in computing the target value.

  • Weights are the coefficients of the equation we are trying to solve.

  • Negative weights reduce the value of an output.

Unlike the weights of a weighted-average ensemble, which are small positive values that sum to one so that each indicates the share of the prediction expected from each model, the weights of a neural network can be any real numbers, positive or negative. For a handful of ensemble weights, the easiest approach is to grid search values between 0 and 1 for each member. For the many weights of a neural network, an optimization procedure such as gradient descent is used to estimate them.


Bias is a constant that is added to the sum of the products of weights and inputs. It is used to offset the output: the bias shifts the input of the activation function, and therefore its output, toward the negative or positive side. In this way the bias helps the model fit the given data as well as possible.

Characteristics of bias:

  • The addition of bias in the product reduces the variance.

  • Bias is like an intercept added to the linear equation.

  • Bias is used to delay or advance the triggering of the activation function.

  • Bias is responsible for shifting the activation curve to the left or right.

  • The bias also gives neural networks better generalization and flexibility.

  • The bias is essentially the negative of the threshold, which is why its value controls when the activation function fires.

  • The bias is just an additional parameter in the network that is used to adjust the output along with the weighted sum of inputs to the neuron.
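The statement that the bias is the negative of the threshold can be checked with a small sketch. The two function names and the sample values below are hypothetical, chosen only to show that "weighted sum reaches the threshold" and "weighted sum plus bias is non-negative" are the same test.

```python
def step_with_threshold(weighted_sum, threshold):
    """Fire (1) when the weighted sum reaches the threshold."""
    return 1 if weighted_sum >= threshold else 0

def step_with_bias(weighted_sum, bias):
    """Fire (1) when the weighted sum plus the bias is non-negative."""
    return 1 if weighted_sum + bias >= 0 else 0

threshold = 1.5
bias = -threshold  # the bias is the negative of the threshold

threshold_outputs = []
bias_outputs = []
for weighted_sum in [0.0, 1.0, 1.5, 2.0]:
    threshold_outputs.append(step_with_threshold(weighted_sum, threshold))
    bias_outputs.append(step_with_bias(weighted_sum, bias))

print(threshold_outputs)  # [0, 0, 1, 1]
print(bias_outputs)       # [0, 0, 1, 1]
```

A more negative bias means a higher threshold, so the neuron fires later; a more positive bias makes it fire sooner.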


Activation functions play a vital role in the design of a neural network. An activation function defines how the weighted sum of the inputs is transformed into an output from the nodes in a layer of the network. The activation function is also called a “transfer function”. The choice of activation function has a large impact on the capability and performance of the neural network.

The choice of an activation function in the hidden layer will control how well the network model learns the training dataset. The choice of activation function in the output layer will define the type of predictions the model can make. Different activation functions are used in different parts of the neural network. All the hidden layers typically use the same activation function. The output layer will typically use a different activation function from the hidden layers and is dependent upon the type of prediction required by the model.

The activation function decides whether a neuron should be activated, based on the weighted sum of its inputs plus the bias. Its purpose is to introduce non-linearity into the output of a neuron. Each neuron in the network works with its weights, its bias, and its activation function. Differentiable activation functions make back-propagation possible, since their gradients are supplied along with the error to update the weights and biases.
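To make the idea concrete, here is a minimal plain-Python sketch of three common activation functions: ReLU and softmax, which appear in the Keras code earlier, and sigmoid. These are simplified stand-ins written for illustration, not the Keras implementations.

```python
import math

def relu(z):
    """Pass positive values through unchanged and clip negatives to zero."""
    return max(0.0, z)

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    """Turn a list of scores into probabilities that sum to 1."""
    largest = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

print(relu(-2.0), relu(3.0))  # 0.0 3.0
print(sigmoid(0.0))           # 0.5

probs = softmax([2.0, 1.0, 0.1])
print(probs)  # the largest score receives the largest probability
```

ReLU is a typical choice for hidden layers, while softmax suits an output layer that must choose one class among several.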

Non-linearity of activation function:

A neural network without an activation function is essentially just a linear regression model. The activation function applies a non-linear transformation to the input, making the network capable of learning and performing more complex tasks.
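The sketch below illustrates why this is so, using two one-input "layers" with made-up weights. Without an activation function between them, the two layers collapse into a single linear function, so stacking them adds no expressive power.

```python
def linear(x, weight, bias):
    """A layer with one input and one output and no activation function."""
    return weight * x + bias

# Illustrative weights and biases for two stacked layers.
w1, b1 = 2.0, 1.0
w2, b2 = -3.0, 0.5

def two_layers_no_activation(x):
    return linear(linear(x, w1, b1), w2, b2)

# Algebraically: w2 * (w1 * x + b1) + b2 = (w2 * w1) * x + (w2 * b1 + b2)
w_combined = w2 * w1       # -6.0
b_combined = w2 * b1 + b2  # -2.5

stacked = [two_layers_no_activation(x) for x in [-1.0, 0.0, 2.0]]
single = [linear(x, w_combined, b_combined) for x in [-1.0, 0.0, 2.0]]
print(stacked)  # [3.5, -2.5, -14.5]
print(single)   # [3.5, -2.5, -14.5]
```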


Backpropagation is the essence of neural network training. It is the method of fine-tuning the weights of a neural network based on the error obtained in the previous iteration: it computes the gradients of the loss with respect to the weights and biases of the neurons, and these are then updated using those gradients, starting from the error at the output.
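The core of this update can be sketched on the smallest possible case: a single linear neuron with one weight and no bias, trained with a squared-error loss. This is plain gradient descent on one weight rather than full backpropagation through many layers, and the data, learning rate, and epoch count are assumptions chosen for the illustration.

```python
# Toy data following target = 2 * x, so the ideal weight is 2.0.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

weight = 0.0          # initial weight
learning_rate = 0.05  # size of each update step

for epoch in range(100):
    for x, target in data:
        prediction = weight * x             # forward pass
        error = prediction - target         # how far off the prediction is
        gradient = 2 * error * x            # derivative of error**2 w.r.t. weight
        weight -= learning_rate * gradient  # move against the gradient

print(round(weight, 3))  # 2.0
```

In a multi-layer network, backpropagation applies the chain rule to obtain this same kind of gradient for every weight and bias, layer by layer from the output back to the input.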

Forward propagation refers to movement in only one direction through a neural network, from input to output. It also refers to calculating and storing the intermediate values of the network in order from the input layer to the output layer. A forward pass through the network computes the loss based on the current weights, which at the start of training are the initialized weights.

The error indicates how well our network is performing on a given dataset, which helps us understand the underlying causes of problems, prioritize which problem deserves attention and how much, and decide how to handle them. A low error indicates good model performance. The error is calculated with a loss function.

The function we want to minimize or maximize is called the objective function. When we are minimizing it, it is also called the cost function, loss function, or error function. A loss function is a way to evaluate how well the algorithm models the data: its value measures the difference between the actual value and the predicted value. As you tune the algorithm to improve the model, the output of the loss function tells you whether it is improving. A high loss means the model's predictions are off, and a low loss means the model is doing well. Hence, the loss function should penalize the model's errors effectively while training on a dataset.

The cost function reduces all the good and bad aspects of a complex system to a single scalar value, which allows results to be ranked and compared. The loss function (J) can be defined as a function that takes two parameters:

  • Predicted value

  • True value

If the loss is very high, a large value will propagate back through the network during training and the weights will change more than usual. If it is small, the weights will not change much, since the network is already performing well.
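As an illustration of a loss function taking a true value and a predicted value, the sketch below implements two common losses in plain Python: mean squared error and categorical cross-entropy (the loss named in the Keras code earlier). These are simplified versions written for this example, and the sample predictions are made up.

```python
import math

def mean_squared_error(true_values, predicted_values):
    """Average of the squared differences between true and predicted values."""
    n = len(true_values)
    return sum((t - p) ** 2 for t, p in zip(true_values, predicted_values)) / n

def categorical_crossentropy(true_one_hot, predicted_probs):
    """Negative log of the probability assigned to the true class."""
    return -sum(t * math.log(p)
                for t, p in zip(true_one_hot, predicted_probs) if t > 0)

true_label = [0.0, 1.0, 0.0]       # the true class is the second one
good_prediction = [0.1, 0.8, 0.1]  # confident and correct
bad_prediction = [0.7, 0.2, 0.1]   # confident and wrong

good_loss = categorical_crossentropy(true_label, good_prediction)
bad_loss = categorical_crossentropy(true_label, bad_prediction)
print(good_loss)  # about 0.223, a low loss
print(bad_loss)   # about 1.609, a high loss

mse = mean_squared_error([1.0, 2.0], [1.0, 4.0])
print(mse)  # 2.0
```

The wrong prediction produces a loss roughly seven times larger, which is what drives the larger weight changes described above.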


There are several types of neural networks, some established and some still under development. They can be classified by their structure, data flow, the neurons used and their density, the number of layers (depth), activation filters, and so on.

[1]. Perceptron

The perceptron is the oldest and simplest model in the history of neural networks. It is the smallest unit that makes predictions by a linear computation, combining a set of weights with the input features. The perceptron is an algorithm for learning a binary classifier called a threshold function, which takes weighted inputs and applies an activation function to produce the final output. It is also known as the Threshold Logic Unit (TLU).

$$y = \begin{cases} 1 & \text{if } \sum_{i=1}^{n} w_i x_i + b > 0 \\ 0 & \text{otherwise} \end{cases}$$

Where $x_i$ are the inputs, $w_i$ are the weights, $b$ is the bias, and n is the number of inputs to the perceptron.


  • A perceptron can implement logic gates such as AND, OR, and NAND.


  • A perceptron can only learn linearly separable problems such as the Boolean AND problem. It cannot solve non-linear problems such as the Boolean XOR problem.
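The difference between AND and XOR can be seen with a small perceptron trained by the classic perceptron learning rule. This is a minimal sketch: the function names, learning rate, and number of epochs are assumptions made for the example.

```python
def predict(weights, bias, inputs):
    """Threshold unit: output 1 when the weighted sum plus bias is positive."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

def train_perceptron(samples, epochs=20, learning_rate=0.1):
    """Perceptron rule: nudge weights and bias by the error on each sample."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias

and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
xor_samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

results = {}
for name, samples in [("AND", and_samples), ("XOR", xor_samples)]:
    weights, bias = train_perceptron(samples)
    outputs = [predict(weights, bias, inputs) for inputs, _ in samples]
    targets = [target for _, target in samples]
    results[name] = outputs == targets
    print(name, "learned" if results[name] else "not learned")
```

AND is learned because a single straight line separates its outputs. No such line exists for XOR, so more training epochs would not help.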

[2]. Feed Forward Neural Networks

Feedforward neural networks follow forward propagation, where input passes through artificial nodes and exits through output nodes. The network may or may not include hidden layers, but the input and output layers are mandatory. Accordingly, they are divided into single-layer and multi-layer feed-forward neural networks.

The complexity of the network depends on the number of layers involved. It has static weights, an activation function, and no backward propagation. Feed-forward neural networks are easy to maintain and are equipped to deal with data that contains a lot of noise.


  • Less complex, easy to design & maintain

  • Speedy [One-way propagation]

  • Highly responsive to noisy data


  • This network may not be helpful for deep learning due to the absence of dense layers and Backpropagation.

[3]. Multilayer Perceptron

The Multilayer Perceptron introduces the complexity of neural networks by passing input data through several layers of neurons. The network is fully connected: every neuron in one layer is connected to every neuron in the next. Propagation is bi-directional, with forward propagation of the inputs and backward propagation of the error. The network consists of at least three layers: an input layer, one or more hidden layers, and an output layer. Backward propagation modifies the weights to reduce the loss.


  • The network is helpful for deep learning due to the presence of dense fully connected layers and Backpropagation.


  • Comparatively complex to design and maintain, and slow
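To show what a hidden layer adds, the sketch below wires up a tiny multilayer perceptron that computes XOR, a problem a single perceptron cannot solve. The weights are hand-picked for illustration rather than learned by backpropagation, and a simple step activation is assumed.

```python
def step(z):
    """Threshold activation: 1 for positive input, otherwise 0."""
    return 1 if z > 0 else 0

def xor_network(x1, x2):
    # Hidden layer: one neuron behaves like OR, the other like NAND.
    h_or = step(1.0 * x1 + 1.0 * x2 - 0.5)
    h_nand = step(-1.0 * x1 - 1.0 * x2 + 1.5)
    # Output layer: AND of the two hidden neurons gives XOR.
    return step(1.0 * h_or + 1.0 * h_nand - 1.5)

xor_outputs = [xor_network(x1, x2) for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(xor_outputs)  # [0, 1, 1, 0]
```

In practice these weights would be found by training, with backward propagation adjusting them to reduce the loss.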

[4]. Convolutional Neural Network (CNN)

The convolutional neural network has a three-dimensional arrangement of neurons. The first layer of the network is called the convolutional layer, where each neuron processes the information from only a small region of the input. Input features are taken in patches, like a filter sliding over the image. The network processes each part of the image and understands it in parts, repeating the operations multiple times to complete the full image processing. Preprocessing may involve converting images from RGB to grayscale and then thresholding. Furthermore, changes in pixel value help to mark the edges, which helps classify images into different categories. Propagation is one-directional through one or more convolutional layers, each typically followed by a pooling layer. Filters are used to extract certain parts of the image. Convolutional neural networks show very effective results in image and video recognition, semantic parsing, and paraphrase detection.


  • CNN is used for deep learning with few parameters.

  • Fewer parameters to learn compared with a fully connected layer.


  • Comparatively complex to design and maintain, and slow
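The idea of a filter sliding over an image and marking edges can be sketched in plain Python. This is a minimal illustration with no padding and a stride of 1. Like deep learning libraries, it computes a cross-correlation (the kernel is not flipped). The tiny image and the vertical-edge filter are made up for the example.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image and sum the element-wise products."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output

# A 4x4 grayscale image: dark (0) on the left, bright (1) on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# A 2x2 filter that responds where brightness increases from left to right.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, vertical_edge)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The non-zero middle column marks where the dark and bright regions meet. The same four filter values are reused at every position, which is why a convolutional layer needs fewer parameters than a fully connected one.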

[5]. Recurrent Neural Networks (RNN)

In a Recurrent Neural Network, the output of a layer is fed back to the input to help predict the outcome of the layer. An RNN is designed to save the output of a layer. The first layer is typically a feed-forward layer, followed by a recurrent layer in which some information from the previous time step is remembered by a memory function. It stores information required for future use. If the prediction is wrong, the learning rate is used to make small changes, so that the network gradually moves toward making the right prediction during backpropagation.
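The memory of the previous time step can be sketched with a single recurrent unit in plain Python. This is a simplified illustration, not a full RNN layer: the weights are made up, a tanh activation is assumed, and the state is one number instead of a vector.

```python
import math

def rnn_step(x, hidden, w_input, w_hidden, bias):
    """Combine the current input with the previous hidden state."""
    return math.tanh(w_input * x + w_hidden * hidden + bias)

sequence = [1.0, 0.0, 0.0]  # a single pulse followed by silence
hidden = 0.0                # the memory starts out empty
hidden_states = []

for x in sequence:
    hidden = rnn_step(x, hidden, w_input=0.5, w_hidden=0.8, bias=0.0)
    hidden_states.append(hidden)

print(hidden_states)
```

The hidden state stays above zero on the second and third steps even though the input there is zero, because the unit still remembers the first input through its feedback connection.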