How to start with Neural Networks

Riya Shivhare
Aug 1, 2021
13 min read

Updated: Aug 5, 2021

A neural network is a mathematical model inspired by the way the human brain stores and processes information. It is a learning algorithm that loosely mimics how the brain operates: just as the brain uses a system of neurons to recognize the underlying relationships in a set of data, a neural network is a computing system of interconnected nodes that recognizes hidden patterns and correlations in raw data, then clusters and classifies them. The network learns and improves continuously as the model is trained.

In machine learning, neural networks analyze and learn from data through a stack of layers. A trained network produces the closest possible approximation to the desired result and can adapt to changing input. Neural networks are rapidly gaining popularity in the field of Artificial Intelligence. The idea is to simulate many densely interconnected brain cells inside a computer so that it can learn things, recognize patterns, solve complex problems, and make decisions in a way that resembles a living being.

A neural network does have limitations: it is still a machine and cannot replace the living brain of a human. Neural networks are software simulations that run on conventional computers, built from ordinary program logic executed in parallel.

LAYERS

Even a simple neural network can represent a wide variety of complex functions, so it helps to understand the purpose of each layer of neurons in the network. A network mainly consists of three kinds of layers:

Input Layer: This is the first layer; it receives the text, image, or other data that is passed on to the rest of the network.
Hidden Layer: One or more layers between the input and the output that transform the data using weights and activation functions.
Output Layer: This layer provides the network's result for the given input.

Glimpse of the Python code:

Each layer is responsible for learning from the input data so that the network produces accurate results. The abstract representations the layers learn may be difficult for humans to interpret, but they help the algorithm classify data better. Here, we use basic Python programming and the Keras high-level deep learning API.

Keras is a powerful, easy-to-use, free, open-source Python library for developing and evaluating deep learning models. Deep learning is a part of machine learning that is loosely modeled on the human brain: it uses neural networks with many layers to learn representations directly from data. It is especially useful for unstructured data such as images, audio, and text, and it can be trained with or without supervision. Object recognition, speech recognition, and language translation are some examples that involve deep learning.

Step1: Import Keras dataset

import keras
from keras.datasets import mnist

# Load MNIST, already split into training and test sets
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

Here, we first import Keras and then the MNIST dataset to work with; any other dataset could be loaded instead. MNIST contains a training set of 60,000 images and a test set of 10,000 images, each a 28x28 pixel grayscale image of a handwritten digit from 0 to 9. The training data is used to fit the model, and the test data is used to measure the accuracy of the model on inputs it has not seen.

All images should be kept in a consistent pixel range throughout the process, so we need to preprocess the images after loading the dataset and before feeding them to the network.
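The preprocessing mentioned above can be sketched in plain Python so that the arithmetic is visible. This is only an illustration under assumptions: the helper names `flatten_and_scale` and `one_hot` are hypothetical, and the 2x2 grid is a made-up stand-in for a 28x28 digit. With the NumPy arrays returned by `mnist.load_data()`, the same result is usually obtained with `reshape`, a division by 255, and `keras.utils.to_categorical`.

```python
def flatten_and_scale(image):
    """Flatten a 2D grid of 0-255 pixel values into one list scaled to [0, 1]."""
    return [pixel / 255.0 for row in image for pixel in row]


def one_hot(label, num_classes=10):
    """Encode a digit label as a vector with a single 1.0 at index `label`."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


# A tiny made-up 2x2 "image" stands in for a 28x28 MNIST digit
tiny_image = [[0, 255], [51, 102]]
print(flatten_and_scale(tiny_image))  # four values between 0.0 and 1.0
print(one_hot(3))                     # 1.0 at position 3, 0.0 elsewhere
```

Scaling keeps every input in the same range, and one-hot labels match the 10-way softmax output used with the categorical cross-entropy loss.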

Step2: Introducing Neural layers

from keras import models
from keras import layers

# Assumes the images were flattened to 784 values and scaled to [0, 1],
# and the labels were one-hot encoded for categorical_crossentropy
model = models.Sequential()
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))

Our network here uses 3 dense (fully connected) layers. The second layer is a hidden layer with 256 neurons and the ReLU activation function. The last layer uses softmax and returns 10 scores, one per digit class; each score is the probability that the current image belongs to that class.

Step3: Calculation of Cost function

model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

Here, we choose the optimizer, the loss function to minimize, and the metric (accuracy) to report during training.

Step4: Training

model.fit(train_images, train_labels, epochs=8, batch_size=64)

In Keras, we use the fit() method for training; epochs is the number of complete passes over the training dataset.

Batch size refers to the number of training examples used in one iteration.

Step5: Evaluation of the network

# Evaluate on the held-out test set, not on the training data
test_loss, test_acc = model.evaluate(test_images, test_labels)
print('Test accuracy', test_acc)

Here, we evaluate the model by testing the accuracy of the network.
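To make the training settings from Step4 concrete, the short calculation below works out how many weight updates they imply. It is a sketch that assumes the full 60,000-image training set is used and that the final, smaller batch of each epoch still counts as one iteration.

```python
import math

num_train_images = 60_000  # MNIST training set size
batch_size = 64
epochs = 8

# The last batch of an epoch may be smaller, hence the ceiling
steps_per_epoch = math.ceil(num_train_images / batch_size)
total_updates = steps_per_epoch * epochs

print(steps_per_epoch)  # 938
print(total_updates)    # 7504
```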

APPLICATIONS OF NEURAL NETWORKS

Neural networks come in handy for solving complex real-life problems. By learning and modeling relationships through multiple hidden layers, they can handle complex, non-linear problems, generalize and make inferences, model highly volatile data, and help predict rare events. As a result, neural networks can give good, accurate results when making predictions from data. Neural networks are an example of machine learning and are used in areas such as:

Scam detection
Optimization of logistics data networks
Natural language processing
Medical diagnosis
Trading & marketing
Financial predictions for stock prices
Robotic control systems
Process and quality control
Chemical compound identification
Computer vision to interpret raw photos and videos

WEIGHTS OF NEURON LAYERS

The layers that extract patterns from the data hold values called weights, which encode the learned features and steer the network toward the closest result. The value of each weight matters because it carries information about how its input contributes to the output. Whenever a network is prepared for training on a training set, it is initialized with a set of weights, which are then updated during training.

The strength of a connection between neurons is expressed by a real number. Each input arrives over an interconnect with a weight attached to it and is received by a processing element. Weights are often initialized to small values (for example between 0 and 1) and keep updating during training. The output of a processing element can be described by an excitation level that makes the connection either excitatory (ON) or inhibitory (OFF).

Computation of Weights:

A neuron calculates the weighted sum of its inputs.

Let the inputs be:

$x_1, x_2, \dots, x_n$

and the weights be:

$w_1, w_2, \dots, w_n$

A bias $b$ (a constant) is added to the weighted sum:

$z = \sum_{i=1}^{n} w_i x_i + b$

Finally, the computed value is fed into the activation function $f$, which produces the output:

$y = f(z)$
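The computation just described (weighted sum, plus bias, then activation) can be sketched for a single neuron as follows. The function name `neuron_output`, the choice of a sigmoid activation, and the numbers are illustrative assumptions, not values from the article.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, bias, activation=sigmoid):
    """Weighted sum of the inputs, plus the bias, passed through the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


inputs = [0.5, -1.0, 2.0]    # example inputs (made up)
weights = [0.4, 0.3, -0.2]   # example weights (made up)
bias = 0.3

print(neuron_output(inputs, weights, bias))
```

Here the weighted sum is 0.2 - 0.3 - 0.4 = -0.5, the bias lifts it to about -0.2, and the sigmoid maps that to a value slightly below 0.5.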

Characteristics of Weights:

Weights increase the steepness of the activation function.
How quickly the activation function triggers depends on the weights.
The relationship between a feature and the target value can be read from the weights.
Weights are also responsible for changing the orientation of the boundary that separates the classes of data points.
Weights indicate the importance of a feature in computing the target value.
Weights are the coefficients of the equation we are trying to solve.
Negative weights reduce the value of an output.

In a weighted-average ensemble of models (as opposed to the weights inside a single network, which can be any real number, positive or negative), the model weights are usually small positive values that sum to one, so that each weight indicates the share of the prediction expected from each model. To find the weight for each ensemble member, the simplest approach is to grid search values between 0 and 1 for each one. An optimization procedure such as gradient descent or a linear solver can also be used to estimate the weights.

BIAS IN WEIGHT

Bias is a constant that is added to the sum of the products of weights and inputs. It is used to offset the output: the bias shifts the input of the activation function towards the negative or positive side. In this way the bias helps the model fit the given data as well as possible.

Characteristics of bias:

The bias here is a learned parameter of the neuron; it should not be confused with the statistical bias of the bias-variance trade-off.
Bias is like the intercept added to a linear equation.
Bias can delay or advance the triggering of the activation function.
Bias is responsible for shifting the activation curve to the left or right.
The bias also gives the neural network better generalization and flexibility.
The bias is essentially the negative of the threshold, which is why the value of the bias controls when the activation function fires.
The bias is an additional parameter in the network that adjusts the output along with the weighted sum of the inputs to the neuron.
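The statement that the bias is the negative of the threshold can be checked with a small sketch. Both function names below are hypothetical, and a simple step rule (fire when the sum reaches the threshold) is assumed.

```python
def fires_with_threshold(inputs, weights, threshold):
    """Step rule: fire when the weighted sum reaches the threshold."""
    return sum(w * x for w, x in zip(weights, inputs)) >= threshold


def fires_with_bias(inputs, weights, bias):
    """Same rule with the threshold moved to the left-hand side as a bias."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias >= 0


weights = [2, 3]
threshold = 4
for inputs in [(0, 0), (1, 0), (0, 1), (1, 1), (2, 0)]:
    same = fires_with_threshold(inputs, weights, threshold) == fires_with_bias(
        inputs, weights, -threshold
    )
    print(inputs, same)  # True for every input
```

A larger (less negative) bias lowers the effective threshold, so the neuron fires for smaller weighted sums.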

ACTIVATION FUNCTION

Activation functions play a vital role in the design of the whole neural network. An activation function defines how the weighted sum of the inputs is transformed into the output of the nodes in a layer of the network. The activation function is also called a “transfer function”. The choice of activation function has a large impact on the capability and performance of the neural network.

The choice of an activation function in the hidden layer will control how well the network model learns the training dataset. The choice of activation function in the output layer will define the type of predictions the model can make. Different activation functions are used in different parts of the neural network. All the hidden layers typically use the same activation function. The output layer will typically use a different activation function from the hidden layers and is dependent upon the type of prediction required by the model.

The activation function decides whether a neuron should be activated, based on the weighted sum of the inputs plus the bias. The purpose of the activation function is to introduce non-linearity into the output of a neuron. The neurons of a network work through their weights, biases, and respective activation functions. Activation functions also make back-propagation possible, since their gradients are supplied along with the error to update the weights and biases.
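Three activation functions that appear in this post (ReLU and softmax in the Keras model, plus the common sigmoid) can be written in a few lines. This is a plain-Python sketch for illustration; libraries such as Keras provide their own optimized versions.

```python
import math


def relu(z):
    """Pass positive values through, clamp negative values to zero."""
    return max(0.0, z)


def sigmoid(z):
    """Map any real number to the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    """Turn a list of scores into probabilities that sum to 1."""
    largest = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


print(relu(-2.0), relu(3.0))     # 0.0 3.0
print(sigmoid(0.0))              # 0.5
print(softmax([1.0, 2.0, 3.0]))  # three probabilities, largest last
```

ReLU and sigmoid act on one neuron at a time, while softmax acts on a whole output layer at once, which is why it is typically used for the final classification layer.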

Non-linearity of activation function:

A neural network without an activation function is essentially just a linear regression model, because a stack of linear layers collapses into a single linear transformation. The activation function applies a non-linear transformation to the input, making the network capable of learning and performing more complex tasks.
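The collapse of stacked linear layers can be seen with a one-input example. The weights and biases below are arbitrary values chosen for illustration, and all function names are hypothetical.

```python
def linear(x, w, b):
    """A single linear layer with one input and one output."""
    return w * x + b


def two_linear_layers(x):
    """Two linear layers stacked with no activation in between."""
    return linear(linear(x, 2.0, 1.0), -3.0, 0.5)


def collapsed(x):
    """The same mapping as one linear layer: w = -3 * 2, b = -3 * 1 + 0.5."""
    return -6.0 * x - 2.5


def with_relu(x):
    """The same two layers with a ReLU between them."""
    return linear(max(0.0, linear(x, 2.0, 1.0)), -3.0, 0.5)


for x in [-2.0, -1.0, 0.0, 1.0, 2.0]:
    print(x, two_linear_layers(x), collapsed(x), with_relu(x))
```

The first two columns of results always agree, so the second layer adds no expressive power. With the ReLU in place, the outputs for negative inputs differ, which means the mapping is no longer a single straight line.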

LOSS FUNCTION, ERROR, BACK AND FORWARD PROPAGATION

Backpropagation is the essence of neural network training. It computes the gradients of the loss with respect to the weights and biases, and these gradients are used to update them. In other words, it is the method of fine-tuning the weights of a neural network based on the error obtained in the previous iteration: we update the weights and biases of the neurons based on the error at the output.

Forward propagation refers to movement in only one direction, from input to output, in a neural network. It also refers to the calculation and storage of the intermediate values (the outputs of each layer) in order from the input layer to the output layer. The forward pass ends by computing the loss based on the current weights, which at the start of training are the initialized weights.

The error signifies how well our network is performing on a certain dataset, so that we can understand the underlying causes of its problems. This helps us prioritize which problem deserves attention and how much, and gives us a direction for handling the errors. A low error signifies good performance of the model. The error is calculated through a loss function.

The function we want to minimize or maximize is called the objective function. When we minimize it, it is also called the cost function, loss function, or error function. The loss function is a method of evaluating how well the algorithm models the data: its value measures the difference between the actual value and the predicted value. As you tune the algorithm to improve the model, the output of the loss function tells you whether it is improving. If the loss function returns a high number, the predictions of the model are off; if it returns a low number, the model is doing well. Hence, the loss function should penalize the model effectively while it trains on a dataset.

The cost function reduces all the good and bad aspects of a complex system to a single scalar value, which allows results to be ranked and compared. The loss function (J) can be defined as a function that takes two parameters:

Predicted value
True value
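A loss function of this shape, taking a predicted value and a true value and returning one number, can be sketched for two common cases. Mean squared error is included as a general example, and categorical cross-entropy is the loss named in the Keras model; the probability vectors are made-up illustrations.

```python
import math


def mean_squared_error(predicted, true):
    """Average squared difference between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predicted, true)) / len(true)


def categorical_crossentropy(predicted_probs, true_one_hot, eps=1e-12):
    """Negative log of the probability assigned to the correct class."""
    return -sum(
        t * math.log(max(p, eps)) for p, t in zip(predicted_probs, true_one_hot)
    )


true_label = [0.0, 1.0, 0.0]         # the correct class is index 1
confident = [0.05, 0.90, 0.05]       # mostly right
mistaken = [0.70, 0.20, 0.10]        # mostly wrong

print(categorical_crossentropy(confident, true_label))  # small loss
print(categorical_crossentropy(mistaken, true_label))   # larger loss
print(mean_squared_error([1.0, 2.0], [1.0, 2.0]))       # 0.0
```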

If the loss is very high, a large error signal propagates back through the network during training and the weights change more than usual. If it is small, the weights do not change much, since the network is already performing well enough.
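The link between the size of the loss and the size of the weight update can be seen in a minimal gradient descent loop. This sketch fits a single linear neuron to made-up data generated from y = 2x + 1; the function name, learning rate, and number of epochs are illustrative assumptions, and a real network would apply the same idea to every weight through backpropagation.

```python
def train_linear_neuron(samples, learning_rate=0.1, epochs=500):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    losses = []
    n = len(samples)
    for _ in range(epochs):
        # Forward pass: prediction error for every sample
        errors = [(w * x + b) - y for x, y in samples]
        losses.append(sum(e * e for e in errors) / n)
        # Gradients of the mean squared error with respect to w and b
        grad_w = sum(2 * e * x for e, (x, _y) in zip(errors, samples)) / n
        grad_b = sum(2 * e for e in errors) / n
        # Large errors give large gradients, hence large updates
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b, losses


samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # y = 2x + 1
w, b, losses = train_linear_neuron(samples)
print(round(w, 3), round(b, 3))  # close to 2 and 1
print(losses[0], losses[-1])     # the loss shrinks as training proceeds
```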

TYPES OF NEURAL NETWORKS

There are several types of neural networks in use, and more in the development stage. They can be classified by their structure, data flow, the neurons used and their density, the layers and their depth, activation filters, and so on.

[1]. Perceptron

The perceptron is the oldest and simplest model in the history of neural networks. It is the smallest unit that makes predictions with a linear computation, combining a set of weights with the input features. The perceptron is an algorithm for learning a binary classifier called a threshold function: it takes weighted inputs and applies the activation function to obtain the final output. It is also known as the Threshold Logic Unit (TLU).

$y = 1$ if $\sum_{i=1}^{n} w_i x_i + b > 0$, otherwise $y = 0$

where $x_i$ are the inputs, $w_i$ the weights, $b$ the bias, and $n$ the number of inputs to the perceptron.

Advantages:

A perceptron can implement linearly separable logic gates such as AND, OR, and NAND.

Disadvantages:

A perceptron can only learn linearly separable problems such as the Boolean AND problem. It cannot solve non-linear problems such as the Boolean XOR problem.
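The contrast between AND and XOR can be reproduced with the classic perceptron learning rule. The sketch below is a minimal version under assumptions: weights start at zero, the learning rate is 1, and training runs for a fixed 20 passes; the function names are hypothetical.

```python
def predict(weights, bias, inputs):
    """Threshold unit: output 1 when the weighted sum plus bias is positive."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if z > 0 else 0


def train_perceptron(samples, epochs=20, learning_rate=1):
    """Perceptron rule: nudge weights by the error on each misclassified sample."""
    weights = [0] * len(samples[0][0])
    bias = 0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

for name, data in [("AND", AND), ("XOR", XOR)]:
    weights, bias = train_perceptron(data)
    outputs = [predict(weights, bias, inputs) for inputs, _ in data]
    targets = [target for _, target in data]
    print(name, "learned" if outputs == targets else "not learned")
```

AND is learned because a single straight line separates its one positive case from the rest. No such line exists for XOR, so the rule keeps cycling without ever classifying all four cases correctly.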

[2]. Feed Forward Neural Networks

Feedforward neural networks follow forward propagation: the input passes through the artificial nodes and exits through the output nodes. The network may or may not contain hidden layers, but the input and output layers are mandatory. Feed-forward networks are therefore further divided into single-layered and multi-layered feed-forward neural networks.

The complexity of the network depends on the number of layers involved. In this basic form, it has static weights and an activation function, and no backward propagation. Feed-forward neural networks are easy to maintain and are equipped to deal with data that contains a lot of noise.

Advantages:

Less complex, easy to design & maintain
Speedy [One-way propagation]
Copes well with noisy data

Disadvantages:

This network may not be helpful for deep learning due to the absence of dense layers and Backpropagation.

[3]. Multilayer Perceptron

The Multilayer Perceptron introduces the complexity of neural networks by passing input data through several layers of neurons. Every neuron in one layer is connected to every neuron in the next. Propagation is bi-directional: it includes both forward and backward propagation. The network consists of at least three layers: an input layer, one or more hidden layers, and an output layer. Backward propagation modifies the weights to reduce the loss.

Advantages:

The network is helpful for deep learning due to the presence of dense fully connected layers and Backpropagation.

Disadvantages:

Comparatively complex to design and maintain, and slow

[4]. Convolutional Neural Network (CNN)

The convolutional neural network has a three-dimensional arrangement of neurons. The first layer of the network is called the convolutional layer, where each neuron processes only a small region of the input. Input features are taken in patches, like a filter sliding over the image. The network processes the image part by part, repeating the same operation many times until the full image has been covered. Preprocessing may involve converting images from RGB to gray-scale, followed by thresholding. Changes in pixel value help to mark the edges, which in turn helps classify images into different categories. In the forward pass, data flows in one direction through one or more convolutional layers, each typically followed by a pooling layer; the result is usually passed to fully connected layers for classification. Filters are used to extract certain parts of the image. Convolutional neural networks give very effective results in image and video recognition, semantic parsing, and paraphrase detection.
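How a filter marks edges from changes in pixel value can be shown with a small hand-written convolution. This is a sketch without padding or strides, on a made-up 3x4 image whose left half is dark (0) and right half is bright (1); the kernel is a simple assumed vertical-edge detector, not a filter learned by a network.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding) and sum the products."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    return [
        [
            sum(
                image[row + i][col + j] * kernel[i][j]
                for i in range(kernel_h)
                for j in range(kernel_w)
            )
            for col in range(out_w)
        ]
        for row in range(out_h)
    ]


image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

print(convolve2d(image, vertical_edge))  # [[0, 2, 0], [0, 2, 0]]
```

The output is zero where the image is uniform and non-zero only at the column where dark meets bright. Because the same four kernel values are reused at every position, the layer needs far fewer parameters than a fully connected layer over the same image.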

Advantages:

CNNs are used for deep learning with few parameters.
Fewer parameters to learn compared with a fully connected layer.

Disadvantages:

Comparatively complex to design and maintain, and slow

[5]. Recurrent Neural Networks (RNN)

In a Recurrent Neural Network, the output of a layer is saved and fed back to the input to help predict the outcome of the layer. The first layer is typically a feed-forward layer, followed by a recurrent layer in which some of the information from the previous time-step is remembered by a memory function. It stores information required for future use. If the prediction is wrong, the learning rate is used to make small changes to the weights, so that the network gradually moves towards the right prediction during backpropagation.
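The memory described above can be sketched with a single recurrent unit whose state is fed back at every time-step. The function name, the tanh activation, and the weight values are illustrative assumptions; practical RNN layers use vectors and matrices rather than single numbers.

```python
import math


def rnn_forward(sequence, w_input, w_hidden, bias, initial_state=0.0):
    """Run one recurrent unit over a sequence, feeding its state back each step."""
    state = initial_state
    states = []
    for x in sequence:
        # The new state mixes the current input with the previous state
        state = math.tanh(w_input * x + w_hidden * state + bias)
        states.append(state)
    return states


sequence = [1.0, 0.0, 0.0]  # a single pulse followed by silence

with_memory = rnn_forward(sequence, w_input=1.0, w_hidden=0.5, bias=0.0)
without_memory = rnn_forward(sequence, w_input=1.0, w_hidden=0.0, bias=0.0)

print(with_memory)     # the pulse fades gradually over later steps
print(without_memory)  # the pulse is forgotten immediately
```

With a non-zero recurrent weight the first input still influences later states, though it shrinks at every step. The same repeated multiplication, applied to gradients during training, is one source of the vanishing and exploding gradient problems.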

Advantages:

Models sequential data where each sample can be assumed to depend on the previous ones.
Used with convolution layers to extend the pixel effectiveness.

Disadvantages:

Gradient vanishing and exploding problems
Training of RNN model could be a difficult task
Difficult to process long sequential data using ReLU as an activation function.

[6]. Modular Neural Network

A modular neural network consists of a number of networks that function independently and perform separate sub-tasks. The networks do not interact with or signal each other during computation; each works on its own to produce its output. A large and complex computational process can thus be done faster by breaking it into independent components. The computation speeds up because the networks are not interacting or even connected with each other.

Advantages:

Efficient and robust
Independent training

Disadvantages:

Moving target Problems

ARTIFICIAL NEURAL NETWORK

An artificial neural network (ANN) is the part of a computing system designed to simulate the way the human brain analyzes and processes information. It is a foundation of Artificial Intelligence and solves problems that would prove impossible or difficult by human or statistical standards. ANNs have self-learning capabilities that enable them to produce better results as more data becomes available: they modify themselves as they learn from training data, and subsequent runs provide more information.

Artificial neural networks are built like the human brain, with neuron nodes interconnected like a web. An ANN has hundreds or thousands of artificial neurons called processing units, which are interconnected by nodes. These processing units are made up of input and output units. The input units receive various forms and structures of information, weighted by an internal weighting system, and the neural network attempts to learn about the information presented in order to produce one output report. ANNs also use a set of learning rules called backpropagation to perfect their output results.

An ANN initially goes through a training phase where it learns to recognize patterns in data, whether visually, aurally, or textually. During this supervised phase, the network compares the output it actually produced with the output it was meant to produce (the desired output). The difference between the two is reduced using backpropagation. This means that the network works backward, going from the output units to the input units and adjusting the weights of the connections between the units until the difference between the actual and desired outcome produces the lowest possible error. During the training and supervisory stage, the ANN is taught what to look for and what its output should be, using yes/no question types with binary numbers.

APPLICATIONS FOR ARTIFICIAL NEURAL NETWORKS

Artificial neural networks are paving the way for life-changing applications to be developed for use in all sectors of the economy. Artificial intelligence platforms that are built on ANNs are disrupting the traditional ways of doing things. From translating web pages into other languages to having a virtual assistant order groceries online to conversing with chatbots to solve problems, AI platforms are simplifying transactions and making services accessible to all at negligible costs.

Image Recognition is one of the areas where neural networks were successfully implemented with good results. There are other areas where this technology has expanded, including:

Chatbots
Natural language processing
Stock market prediction
Drug discovery and development

Artificial neural networks are widely used today because they can discover rules and patterns in large amounts of data. If the dataset involved is too large for a human to make sense of in a reasonable time, the task is a likely candidate for automation through artificial neural networks.

CONCLUSION

Neural networks have gained popularity in recent years and have been widely adopted in fields such as business and financial operations, trading, forecasting and marketing research, fraud detection, and risk assessment. A trained neural network evaluates new data based on what it learned from its training dataset and produces accurate results. The network can distinguish subtle nonlinear interdependencies and patterns that other methods may miss.

Neural networks are capable of learning complex relationships from datasets of training examples. This property makes them well suited to pattern recognition problems that involve detecting complex trends in large datasets. Neural networks have been applied to the detection of medical abnormalities from physiological measures: they have been used for problems such as the detection of cardiac abnormalities and breast cancer, and have proved capable of diagnostic abilities similar to those of expert physicians.

Neural networks have come in handy for handling and evaluating big datasets for trading and data analysis, often with accurate outputs. They are also useful as a basis for training new algorithms: a network that is well trained on good data gives accurate results. There are certain fields where neural networks have proven particularly successful:

Character Recognition
Image Recognition
Trading and Stock Pricing
Medical applications

Hope you have enjoyed the blog. Feel free to provide your feedback and ask your queries in the comment box.
