

Author: Parakh Jain

Human beings learn to detect and classify objects with their eyes. If we want to learn the difference between objects like cars and trucks, we first need to see images of cars and trucks and learn from them; only then can we distinguish between the two. Similarly, a machine needs data from which it can learn to distinguish and classify images.

Training a Model from Scratch

Learning from scratch means building a network on our own, from the first layer onward, on a large dataset of roughly millions of images. Achieving good accuracy this way requires a great deal of time and computational power.

Pre-trained models (e.g. AlexNet, GoogLeNet, ResNet, Xception, VGG, Inception, InceptionResNet, MobileNet, MobileNet SSD, etc.) have already learned to pick out the features that distinguish one image class from another.

The first few layers detect general features such as edges, corners, circles, and blobs of color. As we go deeper into the network, the layers start to detect more concrete things such as windows, wheels, headlights, tires, and full objects.

When we train a model from scratch, we must decide the number of layers and experiment with them, increasing or decreasing the depth, choosing the number of filters, the activation function, the learning rate, and so on, to get the desired output.

Pre-trained models, by contrast, have already been trained on large datasets.

Neural networks are initialized with random weights (usually) that after a series of epochs reach some values that allow us to properly classify our input images.

What would happen if we could initialize those weights to certain values that we know beforehand that are already good to classify a certain dataset?

In this way, we would not need a dataset as big as when training a network from zero (instead of hundreds of thousands or even millions of images, a few thousand may suffice), nor would we need to wait many epochs for the weights to reach good values for classification; their initialization gives them a head start.
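In Keras this idea can be sketched directly: build two networks with the same architecture and copy the first network's learned weights into the second, so training starts from useful values rather than from random initialization. The layer sizes below are arbitrary, purely for illustration.

```python
import numpy as np
from tensorflow import keras

def build_net():
    # A small classifier; the architectures must match for weights to transfer.
    return keras.Sequential([
        keras.Input(shape=(32,)),
        keras.layers.Dense(16, activation="relu"),
        keras.layers.Dense(2, activation="softmax"),
    ])

trained = build_net()  # imagine this one has already been trained
fresh = build_net()    # starts from random initialization

# Initialize the new network with the "good" weights instead of random ones.
fresh.set_weights(trained.get_weights())

# Both networks now produce identical outputs for the same input.
x = np.random.rand(1, 32).astype("float32")
print(np.allclose(trained.predict(x, verbose=0), fresh.predict(x, verbose=0)))
```

This is exactly what happens when we start from a pre-trained model: the weights begin at values that are already useful, so far fewer images and epochs are needed.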

How can we use a pre-trained model to train our own model?

We can do this with the techniques of transfer learning and fine-tuning. Let's explore these techniques and how to use them.


Transfer learning is one of those techniques that makes training easier. Transfer learning is the transferring of knowledge from one model to perform a new task, and it can be understood using the teacher–student analogy.

A teacher passes on whatever knowledge they have to their students. A teacher has years of experience in the particular topic they teach; they transfer that knowledge to their students, and the students combine what they are taught with the things they learn by themselves. This concept of building on old experience to learn new things is known as transfer learning.

Now let's map this analogy onto neural networks. We transfer the learned parameters (weights and biases) of a pre-trained network to a new task, the task of our own model. Instead of training another neural network from scratch, we "transfer" the learned features, as the name suggests.

AlexNet, MobileNet, Google's Inception-v3, Microsoft's ResNet-50, ResNet-101, etc. have already been trained on the ImageNet dataset. We only need to enhance them by further training them with our own domain-specific examples.

Transfer learning is usually done when the dataset is too small for training from scratch, or when we don't want to create our own neural network from scratch and train it on the data we have. Because our dataset is small, we take a pre-trained model and reuse it for a new, similar problem.

For example, suppose we want to predict cars and we have a pre-trained model that can classify trucks. Transfer learning is the best option in this scenario: we add a new car class on top of the pre-trained truck model so that it can predict whether an image shows a car or a truck. Both cars and trucks share almost the same features, such as headlights, door handles, tires, edges, windshields, doors, and lights, which the model has already learned to extract from images.


  • The first task is to select one of the pre-existing models and load it. The selected model is useful only when the dataset it was trained on is similar to the data for the new deep learning task. For example, a model previously trained for speech recognition would work horribly if we tried to use it to identify objects.

  • The second task is to remove the output layer (the fully connected layer, classifier layer, etc.), since it was built for tasks specific to the previous model, and then use the rest of the network as a fixed feature extractor for the new dataset. In the example above, the output layer of the pre-existing model tells us whether the input image is a truck or not, but our new model also needs to classify cars, so we remove the layer that classifies trucks only. In Keras, the argument include_top=False removes this classification layer.

  • Freeze the layers. Freezing a layer means its weights won't be updated. During training, we freeze the feature-extraction layers, i.e. these layers won't be trainable, which lets us achieve higher accuracy even on smaller datasets. Setting layer.trainable = False freezes a layer.

  • Now we add new classifier layers on top of the frozen feature-extraction layers of the pre-trained model and train only this newly added part.
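The four steps above can be sketched in Keras as follows. The choice of MobileNetV2, the input size, and the two-class head are illustrative assumptions; weights=None is used only so the snippet runs offline, whereas in practice you would pass weights="imagenet" to download the pretrained weights.

```python
from tensorflow import keras

# Steps 1-2: load a pre-trained architecture without its original classifier.
# Use weights="imagenet" in practice to get the pretrained ImageNet weights;
# weights=None keeps this sketch runnable offline.
base = keras.applications.MobileNetV2(
    weights=None, include_top=False, input_shape=(96, 96, 3)
)

# Step 3: freeze the feature-extraction layers so their weights stay fixed.
base.trainable = False

# Step 4: add a new classifier head on top and train only that part.
model = keras.Sequential([
    base,
    keras.layers.GlobalAveragePooling2D(),
    keras.layers.Dense(2, activation="softmax"),  # e.g. car vs. truck
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=5)  # trains the new head only
```

Because the frozen backbone contributes no trainable weights, training updates only the small Dense head, which is why a few thousand images can be enough.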



In transfer learning we add a classification layer to our new model and, after freezing the feature-extraction layers, train that layer only. The frozen layers extract the high-level features, built up from the lower layers, that let us differentiate the classes. But what if we need lower-level distinctions? For example, a pre-trained model may be able to classify a door, but we want to build a model that can tell whether the door is closed, semi-open, or open. For this problem, training only the classification layer of the pre-trained model will not be able to differentiate our classes and will not give the required results. We need to retrain more layers of the model, or use features from earlier layers, which means we are fine-tuning our neural network model.

We might find ourselves in a situation where we consider the removal of some layers from the pre-trained model. Transfer learning is unlikely to work in such an event. This is because removing layers reduces the number of trainable parameters, which can result in overfitting. Furthermore, determining the correct number of layers to remove without overfitting is a cumbersome and time-consuming process.


Fine-tuning is like optimization: we optimize the network to achieve the best results. We might change the number of layers used, the number of filters, the learning rate, and many other parameters of the model.
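Continuing the Keras sketch, fine-tuning unfreezes some of the top feature-extraction layers and retrains them together with the new classifier head at a much lower learning rate. The model choice, the number of unfrozen layers, and the learning rate here are illustrative assumptions, and weights=None again just keeps the sketch runnable offline (use weights="imagenet" in practice).

```python
from tensorflow import keras

# A pre-trained backbone (weights="imagenet" in practice).
base = keras.applications.MobileNetV2(
    weights=None, include_top=False, input_shape=(96, 96, 3)
)
base.trainable = True  # allow fine-tuning...

# ...but keep the earliest layers (generic edge/corner detectors) frozen
# and retrain only the last 20 layers, which hold more specific features.
for layer in base.layers[:-20]:
    layer.trainable = False

model = keras.Sequential([
    base,
    keras.layers.GlobalAveragePooling2D(),
    # e.g. closed / semi-open / open door
    keras.layers.Dense(3, activation="softmax"),
])

# A small learning rate avoids destroying the pre-trained weights.
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-5),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```

How many layers to unfreeze is itself one of the knobs being tuned: the deeper into the network the new task's features differ from the original task's, the more layers need retraining.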

Fine-tuning, in general, means making small adjustments to a process to achieve the desired output or performance.

Tuning a machine learning model is like rotating TV switches and knobs until you get a clearer signal.