

Author: Parakh Jain

Human beings learn to detect and classify objects by seeing them. If we want to learn the difference between objects like cars and trucks, we first need to see images of cars and trucks and learn from them; only then can we distinguish between the two. Similarly, a machine needs data from which it can learn to distinguish and classify images.

Training a Model from Scratch

Training from scratch means building a network ourselves, starting from the very first layer, and training it on a large dataset (often millions of images) to reach good accuracy. This requires a lot of time and computational resources.

Pre-trained models (e.g. AlexNet, GoogLeNet, ResNet, Xception, VGG, Inception, InceptionResNet, MobileNet, MobileNet SSD) have already learned to pick out the image features that distinguish one class from another.

The first few layers detect general features such as edges, corners, circles and blobs of color. As we go deeper into the network, the layers start to detect more concrete things such as windows, wheels, headlights, tires and, eventually, full objects.

When we train a model from scratch, we have to decide the number and types of layers (adding or removing layers as we experiment), the number of filters, which activation function to use, the learning rate and so on, to get the desired output.
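To make those decisions concrete, here is a minimal Keras sketch of a small network built from scratch. It assumes TensorFlow 2.x with its bundled Keras; `build_scratch_cnn` is a hypothetical helper, and the number of blocks, filter counts, activation and learning rate are illustrative defaults rather than recommended values.

```python
from tensorflow import keras


def build_scratch_cnn(num_classes=2, input_shape=(128, 128, 3),
                      filters=(32, 64, 128), activation="relu",
                      learning_rate=1e-3):
    """Small CNN in which every design decision is ours to make and tune."""
    model = keras.Sequential([keras.Input(shape=input_shape)])
    # One convolution + pooling block per entry in `filters`: adding or
    # removing entries adds or removes layers.
    for n_filters in filters:
        model.add(keras.layers.Conv2D(n_filters, 3, padding="same",
                                      activation=activation))
        model.add(keras.layers.MaxPooling2D())
    model.add(keras.layers.GlobalAveragePooling2D())
    model.add(keras.layers.Dense(num_classes, activation="softmax"))
    # All weights start from Keras' default random initializers, so every
    # layer has to be learned from our own data.
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

For example, `build_scratch_cnn(filters=(16, 32), activation="elu", learning_rate=3e-4)` gives a shallower variant; each such change usually means training the whole network again to find out whether it helped.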

Pre-trained models, on the other hand, have already been trained on large datasets.

Neural networks are (usually) initialized with random weights, which after a number of epochs reach values that allow us to classify our input images properly.

What would happen if we could instead initialize those weights with values that we already know are good for classifying a certain dataset?

In that case we would not need a dataset as large as when training a network from scratch (instead of hundreds of thousands or even millions of images, a few thousand could be enough), nor would we need to wait many epochs for the weights to reach good values for classification: the good initialization gives training a head start.
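In Keras, the choice between the two starting points comes down to the `weights` argument of the `keras.applications` constructors: `weights=None` gives randomly initialized weights, while `weights="imagenet"` downloads weights already trained on ImageNet. A minimal sketch, assuming TensorFlow 2.x; `make_backbone` is a hypothetical helper, and MobileNetV2 with a 160x160 input is an arbitrary choice.

```python
from tensorflow import keras


def make_backbone(pretrained=True, input_shape=(160, 160, 3)):
    """Same architecture, two different starting points for its weights."""
    return keras.applications.MobileNetV2(
        input_shape=input_shape,
        include_top=False,  # keep only the convolutional feature extractor
        # "imagenet": start from weights learned on ImageNet (downloaded on first use)
        # None: start from random weights, as when training from scratch
        weights="imagenet" if pretrained else None,
    )
```

`make_backbone(pretrained=False)` and `make_backbone(pretrained=True)` have identical layers; only the initial values of their weights differ, and that difference is the head start described above.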

How can we use a pre-trained model to train our own model?

We can do this with two techniques: transfer learning and fine-tuning. Let’s explore these techniques and how to use them.


Transfer Learning

Transfer learning is one of the techniques that make training easier. It means transferring the knowledge a model has learned to perform a new task, and it can be understood through a teacher-student analogy.

A teacher has years of experience in the subject they teach and passes that knowledge on to their students. The students combine what their teacher taught them with what they learn by themselves. Building new knowledge on top of existing experience in this way is the idea behind Transfer Learning.

Now let’s map this analogy onto neural networks: we reuse the learned parameters (weights and biases) of a pre-trained network for a new task (the task of our own model). Instead of training the new network from scratch, we “transfer” the learned features, as the name suggests.
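At its simplest, “transferring” learned parameters means copying weight arrays from one network into another with the same layer shapes. The toy sketch below, assuming TensorFlow 2.x, copies the weights and bias of a stand-in “teacher” layer into a freshly initialized “student” layer with `get_weights()` and `set_weights()`; in practice the teacher’s weights would come from a network actually trained on another task.

```python
import numpy as np
from tensorflow import keras

# Stand-in "teacher" layer; imagine it was already trained on another task.
teacher = keras.layers.Dense(4)
teacher.build((None, 8))

# "Student" layer with the same shape, starting from its own random weights.
student = keras.layers.Dense(4)
student.build((None, 8))

# Transfer: copy the teacher's kernel (weights) and bias into the student.
student.set_weights(teacher.get_weights())

for w_teacher, w_student in zip(teacher.get_weights(), student.get_weights()):
    print(w_teacher.shape, np.array_equal(w_teacher, w_student))
```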

AlexNet, MobileNet, Google’s Inception-v3, Microsoft’s ResNet-50, ResNet-101 and others have already been trained on the ImageNet dataset. We only need to adapt them by training them further on our own domain-specific examples.

Transfer learning is usually used when our dataset is too small to train a network from scratch, or when we don’t want to design and train our own network from scratch. With a small dataset, we can instead take a pre-trained model and reuse it for a new, similar problem.

For example, suppose we want to distinguish cars from trucks and we have a pre-trained model that can recognize cars. Transfer learning is a natural fit here: we add a new truck class to the pre-trained model so that it predicts whether an image shows a car or a truck. This works because cars and trucks share many features, such as headlights, door handles, tires, edges, windshields, doors and lights, which the model has already learned to extract from images.


  • First, select a pre-trained model and load its weights. A pre-trained model is only useful when the data it was trained on is similar to the data of the new task. For example, a model previously trained for speech recognition would work horribly if we tried to use it to identify objects in images.

  • Second, remove the output layer (the fully connected classifier layer), since it was trained for the previous model’s specific task, and use the rest of the network as a fixed feature extractor for the new dataset. In the example above, the output layer of the pre-trained model tells us whether the input image is a car or not, but our new model must also recognize trucks. Hence, we remove the layer that classifies cars only. In Keras, the argument ‘include_top=False’ removes the classification layer.

  • Third, freeze the layers. Freezing a layer means its weights won’t be updated during training. We freeze the feature extraction layers so that they are not trainable; because they keep what they already learned, good accuracy can be achieved even with smaller datasets. In Keras, setting ‘layer.trainable = False’ freezes a layer.

  • Finally, add new classifier layers on top of the frozen layers and train only the newly added layers (or a specific set of layers), while the feature extraction layers of the pre-trained model stay frozen.
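Put together, the four steps might look like this in Keras. A minimal sketch, assuming TensorFlow 2.x: MobileNetV2, the 160x160 input size and the hypothetical `build_transfer_model` helper are illustrative choices, and `weights="imagenet"` downloads the pre-trained weights on first use.

```python
from tensorflow import keras


def build_transfer_model(num_classes, input_shape=(160, 160, 3), weights="imagenet"):
    # Steps 1 and 2: load a pre-trained model without its original classifier.
    base = keras.applications.MobileNetV2(
        input_shape=input_shape, include_top=False, weights=weights)

    # Step 3: freeze the feature extraction layers so their weights are not updated.
    for layer in base.layers:
        layer.trainable = False

    # Step 4: add a new classifier on top; only these layers will be trained.
    inputs = keras.Input(shape=input_shape)
    x = base(inputs, training=False)  # run BatchNorm layers in inference mode
    x = keras.layers.GlobalAveragePooling2D()(x)
    outputs = keras.layers.Dense(num_classes, activation="softmax")(x)
    model = keras.Model(inputs, outputs)

    model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

For the car-versus-truck example, `model = build_transfer_model(num_classes=2)` followed by `model.fit(...)` on your own labelled images would train only the final `Dense` layer. Input images should be preprocessed the way the base model expects, e.g. with `keras.applications.mobilenet_v2.preprocess_input`.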



In transfer learning, we add a classification layer to our new model and train only that layer, keeping the feature extraction layers frozen. The new classifier can then only use the high-level features that the pre-trained layers have already learned to compute. But what if we need different or lower-level features? For example, suppose a pre-trained model can classify a door, but we want a model that classifies whether the door is closed, semi-open or open. Training only the classification layer will probably not separate these classes and will not give us the required results. We need to retrain more layers of the model or use features from earlier layers, which means fine-tuning our neural network model.
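Using features from earlier layers can be done in Keras by building a new model that ends at an intermediate layer of the pre-trained network. A sketch assuming TensorFlow 2.x and MobileNetV2; `block_6_expand_relu` is one of MobileNetV2’s internal layer names, and both this cut point and the `build_mid_level_extractor` helper are illustrative assumptions.

```python
from tensorflow import keras


def build_mid_level_extractor(layer_name="block_6_expand_relu",
                              input_shape=(160, 160, 3), weights="imagenet"):
    base = keras.applications.MobileNetV2(
        input_shape=input_shape, include_top=False, weights=weights)
    # Stop at an intermediate layer: its activations are less specialized than
    # those of the final block and keep more spatial detail.
    return keras.Model(inputs=base.input, outputs=base.get_layer(layer_name).output)
```

Calling `summary()` on the base model lists the available layer names. A new classifier can be added on top of this truncated model, and more of its layers can be unfrozen and retrained if needed.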

We might find ourselves in a situation where we consider the removal of some layers from the pre-trained model. Transfer learning is unlikely to work in such an event. This is because removing layers reduces the number of trainable parameters, which can result in overfitting. Furthermore, determining the correct number of layers to remove without overfitting is a cumbersome and time-consuming process.


Fine-Tuning

Fine-tuning is like optimization: we optimize the network to achieve the best possible results. We may change the number of layers used, the number of filters, the learning rate and many other parameters of the model.

Fine-tuning, in general, means making small adjustments to a process to achieve the desired output or performance.

Tuning a machine learning model is like rotating the switches and knobs of a TV until you get a clearer signal.

  • Retrains the pre-trained model itself, not just a new classifier, on the new dataset.

  • Incrementally adapts the pre-trained features to the new data.

  • Requires a low learning rate.
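A minimal Keras sketch of those three points, assuming TensorFlow 2.x and a model whose pre-trained layers were frozen earlier: every layer is made trainable again and the model is recompiled with a small learning rate (1e-5 here, an illustrative value) so that the pre-trained weights are adjusted gradually rather than overwritten. `prepare_full_fine_tuning` is a hypothetical helper.

```python
from tensorflow import keras


def prepare_full_fine_tuning(model, learning_rate=1e-5):
    # Unfreeze everything; setting `trainable` on a model also updates
    # all of its sub-layers, including nested pre-trained models.
    model.trainable = True
    # Recompile so the change takes effect, using a much smaller learning
    # rate than Adam's default of 1e-3.
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```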

Fine-tuning means using the weights of a previously trained deep learning model as the starting point for another, similar deep learning task. Weights are the parameters that connect the neurons of one layer to those of the next; in a fully connected layer, each neuron is connected to every neuron in the next layer.

Because it retrains the weights of the pre-trained model instead of keeping them frozen as in transfer learning, fine-tuning can increase the accuracy of the model. It also significantly reduces the time needed to develop and train a new deep learning model, since the model already contains vital information learned by a pre-existing one. Popular tools for fine-tuning include Keras, TensorFlow, Torch and MXNet, as well as pre-trained model collections such as Model Zoo.
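The “each neuron connected to every neuron in the next layer” picture describes a fully connected (Dense) layer, and it shows up directly in the shape of that layer’s weights. A small sketch assuming TensorFlow 2.x (convolutional layers instead share a small kernel across all positions of the image):

```python
from tensorflow import keras

# A fully connected layer from 4 input neurons to 3 output neurons.
layer = keras.layers.Dense(3)
layer.build((None, 4))

kernel, bias = layer.get_weights()
print(kernel.shape)  # (4, 3): one weight per (input neuron, output neuron) pair
print(bias.shape)    # (3,): one bias per output neuron
```

These are the kind of arrays that fine-tuning starts from and keeps updating, instead of starting from random values.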

Pre-trained models available in Keras Applications include:

  • Xception

  • EfficientNet B0 to B7

  • VGG16 and VGG19

  • ResNet and ResNetV2

  • MobileNet and MobileNetV2

  • DenseNet

  • NASNetLarge and NASNetMobile

  • InceptionV3

  • InceptionResNetV2
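All of these constructors live in `keras.applications` and share the same core arguments (`include_top`, `weights`, `input_shape`), so swapping one backbone for another is a one-line change. A sketch assuming TensorFlow 2.x; the dictionary holds only one or two representatives per family (e.g. `EfficientNetB0`, `ResNet50`, `DenseNet121`), and `load_backbone` is a hypothetical helper.

```python
from tensorflow import keras

# One or two representative constructors per family listed above.
BACKBONES = {
    "Xception": keras.applications.Xception,
    "EfficientNetB0": keras.applications.EfficientNetB0,
    "VGG16": keras.applications.VGG16,
    "VGG19": keras.applications.VGG19,
    "ResNet50": keras.applications.ResNet50,
    "ResNet50V2": keras.applications.ResNet50V2,
    "MobileNet": keras.applications.MobileNet,
    "MobileNetV2": keras.applications.MobileNetV2,
    "DenseNet121": keras.applications.DenseNet121,
    "NASNetMobile": keras.applications.NASNetMobile,
    "InceptionV3": keras.applications.InceptionV3,
    "InceptionResNetV2": keras.applications.InceptionResNetV2,
}


def load_backbone(name, input_shape=(224, 224, 3), weights="imagenet"):
    """Load any listed backbone without its classifier (top) layers."""
    return BACKBONES[name](include_top=False, weights=weights,
                           input_shape=input_shape)
```

Each family also provides its own `preprocess_input` function (for example `keras.applications.resnet50.preprocess_input`), which should be applied to images before they are fed to that backbone.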


  • The first step is to download the pre-trained model and remove its top layer (the classifier layer), as we did with transfer learning.

  • With fine-tuning we are not limited to retraining only the classifier stage (i.e. the fully connected layers); we also retrain the feature extraction stage, i.e. the convolutional and pooling blocks.

  • It’s important to keep in mind that the first layers of a neural network detect simpler and more general patterns, and the deeper we go in the architecture, the more complex and dataset-specific the patterns they detect become.

  • Therefore, we can allow the last block of convolution and pooling layers to be retrained, while the earlier, more general layers stay frozen.
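In VGG16 the last convolution and pooling block is easy to identify, because Keras names its layers `block5_conv1`, `block5_conv2`, `block5_conv3` and `block5_pool`. A minimal sketch, assuming TensorFlow 2.x, that freezes everything else, adds a new classifier and compiles with a low learning rate; VGG16, the 1e-5 learning rate and the `build_vgg16_fine_tuner` helper are illustrative choices.

```python
from tensorflow import keras


def build_vgg16_fine_tuner(num_classes, input_shape=(224, 224, 3), weights="imagenet"):
    # Pre-trained VGG16 without its top (classifier) layers.
    base = keras.applications.VGG16(
        include_top=False, weights=weights, input_shape=input_shape)

    # Keep the early, general layers frozen; retrain only the last block.
    for layer in base.layers:
        layer.trainable = layer.name.startswith("block5")

    # New classifier on top of the partially trainable feature extractor.
    inputs = keras.Input(shape=input_shape)
    x = base(inputs)
    x = keras.layers.GlobalAveragePooling2D()(x)
    outputs = keras.layers.Dense(num_classes, activation="softmax")(x)
    model = keras.Model(inputs, outputs)

    # Low learning rate so block5's pre-trained weights are only nudged.
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-5),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

A common practice is to first train the new classifier with the whole base frozen, and only then unfreeze `block5` and continue training, so that large updates from a randomly initialized classifier do not disturb the pre-trained weights.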


  • If the source and target tasks are similar, there’s no need to fine-tune the layers of the pre-trained model. We only need to append a new layer at the end of the network and train it for the new categories. This is called “deep-layer feature extraction”.

  • When there are considerable differences between the source and target tasks, or when training examples are abundant, we unfreeze several layers of the pre-trained model, except the first few layers, which detect edges, corners, etc. We then add the new classification layer and fine-tune the unfrozen layers on the new examples. This is called “mid-layer feature extraction”.

  • When there are significant differences between the source and target tasks, we unfreeze and retrain the entire neural network; this is called “full model fine-tuning”. This type of transfer learning also requires a lot of training examples.

When to Use Transfer Learning or Fine-Tuning

  • When we don’t have enough data: machine learning models require a lot of data, and collecting such a huge amount of data is not easy. Without enough data, a model trained from scratch gives less accurate results, but starting from a pre-trained model we can often achieve high accuracy on the same dataset. For example, the ImageNet dataset contains over 1 million images; collecting a dataset that large for our own model is a task in itself.

  • When we don’t have sufficient computational power: even if we had such a dataset, we might not have the computational resources (RAM, CPU, GPU, TPU, etc.) needed to train on it. In this case, transfer learning or fine-tuning is a better option.
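The three strategies described above (deep-layer feature extraction, mid-layer feature extraction and full model fine-tuning) differ only in which layers of the pre-trained network are trainable. A sketch assuming TensorFlow 2.x; `set_strategy` and its default of keeping the first 20 layers frozen in the mid-layer case are hypothetical, illustrative choices, since the right cut-off depends on the architecture and the data.

```python
from tensorflow import keras


def set_strategy(base, strategy, keep_frozen=20):
    """Set trainable flags on a pre-trained `base` model for one strategy.

    keep_frozen: number of early layers (edges, corners, ...) that stay
    frozen in "mid-layer" mode; 20 is an arbitrary illustrative value.
    """
    if strategy == "deep-layer":    # freeze the whole pre-trained network
        base.trainable = False
    elif strategy == "mid-layer":   # unfreeze all but the first few layers
        base.trainable = True
        for layer in base.layers[:keep_frozen]:
            layer.trainable = False
    elif strategy == "full":        # retrain the entire network
        base.trainable = True
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return base
```

Whichever strategy is chosen, the model should be (re)compiled after changing the flags, and a new classification layer is still added on top of `base`.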


I hope you have enjoyed the blog. Feel free to share your feedback and ask your questions in the comment box. If you would like the code for any section, please leave a comment.


