
Optical Character Recognition (OCR)

Updated: Aug 5, 2021


An Introduction to OCR:

Optical character recognition is an active area of research that aims to build computer systems able to extract and process text from images automatically. Today, OCR-based document digitization is in high demand.

OCR can detect printed or handwritten text, which can then be stored on disk for later processing by our computers. The technology lets text be extracted from any image, irrespective of its format or of how the text is embedded in it. It converts the text in a digital image into a machine-readable and editable text format. OCR works through a series of sub-processes, which include:

  • Image Preprocessing

  • Image classification and Text localization

  • Character Segmentation

  • Feature Extraction

  • Post Processing

Challenges that can arise when building an OCR:

  • Complex or distorted backgrounds: OCR can struggle to detect text because the complexity of the image makes it harder to separate the text from the non-text regions.

  • Uneven lighting can also be a challenge for OCR, as it makes it harder to detect the text in the image accurately.

  • Variation in fonts and font sizes can degrade the segmentation of the text. The OCR can be confused by changes in the size of the text.

  • A multilingual text environment is also a challenge for OCR.

  • Rotation/Skewness is a challenge for OCR because the viewing angle and skew are not fixed in camera-captured images. Deskewing the image is therefore included in the preprocessing steps.

Some of the best OCR engines we can use for our projects and models:

  1. Google Vision API: Google Cloud Vision OCR is a Google service that lets you extract text from a digital image. It is one of the best OCR engines available. It is a paid service, though fairly priced, with billing handled via Google's cloud. Once billing is enabled, you can use the Vision API for your OCR.

  2. Microsoft Computer Vision API (Cognitive Services): Microsoft's Computer Vision API includes several advanced algorithms for image recognition. In addition to extracting text from an image, it can detect offensive content and faces. A Microsoft Azure subscription is required to use it.

  3. Tesseract OCR: The Tesseract OCR can be easily downloaded and installed on your computer in order to use it in your program for text extraction from images. Tesseract offers the advantage of being compatible with a wide variety of programming languages and is easily accessible. It does not come with a built-in GUI.


You can download a 64-bit version from the link here.

After a successful download, install it on your system in your desired location, and remember to note the installation path for later use.

These are some of the OCR engines I recommend. I am using Tesseract OCR for this tutorial.

We now know how to install an OCR engine. Next, let's try to extract text from a picture. To do that, we will use the Python OpenCV library, which will help us read images from a given directory. Before we get into the coding part, let's start with a brief introduction to OpenCV.

Introduction to OpenCV:

OpenCV is an open-source software library that brings computer vision features into our programs. It provides functions for real-time computer vision projects. It is written in C++ and also provides interfaces for other languages such as Python, MATLAB, and Java.


Using pip: Just open your command prompt and type pip install opencv-python, and it will install the packages for OpenCV.

If you are using Anaconda: You can type the same pip command in your Anaconda prompt, or you can use conda install -c conda-forge opencv

OpenCV has many useful functions for image processing. Let’s take a look at some functions that are important to know before using OpenCV for your OCR.

cv2.imread()- For reading the image from the path you provide.

Image = cv2.imread('Image21.jpg')

cv2.imshow()- For displaying an image you have already loaded in a window. It takes a window name and the image, not a file path.


cv2.cvtColor()- You can use this function to change the color space of your image, for example from BGR to grayscale:

gray_Image = cv2.cvtColor(Image, cv2.COLOR_BGR2GRAY)

cv2.resize()- You can use this function to resize your images. Here you define the exact dimensions you want, given as (width, height).

resized_Im = cv2.resize(Image, (225, 225))

Edge Detection techniques- You can use the Canny edge detection technique to get an outline of the image. The two numbers are the lower and upper hysteresis thresholds.

EdgeDet_Image = cv2.Canny(Image, 100, 200)

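Thresholding with OpenCV- A threshold value is chosen for the image. Every pixel at or below the threshold is set to 0, and every pixel above it is set to the maximum value you pass in (127 and 255 in the call below). It is usually applied to a grayscale image, and the function returns both the threshold used and the thresholded image.

ret, thresh_Im = cv2.threshold(Image, 127, 255, cv2.THRESH_BINARY)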

Gaussian Blur- A Gaussian kernel is used to reduce noise in the image.

blur = cv2.GaussianBlur(Image, (5, 5), 0)

A Simple Text Extractor:

Let’s try to extract text from an image with a simple code snippet.

First, let me show you the image I am using:

Code snippet for a simple text extractor:

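import cv2
import pytesseract
# Point pytesseract at the Tesseract executable (the path noted during installation).
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
# Read the image from disk and run OCR on it.
image = cv2.imread('C:/Users/vansh/Downloads/TS1.jpg')
text = pytesseract.image_to_string(image)
print(text)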

We got the following results:

The above code gives a brief idea of how an OCR works, so let's move forward and make a cool OCR.

OCR with Tesseract and OpenCV:

Making an OCR involves a few sub-processes, so in this tutorial we will go through all of them one by one.

The following is the invoice that is used throughout the tutorial.