Bounding Boxes for Character Recognition

Updated: Aug 7, 2021

A bounding box is an imaginary rectangle drawn around an object in an image, as required by a machine learning project. Bounding boxes are the main output of an object detection model. Each one specifies the position of an object, its class, and a confidence score that tells us how likely it is that the object is present at that location. In the image, the blue rectangle is the bounding box that shows where our object (Iron Man) is located.

In its most common form, a bounding box is described by two coordinate pairs: one for the upper-left corner and one for the lower-right corner. Two conventions are followed when representing a bounding box:

1. Specifying the box by its top-left and bottom-right corner coordinates.

2. Specifying the box by its center, width, and height.

Parameters used in this bounding box are:

  • Class: the object inside the box, e.g. Jimin in this case.

  • (x1, y1): the x and y coordinates of the top-left corner of the rectangle.

  • (x2, y2): the x and y coordinates of the bottom-right corner of the rectangle.

  • (xc, yc): the x and y coordinates of the center of the bounding box. xc = (x1 + x2) / 2 and yc = (y1 + y2) / 2

  • Width: the width of the bounding box. width = (x2 - x1)

  • Height: the height of the bounding box. height = (y2 - y1)

  • Confidence: the probability that an object is present in the box. For example, a confidence of 0.7 indicates a 70% chance that the object actually exists in that box.
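The relationships between these parameters can be checked with a short worked example. This is only a minimal sketch: the helper name `describe_box` and the sample coordinates are hypothetical and chosen purely for illustration.

```python
def describe_box(x1, y1, x2, y2):
    """Derive center, width, and height from corner coordinates."""
    xc = (x1 + x2) / 2   # center x is the midpoint of the left and right edges
    yc = (y1 + y2) / 2   # center y is the midpoint of the top and bottom edges
    width = x2 - x1
    height = y2 - y1
    return {"center": (xc, yc), "width": width, "height": height}


# Hypothetical box with top-left corner (30, 0) and bottom-right corner (350, 330)
info = describe_box(30.0, 0.0, 350.0, 330.0)
print(info)
```

For this box the center works out to (190, 165), the width to 320, and the height to 330.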

A model should predict bounding boxes that are as close to the ground truth as possible, so both the ground-truth labels and the predictions are expressed in bounding box format. Bounding boxes are one of the most popular image annotation techniques in deep learning. This method reduces the cost and increases the efficiency of annotation.
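One common way to measure how close a predicted box is to the ground truth is intersection over union (IoU): the area where the two boxes overlap divided by the area they cover together. IoU is not introduced in the text above, so treat the following as an illustrative sketch rather than the method used in this post; the function name `iou` and the sample boxes are assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping region, if there is one
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp at zero so boxes that do not overlap get zero intersection
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical ground-truth box and prediction that overlap by half
ground_truth = [0.0, 0.0, 10.0, 10.0]
prediction = [5.0, 0.0, 15.0, 10.0]
score = iou(ground_truth, prediction)
print(score)
```

A score of 1 means the prediction matches the ground truth exactly, and a score of 0 means the boxes do not overlap at all.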

Functions to perform the conversion:

1. Conversion of upper-left and lower-right coordinates to center, width, height.

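import numpy as np

def corner_to_center(boxes):
    """Convert (x1, y1, x2, y2) boxes to (cx, cy, w, h)."""
    # boxes is assumed to be an array of shape (n, 4), one box per row
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    cx = (x1 + x2) / 2
    cy = (y1 + y2) / 2
    w = x2 - x1
    h = y2 - y1
    boxes = np.stack((cx, cy, w, h), axis=-1)
    return boxes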

2. Conversion of center, width and height to upper-left and lower-right coordinates.

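def center_to_corner(boxes):
    """Convert (cx, cy, w, h) boxes to (x1, y1, x2, y2)."""
    cx, cy, w, h = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    x1 = cx - (0.5 * w)  # left edge is half a width left of the center
    y1 = cy - (0.5 * h)  # top edge is half a height above the center
    x2 = cx + (0.5 * w)
    y2 = cy + (0.5 * h)
    boxes = np.stack((x1, y1, x2, y2), axis=-1)
    return boxes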
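A quick way to sanity-check the two conversions is a round trip: converting corner boxes to center form and back should return the original values. The sketch below repeats both functions so that it runs on its own, and it assumes plain NumPy arrays; the sample coordinates are used purely for illustration.

```python
import numpy as np


def corner_to_center(boxes):
    """Convert (x1, y1, x2, y2) boxes to (cx, cy, w, h)."""
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    return np.stack(((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1), axis=-1)


def center_to_corner(boxes):
    """Convert (cx, cy, w, h) boxes to (x1, y1, x2, y2)."""
    cx, cy, w, h = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    return np.stack((cx - 0.5 * w, cy - 0.5 * h,
                     cx + 0.5 * w, cy + 0.5 * h), axis=-1)


# Two sample boxes in corner format, one per row
boxes = np.array([[30.0, 0.0, 350.0, 330.0],
                  [440.0, 0.0, 690.0, 280.0]])

centered = corner_to_center(boxes)
restored = center_to_corner(centered)

print(centered)
print(np.allclose(restored, boxes))  # True when the round trip is lossless
```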

Code for bounding box

%matplotlib inline

"""Sets the matplotlib backend to 'inline' so that the output of plotting commands is displayed inline within frontends like the Jupyter notebook, directly below the code cell that produced it. The resulting plots are then also stored in the notebook document."""

from google.colab import drive
drive.mount('/content/drive')

The code above gives Colab access to the images saved in your Google Drive. Once the Drive is mounted, you'll get the message “Mounted at /content/drive”, and you'll be able to browse through the contents of your Drive from the file-explorer pane. You can now use your Google Drive as if it were a folder in your Colab environment.

%matplotlib inline
!pip install mxnet  # install mxnet in the Google Colab notebook
!pip install d2l  # install d2l in the Google Colab notebook
from mxnet import image, np, npx
from d2l import mxnet as d2l

img = image.imread('/content/drive/MyDrive/Data/FaceDetection/images/trainimg/1Vbts.jpg').asnumpy()

Defining bounding box to our object of interest

Suppose we want to identify two people, Jisoo and V, named obj_1 and obj_2 respectively. We then define the bounding boxes of obj_1 and obj_2 in the image based on their coordinate information.

Let the coordinates of obj_1 be:

x1, y1, x2, y2

And the coordinates of obj_2 be:

x1, y1, x2, y2

Fill in the values of the coordinates above:

# Put in the x1, y1, x2, y2 values of the objects you want to draw bounding boxes around
obj1_bbox, obj2_bbox = [30.0, 0.0, 350.0, 330.0], [440.0, 0.0, 690.0, 280.0]

We will define a function named bbox_to_rectangle. It converts a bounding box into the rectangle format of the matplotlib package.

def bbox_to_rectangle(bbox, color):
    """Convert bounding box to matplotlib format."""
    # Convert (x1, y1, x2, y2) to matplotlib's ((upper-left x, upper-left y), width, height)
    return d2l.plt.Rectangle(xy=(bbox[0], bbox[1]), width=bbox[2] - bbox[0],
                             height=bbox[3] - bbox[1], fill=False,
                             edgecolor=color, linewidth=2)

fig = d2l.plt.imshow(img)
fig.axes.add_patch(bbox_to_rectangle(obj1_bbox, 'yellow'))
fig.axes.add_patch(bbox_to_rectangle(obj2_bbox, 'red'));

"""axes.add_patch: a method in the axes module of the matplotlib library that adds a Patch to the axes' patches and returns the patch."""

In the image below, we can see a rectangular box around obj_1 as well as obj_2. Here obj_1 is Jisoo and obj_2 is V.
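The same drawing step can also be tried without mxnet, d2l, or Google Drive. The sketch below is a variation under stated assumptions, not the code used in this post: it uses matplotlib directly, a blank placeholder array stands in for the photo, and the output file name `bbox_demo.png` is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.patches import Rectangle


def bbox_to_rectangle(bbox, color):
    """Convert an (x1, y1, x2, y2) box to a matplotlib Rectangle patch."""
    x1, y1, x2, y2 = bbox
    return Rectangle(xy=(x1, y1), width=x2 - x1, height=y2 - y1,
                     fill=False, edgecolor=color, linewidth=2)


# Placeholder image (assumption): a blank 400 x 700 RGB array instead of a real photo
img = np.zeros((400, 700, 3), dtype=np.uint8)
obj1_bbox = [30.0, 0.0, 350.0, 330.0]
obj2_bbox = [440.0, 0.0, 690.0, 280.0]

fig, ax = plt.subplots()
ax.imshow(img)
patch1 = ax.add_patch(bbox_to_rectangle(obj1_bbox, "yellow"))  # add_patch returns the patch
patch2 = ax.add_patch(bbox_to_rectangle(obj2_bbox, "red"))
fig.savefig("bbox_demo.png")  # hypothetical output file name
```

In a notebook with `%matplotlib inline`, the `matplotlib.use("Agg")` line and the `savefig` call can be dropped, since the figure is displayed below the cell.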

Bounding Boxes in Object Detection

Object detection has two components: image classification and object localization. In other words, to detect an object in an ima