Exploring projects that bridge legacy systems with modern AI

From Concept to Code

Object Detection Model

YOLO - You Only Look Once

YOLO, which stands for You Only Look Once, is a family of deep learning models designed for real-time object detection. The models are fast, accurate, and easy to use, and recent versions such as YOLOv8 also handle other vision tasks, including image classification and image segmentation. The core idea behind YOLO is to predict all the objects in an image in a single pass through the neural network, instead of first proposing candidate regions and then classifying each one separately. This single pass is what makes it fast enough for real-time use.

How Object Detection Works in YOLOv8

The object detection process in YOLOv8 can be broken down into a few key steps:

1. Input Image

The process begins when an image is fed into the YOLOv8 model. Before inference, the image is resized to a fixed input size (640×640 pixels by default), usually with padding so that its aspect ratio is preserved.
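The resize step can be sketched as follows. This is a minimal illustration of letterboxing (scale the image to fit, then pad the shorter side), assuming a square 640×640 target. The helper names `letterbox_params` and `to_original` are hypothetical, and the actual Ultralytics preprocessing differs in details such as how much padding it adds.

```python
from typing import Tuple


def letterbox_params(width: int, height: int, target: int = 640) -> Tuple[float, int, int, float, float]:
    """Fit a width x height image into a target x target square.

    Returns (scale, new_w, new_h, pad_x, pad_y), where pad_x / pad_y is the
    padding added on each side after resizing.
    """
    # Scale so the longer side fits the target exactly; aspect ratio is kept.
    scale = min(target / width, target / height)
    new_w, new_h = round(width * scale), round(height * scale)
    # Split the leftover space evenly between both sides.
    pad_x = (target - new_w) / 2
    pad_y = (target - new_h) / 2
    return scale, new_w, new_h, pad_x, pad_y


def to_original(x: float, y: float, scale: float, pad_x: float, pad_y: float) -> Tuple[float, float]:
    """Map a point in the padded model input back to original-image pixels."""
    return (x - pad_x) / scale, (y - pad_y) / scale


if __name__ == "__main__":
    scale, new_w, new_h, pad_x, pad_y = letterbox_params(1280, 720)
    print(scale, new_w, new_h, pad_x, pad_y)            # 0.5 640 360 0.0 140.0
    print(to_original(320, 320, scale, pad_x, pad_y))   # (640.0, 360.0)
```

The same scale and padding have to be undone after inference so that predicted boxes line up with the original image, which is what `to_original` does for a single point.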

2. Grid System

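The resized image is then divided into a grid of cells. YOLOv8 actually uses three grids of different resolutions, so that both small and large objects can be detected. For each grid cell, the model predicts: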

  • A bounding box that encloses a detected object.

  • A class probability that tells you what type of object is inside that box (e.g., person, car, dog).

  • A confidence score that represents how certain the model is about its prediction.
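The grid geometry can be illustrated with a short sketch. It assumes YOLOv8's three detection scales use strides of 8, 16 and 32 pixels, giving 80×80, 40×40 and 20×20 grids for a 640×640 input. `grid_size` and `cell_for_point` are hypothetical helpers, and the object centre is made up. The sketch only shows which cell a point falls in; during training, YOLOv8 uses a more elaborate assignment rule to decide which cells are responsible for predicting a given object.

```python
from typing import Tuple

STRIDES = (8, 16, 32)  # assumed strides of YOLOv8's three detection scales


def grid_size(input_size: int, stride: int) -> int:
    """Number of cells along one side of the grid at a given stride."""
    return input_size // stride


def cell_for_point(x: float, y: float, stride: int) -> Tuple[int, int]:
    """(column, row) of the grid cell that contains pixel (x, y)."""
    return int(x // stride), int(y // stride)


if __name__ == "__main__":
    cx, cy = 300.0, 150.0  # hypothetical object centre in a 640x640 input
    for s in STRIDES:
        n = grid_size(640, s)
        print(f"stride {s}: {n}x{n} grid, centre in cell {cell_for_point(cx, cy, s)}")
    # stride 8: 80x80 grid, centre in cell (37, 18)
    # stride 16: 40x40 grid, centre in cell (18, 9)
    # stride 32: 20x20 grid, centre in cell (9, 4)
```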

3. Bounding Boxes & Confidence Scores

Because every grid cell makes a prediction, the model produces many candidate bounding boxes (8,400 for a 640×640 input in YOLOv8, across its three grid resolutions). Each box comes with a confidence score, which is a number between 0 and 1. This score tells you how likely the model thinks the box contains an object and how well the box is positioned. A higher score means a more confident prediction.
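In YOLOv8 specifically, there is no separate objectness output: a candidate box's confidence is its highest class score. A minimal sketch of that step for one box, assuming the raw per-class outputs are logits turned into probabilities with a sigmoid; `box_confidence`, the class names and the example logits are hypothetical.

```python
import math
from typing import List, Tuple


def sigmoid(x: float) -> float:
    """Squash a raw score (logit) into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def box_confidence(class_logits: List[float]) -> Tuple[int, float]:
    """Return (class_index, confidence) for one candidate box.

    Each class gets an independent probability via the sigmoid, and the
    largest one is used as the box's confidence score.
    """
    scores = [sigmoid(v) for v in class_logits]
    cls = max(range(len(scores)), key=scores.__getitem__)
    return cls, scores[cls]


if __name__ == "__main__":
    names = ["person", "car", "dog"]                 # hypothetical 3-class model
    cls, conf = box_confidence([-3.0, 1.5, -2.0])
    print(names[cls], round(conf, 3))                # car 0.818
```

Because each class probability is computed independently, the scores for different classes do not have to sum to 1.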

4. Non-Maximum Suppression (NMS)

The model often generates multiple overlapping bounding boxes for the same object. Non-Maximum Suppression (NMS) is a post-processing algorithm that removes these duplicates. It works like this:

  • It discards any boxes with a confidence score below the confidence threshold.

  • It sorts the remaining boxes by confidence score (usually separately for each class) and selects the one with the highest score.

  • It then removes any other boxes that significantly overlap with the selected box. The degree of overlap is measured using a metric called IoU (Intersection over Union).

  • It repeats the previous two steps with the highest-scoring box that is still left, until every box has been either kept or removed.
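The steps above correspond to the classic greedy NMS algorithm, sketched here for boxes of a single class. It assumes boxes are given as (x1, y1, x2, y2) corner coordinates; the default thresholds of 0.25 (confidence) and 0.7 (IoU) mirror the Ultralytics prediction defaults, and the function names are hypothetical. Production code, for example `torchvision.ops.nms`, does the same work with vectorized tensor operations.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) corner coordinates


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float],
        conf_thres: float = 0.25, iou_thres: float = 0.7) -> List[int]:
    """Greedy NMS for boxes of one class; returns the indices of kept boxes."""
    # Drop boxes below the confidence threshold, then sort by score (highest first).
    order = sorted((i for i, s in enumerate(scores) if s >= conf_thres),
                   key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)          # most confident box still in play
        keep.append(best)
        # Remove boxes that overlap the kept box too much, then repeat.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thres]
    return keep


if __name__ == "__main__":
    boxes = [(100, 100, 200, 200),   # object A
             (105, 102, 205, 198),   # near-duplicate of A (IoU ~ 0.87)
             (300, 300, 380, 380),   # separate object B
             (110, 95, 210, 205)]    # weak duplicate of A
    scores = [0.90, 0.80, 0.75, 0.20]
    print(nms(boxes, scores))        # [0, 2]
```

For several classes, the same function can be run once per class.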

Key Parameters

Confidence Threshold

The confidence threshold is a setting you can adjust. It acts as a filter for the model's predictions. Any detected object with a confidence score below this threshold is immediately discarded.

  • A low threshold (e.g., 0.2) will show more predictions, including some that might be incorrect.

  • A high threshold (e.g., 0.9) will only show the most certain predictions, but you might miss some real objects.
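The effect of the threshold is easy to see on a handful of scored detections. The labels are hypothetical and the scores reuse those from the confidence-thresholding diagram described later; `filter_by_confidence` is an illustrative helper. In the Ultralytics package the same setting is exposed as the `conf` argument when running predictions.

```python
from typing import List, Tuple

Detection = Tuple[str, float]  # (label, confidence score)


def filter_by_confidence(detections: List[Detection], threshold: float) -> List[Detection]:
    """Keep only the detections whose score is at or above `threshold`."""
    return [(label, score) for label, score in detections if score >= threshold]


if __name__ == "__main__":
    detections = [("person", 0.95), ("car", 0.78), ("dog", 0.45), ("cat", 0.21)]
    for threshold in (0.2, 0.5, 0.9):
        kept = [label for label, _ in filter_by_confidence(detections, threshold)]
        print(threshold, kept)
    # 0.2 ['person', 'car', 'dog', 'cat']
    # 0.5 ['person', 'car']
    # 0.9 ['person']
```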

Intersection over Union (IoU)

IoU is a metric used to measure the overlap between two bounding boxes. It's calculated by dividing the area where the two boxes overlap by the total area they cover together.

  • An IoU of 0 means there is no overlap at all.

  • An IoU of 1 means the boxes are a perfect match.

  • In NMS, if a bounding box has an IoU with the highest-confidence box that is above a certain IoU threshold, it gets discarded. This prevents the model from showing multiple boxes for the same object.
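Written as a formula, the union can be expressed through the two box areas so that the overlap is not counted twice. Applied to a hypothetical pair of 10×10 boxes where B is shifted 2 pixels to the right of A:

```latex
\mathrm{IoU}(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}

A = [0, 10] \times [0, 10], \quad B = [2, 12] \times [0, 10]
\;\Rightarrow\; |A \cap B| = 8 \times 10 = 80, \quad
\mathrm{IoU} = \frac{80}{100 + 100 - 80} = \frac{80}{120} \approx 0.67
```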

1. YOLOv8 Architecture Overview

The architecture diagram shows the flow of an image through the YOLOv8 model:

  • Input Image: The starting point, where an image is fed into the system.

  • Backbone (feature extraction): The convolutional network that analyzes the image and extracts features, from low-level edges and textures to higher-level shapes and object parts.

  • Neck (feature aggregation): The part of the model that combines feature maps from different depths of the backbone, mixing fine spatial detail with high-level context so that objects of different sizes can be recognized.

  • Head (detection): This is the final stage where the model predicts bounding boxes, classifies the objects inside, and gives a confidence score for each prediction.
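The three-stage structure is reflected in the shape of the raw detection output. A small sketch of that arithmetic, assuming the default 640×640 input, three detection scales with strides of 8, 16 and 32 pixels, and the 80 classes of the COCO dataset; `head_output_shape` is a hypothetical helper. Each candidate carries four box coordinates (after the head's box-regression outputs have been decoded) plus one score per class, which is where the 84 × 8,400 output commonly seen with YOLOv8 detection models comes from.

```python
from typing import Tuple

STRIDES = (8, 16, 32)  # assumed: one detection scale per stride


def head_output_shape(input_size: int = 640, num_classes: int = 80) -> Tuple[int, int]:
    """Return (channels, candidates) of a YOLOv8-style detection output."""
    # One candidate box per grid cell, summed over the three scales.
    candidates = sum((input_size // s) ** 2 for s in STRIDES)
    # Four box coordinates plus one score per class for every candidate.
    channels = 4 + num_classes
    return channels, candidates


if __name__ == "__main__":
    print(head_output_shape())          # (84, 8400)
    print(head_output_shape(320, 1))    # (5, 2100)
```

With a single custom class and a 320×320 input, the same arithmetic gives 5 channels and 2,100 candidates.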

2. Confidence Thresholding

This part of the diagram shows how the confidence threshold acts as a filter.

  • The diagram shows several bounding boxes with different confidence scores (0.95, 0.78, 0.45, and 0.21).

  • A confidence threshold is set (in this example, 0.5).

  • Bounding boxes with a confidence score above the threshold (0.95 and 0.78) are kept, while those below it (0.45 and 0.21) are discarded. This ensures you only see the most certain predictions.

3. Intersection over Union (IoU)

This part of the diagram explains the IoU metric, which measures the overlap between two bounding boxes.

  • It shows a "Predicted Bounding Box" and a "Ground Truth Bounding Box" (the actual location of the object).

  • The formula IoU = Area of Intersection / Area of Union is displayed, which tells you how much the two boxes overlap. A higher IoU means the predicted box is more accurate.

4. Non-Maximum Suppression (NMS)

The final section shows how Non-Maximum Suppression uses IoU to clean up overlapping boxes.

  • The diagram starts with multiple overlapping bounding boxes around a single object.

  • It first picks the box with the highest confidence score.

  • It then checks the IoU of this box against all the other boxes.

  • Any boxes with an IoU above a set threshold are removed. The end result is a single, clean bounding box around the object, which is the best prediction.