Object Tracking with YOLOv8 and Python

by Aritra Roy Gosthipaty and Ritwik Raha

In this tutorial, you will learn object detection and tracking with the YOLOv8 model using the Ultralytics Python Software Development Kit (SDK).

To learn how to track objects from video streams and camera footage for monitoring, tracking, and counting (as shown in Figure 1), just keep reading.

Figure 1: An interactive demo for Object Tracking in Videos (source: created from the code by the authors).

Looking for the source code to this post?

Jump Right To The Downloads Section

YOLOv8: Reliable Object Detection and Tracking

In the rapidly advancing field of computer vision, YOLO (You Only Look Once) models have established themselves as a gold standard for real-time object detection. The latest iteration, YOLOv8, brings significant improvements in accuracy and speed, further pushing the boundaries of what’s possible in object detection and tracking. This blog post delves into the architecture of YOLOv8, explains how it achieves its impressive performance, and provides practical examples using the Ultralytics YOLO Application Programming Interface (API).

A custom, annotated image dataset is vital for training the YOLOv8 object detector. It allows us to train the model on specific objects of interest, leading to a detector tailored to our requirements.

Roboflow offers free tools for each stage of the computer vision pipeline, which will streamline your workflows and supercharge your productivity.

Sign up or Log in to your Roboflow account to access state-of-the-art dataset libraries and revolutionize your computer vision pipeline.

You can start by choosing your own datasets or using PyImageSearch’s assorted library of useful datasets.

Bring data in any of 40+ formats to Roboflow, train using any state-of-the-art model architectures, deploy across multiple platforms (API, NVIDIA, browser, iOS, etc.), and connect to applications or 3rd-party tools.

Understanding YOLOv8 Architecture

YOLOv8 (architecture shown in Figure 2), Ultralytics’s latest version of the YOLO model, represents a state-of-the-art advancement in computer vision. Building on the success of its predecessors, YOLOv8 introduces new features and improvements that enhance performance, flexibility, and efficiency. This cutting-edge model supports a comprehensive range of vision AI tasks, including detection, segmentation, pose estimation, tracking, and classification. Its versatility enables users to apply YOLOv8’s powerful capabilities across a wide array of applications and domains.

Figure 2: Architecture diagram of YOLOv8 (source: image from the issue: https://github.com/ultralytics/ultralytics/issues/189)

The main features of YOLOv8 include mosaic data augmentation, anchor-free detection, a coarse-to-fine (C2f) module, a decoupled head, and a modified loss function. Let’s delve into each change in more detail.

Mosaic Data Augmentation

Like YOLOv4, YOLOv8 uses mosaic data augmentation that mixes four images to provide the model with better context information. The change in YOLOv8 is that the augmentation stops in the last 10 training epochs to improve performance.
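If you train through the Ultralytics API, this behavior is exposed via the close_mosaic training argument. Here is a minimal sketch (coco8.yaml is the small sample dataset used later in this post):

from ultralytics import YOLO

# load a pretrained model
model = YOLO("yolov8n.pt")

# close_mosaic disables mosaic augmentation for the final N epochs;
# 10 matches the behavior described above and is the library default
model.train(data="coco8.yaml", epochs=100, imgsz=640, close_mosaic=10)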

Anchor-Free Detection

YOLOv8 switched to anchor-free detection to improve generalization. In anchor-based detection, predefined anchor boxes slow down learning for custom datasets. Anchor-free detection allows the model to directly predict an object’s center, reducing the number of bounding box predictions. This speeds up Non-Maximum Suppression (NMS), a process that eliminates incorrect predictions.

C2f (Coarse-to-Fine) Module

The model’s backbone now uses a C2f module instead of a C3 module. The key difference is that in C2f, the output of all bottleneck modules is concatenated, while in C3, only the output of the last bottleneck module is used. Bottleneck modules, composed of bottleneck residual blocks, reduce computational costs in deep learning networks, speeding up training and improving gradient flow.
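To make the concatenation pattern concrete, here is a simplified PyTorch sketch of a C2f-style block. This is an illustration of the idea, not the exact Ultralytics implementation (the layer sizes and activations are assumptions):

import torch
import torch.nn as nn


class Bottleneck(nn.Module):
    # a minimal residual bottleneck: two 3x3 convs plus a skip connection
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.act = nn.SiLU()

    def forward(self, x):
        return x + self.conv2(self.act(self.conv1(x)))


class C2fSketch(nn.Module):
    # C2f-style block: the outputs of *all* bottlenecks are concatenated;
    # a C3-style block would keep only the last bottleneck's output
    def __init__(self, in_channels, out_channels, n=2):
        super().__init__()
        hidden = out_channels // 2
        self.stem = nn.Conv2d(in_channels, 2 * hidden, 1)
        self.blocks = nn.ModuleList(Bottleneck(hidden) for _ in range(n))
        self.merge = nn.Conv2d((2 + n) * hidden, out_channels, 1)

    def forward(self, x):
        # split the stem output into two halves, chain the bottlenecks,
        # and keep every intermediate output for the final concatenation
        a, b = self.stem(x).chunk(2, dim=1)
        outputs = [a, b]
        for block in self.blocks:
            outputs.append(block(outputs[-1]))
        return self.merge(torch.cat(outputs, dim=1))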

Decoupled Head

The diagram above (Figure 2) shows that the head no longer performs classification and regression together. Instead, these tasks are now performed separately, which increases model performance.

Loss

Loss misalignment: The decoupled head separates classification and regression tasks, potentially causing the model to localize one object while classifying another.

Solution: Include a task alignment score to help the model identify positive and negative samples. The task alignment score is calculated by multiplying the classification score with the Intersection over Union (IoU) score, where the IoU score measures the accuracy of a bounding box prediction (see the sketch after this list).

Based on the alignment score, the model:

• selects the top-k positive samples,
• computes a classification loss using Binary Cross-Entropy (BCE), and
• computes a regression loss using Complete IoU (CIoU) and Distributional Focal Loss (DFL).

BCE loss measures the difference between actual and predicted labels. CIoU loss considers the predicted bounding box’s relation to the ground truth in terms of center point and aspect ratio. DFL optimizes the distribution of bounding box boundaries, focusing more on samples that the model misclassifies as false negatives.
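As a rough illustration of the task alignment score described above, here is a toy sketch. The exponents alpha and beta are weighting hyperparameters from the task-aligned assignment literature, not values taken from this post; with alpha = beta = 1, the score is the plain product the text describes:

def task_alignment_score(cls_score, iou, alpha=1.0, beta=1.0):
    # alignment = (classification score)^alpha * (IoU)^beta
    return (cls_score ** alpha) * (iou ** beta)


# toy example: rank three predictions and keep the top-2 as positives
candidates = [(0.90, 0.80), (0.60, 0.95), (0.80, 0.30)]  # (cls_score, iou)
scores = [task_alignment_score(s, u) for s, u in candidates]
top_k = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:2]
print(top_k)  # indices of the selected positive samples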

Object Detection and Tracking with YOLOv8

Object detection and tracking are critical tasks in many applications, from autonomous driving to video surveillance. YOLOv8 excels in these areas due to its robust architecture and innovative features.

Object Detection

Object detection involves identifying and localizing objects within an image. YOLOv8 achieves this with high accuracy and speed, as demonstrated by its performance metrics on the Common Objects in Context (COCO) dataset:

| Model   | Size (pixels) | mAP val 50-95 | Speed CPU ONNX (ms) | Speed A100 TensorRT (ms) | Params (M) | FLOPs (B) |
|---------|---------------|---------------|---------------------|--------------------------|------------|-----------|
| YOLOv8n | 640           | 37.3          | 80.4                | 0.99                     | 3.2        | 8.7       |
| YOLOv8s | 640           | 44.9          | 128.4               | 1.20                     | 11.2       | 28.6      |
| YOLOv8m | 640           | 50.2          | 234.7               | 1.83                     | 25.9       | 78.9      |
| YOLOv8l | 640           | 52.9          | 375.2               | 2.39                     | 43.7       | 165.2     |
| YOLOv8x | 640           | 53.9          | 479.1               | 3.53                     | 68.2       | 257.8     |

These metrics highlight YOLOv8’s efficiency and effectiveness in object detection tasks, making it suitable for a wide range of applications.

Object Tracking

Object tracking involves following an object across multiple frames in a video. YOLOv8’s architecture supports high-speed, accurate object detection, which is essential for real-time tracking applications. By combining YOLOv8 with tracking algorithms, it’s possible to maintain consistent identities for objects as they move through video frames.
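For instance, the Ultralytics API exposes tracking through model.track(), and you can pick the tracking algorithm via the tracker argument (BoT-SORT is the default; a ByteTrack configuration is also bundled). The video path below is a placeholder:

from ultralytics import YOLO

model = YOLO("yolov8n.pt")

# run detection plus tracking over a whole video; ByteTrack is one of the
# tracker configurations bundled with the library (BoT-SORT is the default)
results = model.track("path/to/video.mp4", tracker="bytetrack.yaml")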

If you need to train YOLOv8 or any other architecture for object detection and need access to 120K+ images curated and labeled with object bounding boxes to train, explore, and experiment with … for free, then head over to Roboflow and get a free account to start accessing high-quality labeled images.

Practical Examples Using Ultralytics YOLO API

The Ultralytics YOLO API simplifies the process of using YOLOv8 for object detection and tracking. Here are some examples to get you started:

Loading a Pre-Trained Model

from ultralytics import YOLO

# Load a model
model = YOLO("yolov8n.pt")  # Load an official model
model = YOLO("path/to/best.pt")  # Load a custom model

Predicting with the Model

# Predict on an image
results = model("https://ultralytics.com/images/bus.jpg")
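Each element of results holds the detections for one image. Here is a quick way to inspect them (the attribute names follow the Ultralytics results API):

# inspect the detections for the first (and here only) image
boxes = results[0].boxes
print(boxes.xyxy)  # bounding boxes as (x1, y1, x2, y2) pixel coordinates
print(boxes.conf)  # confidence score for each detection
print(boxes.cls)   # class index for each detection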

Training a Model

Training YOLOv8 on custom datasets is straightforward. Here’s how you can train YOLOv8n on the COCO8 dataset for 100 epochs:

from ultralytics import YOLO

# Load a model
model = YOLO("yolov8n.yaml")  # Build a new model from YAML
model = YOLO("yolov8n.pt")  # Load a pretrained model (recommended for training)
model = YOLO("yolov8n.yaml").load("yolov8n.pt")  # Build from YAML and transfer weights

# Train the model
results = model.train(data="coco8.yaml", epochs=100, imgsz=640)
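After training, you can sanity-check the model on the validation split; model.val() returns a metrics object (the box.map attributes below follow the Ultralytics metrics API):

# evaluate the trained model on the validation split defined in coco8.yaml
metrics = model.val()
print(metrics.box.map)    # mAP at IoU 50-95
print(metrics.box.map50)  # mAP at IoU 50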

Dataset Format

YOLOv8 supports a specific dataset format for object detection. To convert your existing dataset from other formats (e.g., COCO) to YOLO format, you can use the JSON2YOLO tool provided by Ultralytics.
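For reference, the YOLO detection format stores one plain-text label file per image, with one row per object: a class index followed by the normalized box center and size. The rows below are made-up values for illustration:

# each row: class_id x_center y_center width height, all normalized to [0, 1]
0 0.481719 0.634028 0.690625 0.713278
1 0.339844 0.418750 0.479375 0.551250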

Object Tracking with YOLOv8 on Video Streams

Do you need custom images to train or test this pipeline, or simply measure its effectiveness? Then, head over to Roboflow and get a free account to grab these object-detection-in-the-wild images.

Configuring Your Development Environment

To follow this guide, you need to have the ultralytics library installed on your system.

Luckily, ultralytics is pip-installable:

$ pip install ultralytics

Need Help Configuring Your Development Environment?

All that said, are you:


Short on time?
Learning on your employer’s administratively locked system?
Wanting to skip the hassle of fighting with the command line, package managers, and virtual environments?
Ready to run the code immediately on your Windows, macOS, or Linux system?

Then join PyImageSearch University today!

Gain access to Jupyter Notebooks for this tutorial and other PyImageSearch guides pre-configured to run on Google Colab’s ecosystem right in your web browser! No installation required.

And best of all, these Jupyter Notebooks will run on Windows, macOS, and Linux!

Project Structure

We first need to review our project directory structure.

Start by accessing this tutorial’s “Downloads” section to retrieve the source code and example images.

From there, take a look at the directory structure:

YOLO-VIDEO/
├── pyimagesearch/
│   ├── __init__.py
│   └── yolo_tracking.py
├── videos/
│   ├── basket-ball.mp4
│   └── output_tracked_video.mp4
├── demo.py
└── main.py

In this section, we will explore how to set up the video tracking project using YOLOv8 with Python. We will go through three key scripts: main.py, demo.py, and pyimagesearch/yolo_tracking.py. Each script plays a crucial role in processing videos, tracking objects, and setting up a user interface for ease of use.

Set-Up

Our main.py script is the entry point for our video processing. It imports the track_video function from our yolo_tracking module. Here’s the code:

from pyimagesearch.yolo_tracking import track_video

if __name__ == "__main__":
    # get the input video path
    input_video_path = "./videos/basket-ball.mp4"

    # process the video
    output_video_path = track_video(input_video_path)
    print(f"Processed video saved to: {output_video_path}")

In this script, we define the input video path as ./videos/basket-ball.mp4 and call the track_video function with this path. The processed video is saved, and the output path is printed.

Creating a Gradio Interface

Next, we set up a Gradio interface in demo.py to provide an easy way to test our video tracking function. Gradio is a handy library for creating web-based interfaces for machine learning models.

The entire section on creating a Gradio space is discussed in detail in the video accompanying this blog post. Alternatively, you can also download the code and see how to set up a Gradio space.
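As a rough sketch (not the exact demo.py from the downloads), a minimal Gradio wrapper around track_video could look like this:

import gradio as gr

from pyimagesearch.yolo_tracking import track_video

# a minimal web UI: upload a video, receive the tracked video back;
# gradio passes the uploaded file's path to track_video, which returns
# the output path that gradio then renders
demo = gr.Interface(fn=track_video, inputs=gr.Video(), outputs=gr.Video())

if __name__ == "__main__":
    demo.launch()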

Implementing Video Tracking Functionality

The heart of our project lies in pyimagesearch/yolo_tracking.py, where we implement the core video tracking functionality. Here is how to achieve that in code:

from collections import defaultdict

import cv2
import numpy as np
from ultralytics import YOLO


def track_video(video_path):
    # load the model
    model = YOLO("yolov8n.pt")

    # open the video file
    cap = cv2.VideoCapture(video_path)
    track_history = defaultdict(lambda: [])

    # get the video properties
    fps = int(cap.get(cv2.CAP_PROP_FPS))
    frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

    # define the codec and create VideoWriter object
    output_path = "output_tracked_video.mp4"  # output video file path
    out = cv2.VideoWriter(
        output_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (frame_width, frame_height)
    )

    # loop through the video frames
    while cap.isOpened():
        success, frame = cap.read()

        if success:
            # run YOLOv8 tracking, persisting track IDs across frames
            results = model.track(frame, persist=True)
            boxes = results[0].boxes.xywh.cpu()
            track_ids = (
                results[0].boxes.id.int().cpu().tolist()
                if results[0].boxes.id is not None
                else None
            )
            annotated_frame = results[0].plot()

            # plot the tracks
            if track_ids:
                for box, track_id in zip(boxes, track_ids):
                    x, y, w, h = box
                    track = track_history[track_id]
                    track.append((float(x), float(y)))  # x, y center point
                    if len(track) > 30:  # retain the last 30 center points per track
                        track.pop(0)

                    # draw the tracking lines
                    points = np.array(track).astype(np.int32).reshape((-1, 1, 2))
                    cv2.polylines(
                        annotated_frame,
                        [points],
                        isClosed=False,
                        color=(230, 230, 230),
                        thickness=2,
                    )

            # write the annotated frame
            out.write(annotated_frame)
            # stop early if q is pressed (requires an OpenCV display window)
            if cv2.waitKey(1) & 0xFF == ord("q"):
                break
        else:
            break

    # release the video capture and writer, and close any display windows
    cap.release()
    out.release()
    cv2.destroyAllWindows()

    return output_path

In this script, we start by importing the necessary libraries and defining the track_video function. We load the YOLOv8 model, open the video file, and retrieve the video properties like the frames per second (FPS) and frame dimensions. We also set up a VideoWriter to save the output video.

We loop through each frame of the video, process it with YOLO to get tracking results, and annotate the frame with bounding boxes and tracking lines. We maintain a history of tracked points for each object to draw tracking lines. Finally, we write each annotated frame to the output video file.

What’s next? We recommend PyImageSearch University.

Course information:
84 total classes • 114+ hours of on-demand code walkthrough videos • Last updated: February 2024
★★★★★ 4.84 (128 Ratings) • 16,000+ Students Enrolled

I strongly believe that if you had the right teacher you could master computer vision and deep learning.

Do you think learning computer vision and deep learning has to be time-consuming, overwhelming, and complicated? Or has to involve complex mathematics and equations? Or requires a degree in computer science?

That’s not the case.

All you need to master computer vision and deep learning is for someone to explain things to you in simple, intuitive terms. And that’s exactly what I do. My mission is to change education and how complex Artificial Intelligence topics are taught.

If you’re serious about learning computer vision, your next stop should be PyImageSearch University, the most comprehensive computer vision, deep learning, and OpenCV course online today. Here you’ll learn how to successfully and confidently apply computer vision to your work, research, and projects. Join me in computer vision mastery.

Inside PyImageSearch University you’ll find:

✓ 84 courses on essential computer vision, deep learning, and OpenCV topics
✓ 84 Certificates of Completion
✓ 114+ hours of on-demand video
✓ Brand new courses released regularly, ensuring you can keep up with state-of-the-art techniques
✓ Pre-configured Jupyter Notebooks in Google Colab
✓ Run all code examples in your web browser — works on Windows, macOS, and Linux (no dev environment configuration required!)
✓ Access to centralized code repos for all 536+ tutorials on PyImageSearch
✓ Easy one-click downloads for code, datasets, pre-trained models, etc.
✓ Access on mobile, laptop, desktop, etc.

Click here to join PyImageSearch University

Summary

YOLOv8 is currently one of the most widely used models for object detection. Its core architecture is a step up from the popular YOLOv5, and this enhanced model, combined with the ease of use provided by the Ultralytics YOLO API, makes it a powerful tool for both researchers and practitioners.

In this project, we set up a YOLOv8 model for object detection and tracking. Object detection is a useful tool in any computer vision engineer’s arsenal.

This setup allows us to process a video, track objects using YOLO, and save the annotated video. Additionally, we can run this functionality through a Gradio interface for easy access and testing. By combining these scripts, we have a robust and user-friendly video tracking application.

Whether you’re working on autonomous vehicles, video surveillance, or any other application requiring real-time object detection and tracking, YOLOv8 is well-equipped to meet your needs.

Citation Information

A. R. Gosthipaty and R. Raha. “Object Tracking with YOLOv8 and Python,” PyImageSearch, P. Chugh, S. Huot, and K. Kidriavsteva, eds., 2024, https://pyimg.co/hqdf0

@incollection{ARG-RR_2024_Object-Tracking-YOLOv8-Python,
  author = {Aritra Roy Gosthipaty and Ritwik Raha},
  title = {Object Tracking with YOLOv8 and Python},
  booktitle = {PyImageSearch},
  editor = {Puneet Chugh and Susan Huot and Kseniia Kidriavsteva},
  year = {2024},
  url = {https://pyimg.co/hqdf0},
}

To download the source code to this post (and be notified when future tutorials are published here on PyImageSearch), simply enter your email address in the form below!

Download the Source Code and FREE 17-page Resource Guide

Enter your email address below to get a .zip of the code and a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you’ll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL!

