Raspberry Pi and OpenCV for Motion Object Tracking

OpenCV is an open-source computer vision library that provides a rich set of tools and functions for processing image and video data. When combined with a Raspberry Pi car, it can achieve dynamic tracking effects, meaning the car can continuously maintain a certain distance from the object by capturing the object’s motion information using the camera mounted on the car. How is this accomplished?

Raspberry Pi car with camera

Preliminary Preparation

Assembled Car

Ensure the Raspberry Pi car is assembled with the camera installed. The Logitech C270i camera is used in this setup.

Verify that the camera drivers are correctly installed.

Adjust the camera’s position and angle to avoid obstruction.

Technical Foundations

Familiarize yourself with the Raspbian operating system for better handling of Raspberry Pi hardware and project development.

Proficiently grasp Python syntax to utilize image processing libraries like OpenCV, enabling camera capture and processing, object detection, tracking, etc.

Utilize SSH for remote connections to the Raspberry Pi, allowing you to execute commands, configure settings, and manage the file system without physical access.

Installation and Usage of OpenCV

While you can opt to build and compile OpenCV directly from source code without using a virtual environment, note that this method only supports Python 2.

It is recommended to use a Python virtual environment, which allows you to create an independent Python environment for each project, aiding in isolating dependencies between projects. In this case, you can install and use Python 3 version of OpenCV within the virtual environment.

Acquaint yourself with basic OpenCV operations, such as file IO, image format conversion, camera capture, and window display.

Image Algorithms

Understand image processing algorithms relevant to the project, including image features, classifiers, Haar cascades, object tracking, mean-shift, CAMShift, etc.

UDP Video Transmission

While there are alternative options such as TCP, MQTT, WebSocket, etc., for the Raspberry Pi object tracking project, where data integrity is not a top priority, UDP stands out as a superior choice. As a lightweight protocol, it doesn’t require a connection and offers enhanced real-time capabilities.
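UDP's connectionless nature is easy to see with Python's standard `socket` module. The sketch below runs both ends on localhost for illustration (port 9999 is also the port used by the full program later in this article); no handshake takes place, and each `sendto` ships one self-contained datagram:

```python
import socket

# Receiver: bind to a local port and wait for datagrams
recv_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
recv_sock.bind(('127.0.0.1', 9999))

# Sender: no connection setup is needed
send_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
send_sock.sendto(b'frame-bytes-go-here', ('127.0.0.1', 9999))

# Each recvfrom returns exactly one datagram
data, addr = recv_sock.recvfrom(65535)
print(data)  # b'frame-bytes-go-here'

send_sock.close()
recv_sock.close()
```

In the real project, the payload of each datagram is one JPEG-encoded video frame, which is why the receive buffer is sized at the maximum UDP payload.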

GPIO Control

Select a motor driver suitable for your motor type and requirements to receive signals from Raspberry Pi GPIO pins, converting them into control signals for the motors, thus controlling the car’s movement.
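As an illustration of the idea, the sketch below maps movement commands to the logic levels of an L298N-style dual H-bridge driver. The wiring (IN1-IN4, BCM pins 17/18/22/23) is hypothetical and depends on your driver and motors; the actual `RPi.GPIO` calls are shown commented out since they only work on the Pi itself:

```python
# Pin states for a hypothetical L298N-style dual H-bridge driver:
# IN1/IN2 control the left motor, IN3/IN4 the right motor.
def pin_states(command):
    """Map a movement command to (IN1, IN2, IN3, IN4) logic levels."""
    states = {
        'forward':  (1, 0, 1, 0),
        'backward': (0, 1, 0, 1),
        'left':     (0, 1, 1, 0),  # spin left: left motor back, right forward
        'right':    (1, 0, 0, 1),
        'stop':     (0, 0, 0, 0),
    }
    return states[command]

# On the Pi, these levels would be written to the GPIO pins, e.g.:
# import RPi.GPIO as GPIO
# GPIO.setmode(GPIO.BCM)
# GPIO.setup([17, 18, 22, 23], GPIO.OUT)
# GPIO.output([17, 18, 22, 23], pin_states('forward'))

print(pin_states('forward'))  # (1, 0, 1, 0)
```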

Visual Tracking Algorithm Workflow

Achieving motion tracking for the car involves the comprehensive application of various computer vision algorithms. The specific workflow is as follows:

  1. Haar Cascade Detection:

In OpenCV, utilizing the CascadeClassifier object enables the implementation of scale-invariant Haar cascade classification or tracking. This method is widely used in projects like face detection. Custom classifiers can also be trained to detect specific objects. Although the code is concise, the underlying concepts involve feature extraction, sliding windows, classifiers, and non-maximum suppression. Theoretically, employing Haar cascade detection alone can achieve object tracking by performing detection on each frame. However, two issues need attention in practical applications: significant computational load, which may cause lag on resource-constrained devices like Raspberry Pi, and the substantial effort required for collecting sample images of the object from various angles.

  2. CAMShift Object Tracking:

After detecting the object, the CAMShift algorithm is employed for object tracking. This algorithm, an improved version of MeanShift, adapts by adjusting the tracking window. It is suitable for tracking a single-color object and achieves fast computation speeds through iterative density function calculations.

  3. Motion Control Strategy:

Based on the object window detected by CAMShift relative to the center of the screen, the final implementation achieves motion tracking for the car. The motion control strategy involves determining whether the car should turn left or right by comparing the center point of the object rectangle with the center of the screen. Similarly, it decides whether the car should move forward or backward by comparing the size of the rectangle with a predefined value. This strategy enables the car to intelligently adjust its motion direction and speed based on the relative position and size of the object.
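This strategy can be sketched as a pure decision function. The threshold values match the limits used in the full program later in this article; the function name `decide` is an illustrative choice:

```python
LIMIT_OFFSET = 40      # pixels of horizontal deviation tolerated before turning
LIMIT_SIZE_DOWN = 200  # window smaller than this: object is far, go forward
LIMIT_SIZE_UP = 250    # window larger than this: object is close, back up

def decide(offset, size):
    """Return (turn, drive) commands from the tracking window geometry.

    offset: horizontal distance of the window center from the screen center
    size:   diagonal length of the tracking window (a proxy for distance)
    """
    turn = ('right' if offset > LIMIT_OFFSET
            else 'left' if offset < -LIMIT_OFFSET
            else None)
    drive = ('forward' if size < LIMIT_SIZE_DOWN
             else 'backward' if size > LIMIT_SIZE_UP
             else None)
    return turn, drive

print(decide(120, 150))   # ('right', 'forward')
print(decide(-10, 300))   # (None, 'backward')
```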

Classifier Training

To use the Haar cascade classifier for object detection, a time-consuming model training process is necessary. It is recommended to refer to the official documentation of OpenCV for specific details. The process is outlined as follows:

  1. Collect Negative Samples:
  • Negative samples are samples unrelated to the object, guiding the car to avoid tracking undesired objects.
  • Collect as many negative samples as possible, preferably in the actual testing environment.
  • Consider writing a program to automate the collection process for the car.


  2. Collect Positive Samples:
  • Positive samples refer to target objects, guiding the car to track specific objects.
  • Use the opencv_createsamples tool to quickly generate a large number of positive samples in conjunction with negative samples.


  3. Train the Haar Cascade Classifier:
  • Use the opencv_traincascade tool for training the Haar cascade classifier.
  • Training time depends on the number of samples; pay attention to setting parameters like precalcValBufSize and precalcIdxBufSize to avoid memory errors.
  4. Test Classifier Effectiveness:
  • After training, obtain a file named cascade.xml containing the learned model information.
  • Use OpenCV to load the classifier for object detection. Example code is provided below:
import cv2
# Load the trained Haar cascade classifier
ball_haar = cv2.CascadeClassifier('cascade.xml')
# Read the test image
img = cv2.imread('test_image_0.jpg')
# Convert the image to grayscale
gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Use the Haar cascade classifier for target detection
# scaleFactor: the factor by which the image is scaled down at each pyramid level
# minNeighbors: the minimum number of neighboring detections required to retain a target
balls = ball_haar.detectMultiScale(gray_img, scaleFactor=1.3, minNeighbors=5)
# Draw rectangles around the detected targets
for x, y, w, h in balls:
    cv2.rectangle(img, (x, y), (x+w, y+h), (0, 255, 0), 2)
# Display the image with target detection boxes in a window
cv2.imshow('img', img)
# Wait for the user to press any key, then close the window
cv2.waitKey(0)
cv2.destroyAllWindows()

  5. Optimize Classifier Effectiveness:
  • Adjust parameters of the detectMultiScale function, such as scaleFactor and minNeighbors, according to the needs.
  • Increase the number of training samples and sample diversity to further enhance classifier performance.


Incorporate the following code into the Raspberry Pi to implement a simple object-tracking system. This system detects the object in the image, tracks it in real-time through UDP image transmission, and simultaneously responds to changes in the object’s position and size by controlling the movement of the car.

import cv2
import numpy as np
import socket
import traceback
import Motor
import time
WIDTH = 640  # Video width and height
HEIGHT = 480
center_x = WIDTH // 2
center_y = HEIGHT // 2
def getCenter(points):
    '''Calculate the center of the target window'''
    return ((points[0][0] + points[2][0]) // 2, (points[0][1] + points[2][1]) // 2)
def getOffset(point):
    '''Calculate the horizontal deviation of the target window center from the screen center'''
    return point[0] - center_x
def getSize(point):
    '''Calculate the diagonal length of the target window as a scale metric'''
    return np.sqrt(np.sum(np.square(point[0] - point[2])))
# Initialize the socket for sending real-time video frames
HOST = ''  # Set to the IP address of the receiving PC
PORT = 9999
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.connect((HOST, PORT))
# Initialize the motor control module
motor = Motor.Motor()
interval = 0.01  # Movement time for the car in each step
limit_offset = 40  # Horizontal deviation (absolute value) of the target window, beyond which the car turns left or right
limit_size_down = 200  # If the size of the target window is less than this value, control the car to move forward
limit_size_up = 250  # If the size of the target window is greater than this value, control the car to move backward
# Initialize the camera
cap = cv2.VideoCapture(0)
# Use Haar cascade to detect tennis ball presence
print('Load the cascade...')
ball_cascade = cv2.CascadeClassifier('cascade.xml')
ball_x = 0
ball_y = 0
ball_width = 0
ball_height = 0
try:
    print('Now detect the ball...')
    while True:
        ret, frame = cap.read()  # Read video frames
        if not ret:
            continue
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        balls = ball_cascade.detectMultiScale(gray, 1.3, 25)
        if balls is not None and len(balls) > 0:
            # Choose the window with the largest width as the target window
            width = balls[:, 2]
            index = np.argsort(width)[-1]
            (x, y, w, h) = balls[index]
            ball_x = x
            ball_y = y
            ball_width = w
            ball_height = h
            img = cv2.rectangle(frame, (x, y), (x+w, y+h), (255, 0, 0), 2)  # Draw the target window
            break  # Once a tennis ball is detected, exit the Haar detection
        ret, imgencode = cv2.imencode('.jpg', frame, [cv2.IMWRITE_JPEG_QUALITY, 50])  # Encode the image and send it via UDP
        server.send(imgencode.tobytes())
    ret, imgencode = cv2.imencode('.jpg', frame, [cv2.IMWRITE_JPEG_QUALITY, 50])
    server.send(imgencode.tobytes())
    # Perform CAMShift tracking
    print('Prepare for CAMShift...')
    track_window = (ball_x, ball_y, ball_width, ball_height)  # Target window
    roi = frame[ball_y:ball_y+ball_height, ball_x:ball_x+ball_width]
    hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)  # Convert to HSV image
    mask = cv2.inRange(hsv_roi, np.array((30., 0., 0.)), np.array((70., 180., 180.)))  # Generate a mask for the target window based on the color range of the tennis ball
    roi_hist = cv2.calcHist([hsv_roi], [0], mask, [180], [0, 180])  # Generate a color histogram for the target window
    cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)  # Normalize the histogram
    term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)  # Termination criteria for the CAMShift method
    print('Now track the ball...')
    while True:
        ret, frame = cap.read()
        if not ret:
            continue
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        dst = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)  # Use the color histogram to perform histogram backprojection on the image
        ret, track_window = cv2.CamShift(dst, track_window, term_crit)  # Use the CAMShift algorithm to locate the target
        pts = cv2.boxPoints(ret)  # Get the rectangle of the target window
        pts = np.int0(pts)
        img = cv2.polylines(frame, [pts], True, (255, 0, 0), 2)  # Draw the rectangle of the target window
        img = cv2.circle(img, (center_x, center_y), 8, (0, 0, 255), -1)  # Draw the center of the screen
        (point_ball_x, point_ball_y) = getCenter(pts)  # Calculate the center of the target window
        img = cv2.circle(img, (point_ball_x, point_ball_y), 8, (0, 255, 0), -1)  # Draw the center of the target window
        offset = getOffset((point_ball_x, point_ball_y))  # Calculate the horizontal deviation
        img = cv2.putText(img, 'Offset: %d' % offset, (point_ball_x, point_ball_y-10), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 0), 1, cv2.LINE_AA)  # Display the deviation value
        size = getSize(pts)  # Calculate the size
        img = cv2.putText(img, 'Size: %f' % size, (point_ball_x, point_ball_y+10), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 0), 1, cv2.LINE_AA)
        ret, imgencode = cv2.imencode('.jpg', img, [cv2.IMWRITE_JPEG_QUALITY, 50])
        server.send(imgencode.tobytes())
        # Control the car movement
        if offset > limit_offset:
            motor.right(interval)  # Turn the car right for interval seconds
        elif offset < -limit_offset:
            motor.left(interval)  # Turn the car left for interval seconds
        if size < limit_size_down:
            motor.ahead(interval)  # Move the car forward for interval seconds
        elif size > limit_size_up:
            motor.rear(interval)  # Move the car backward for interval seconds
except Exception as e:
    print("Error:", e)
    traceback.print_exc()
finally:
    cap.release()
    server.close()


In addition to the main program, you also need to write a code snippet to receive image frames transmitted via the UDP protocol, decode them, and display them in a window until an end signal is received or the ESC key is pressed to exit the program.

Further Reading: How to Transmit Raspberry Pi Video Frames to PC

import cv2
import numpy
import socket
HOST = ''
PORT = 9999
buffSize = 65535
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind((HOST, PORT))
print('Now waiting for frames...')
while True:
    data, address = server.recvfrom(buffSize)
    if len(data) == 1 and data[0] == 1:
        break  # A single-byte datagram of value 1 is the end signal
    print('Received one frame')
    data = numpy.frombuffer(data, dtype=numpy.uint8)
    img_decode = cv2.imdecode(data, 1)  # Decode and display the frame
    cv2.imshow('Frames', img_decode)
    if cv2.waitKey(1) == 27:  # Press ESC to exit
        break
server.close()
cv2.destroyAllWindows()



The system works as described, but in actual testing, tracking may fail if the object moves too quickly or if similar-looking interference appears too close to the object.
