Custom Object Detection with Haar Cascade (VJ)

Shahar Gino
9 min read · Mar 10, 2023

Introduction

Object Detection using Haar feature-based cascade classifiers is an effective object detection method proposed by Paul Viola and Michael Jones in their 2001 paper, “Rapid Object Detection using a Boosted Cascade of Simple Features”. It is a machine-learning-based approach in which a cascade function is trained from many positive and negative images, and is then used to detect objects in other images.

A Haar-Feature is similar to a kernel in a convolutional neural network (CNN), except that in a CNN the kernel values are determined by training, whereas a Haar-Feature is defined manually.

Examples of common Haar-Features
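
To make the idea concrete, here is a minimal NumPy sketch (illustrative only, not OpenCV’s implementation) of a two-rectangle Haar-Feature: the difference between the pixel sums of two adjacent rectangles, computed in constant time from an integral image, as described in the Viola-Jones paper:

import numpy as np

def integral_image(img):
    # Integral image: ii[y, x] = sum of all pixels above and to the left of (y, x)
    ii = np.cumsum(np.cumsum(img.astype(np.float64), axis=0), axis=1)
    return np.pad(ii, ((1, 0), (1, 0)))          # prepend a zero row/column for easy lookups

def rect_sum(ii, x, y, w, h):
    # Sum of the w-by-h rectangle whose top-left corner is (x, y), using only 4 lookups
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def two_rect_feature(ii, x, y, w, h):
    # "Edge" feature: sum of the left half minus sum of the right half
    return rect_sum(ii, x, y, w // 2, h) - rect_sum(ii, x + w // 2, y, w // 2, h)

patch = np.random.randint(0, 256, (24, 24))      # a toy 24x24 grayscale patch
ii = integral_image(patch)
print(two_rect_feature(ii, 0, 0, 24, 24))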

The Haar Cascade algorithm is trained on a large dataset of positive and negative images of the object being detected. During training, the algorithm learns the features that distinguish the object from the background, such as edges, lines, and corners.

Once trained, the Haar Cascade algorithm can detect the object in new images by sliding a window over the image and using the learned features to determine whether the object is present or not. The algorithm uses a series of classifiers, each of which determines whether the features in the window match those of the object being detected. If enough classifiers indicate a match, the algorithm considers the window to contain the object.

Haar Cascade — illustration
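
Conceptually, the cascade behaves like the following minimal Python sketch (illustrative only, not OpenCV’s actual implementation): a window is reported as a detection only if it passes every stage, and most background windows are rejected by the first few, cheap stages.

def window_contains_object(window, stages):
    # stages: list of stage classifiers; each returns True (pass) or False (reject)
    for stage in stages:
        if not stage(window):
            return False     # early rejection: most background windows exit here
    return True              # survived all stages -> report a detection

# Toy usage with three hypothetical "stages" operating on a feature vector
stages = [lambda w: w[0] > 0.1, lambda w: w[1] > 0.5, lambda w: sum(w) > 1.5]
print(window_contains_object([0.9, 0.8, 0.7], stages))   # True
print(window_contains_object([0.0, 0.8, 0.7], stages))   # False (rejected by the first stage)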

Haar cascades have several advantages and disadvantages for object detection and recognition tasks:

Pros:

  • Fast: Haar cascades are computationally efficient and can run in real-time on low-power devices.
  • Effective for simple objects: They can be effective for detecting simple objects with distinct features, such as faces, eyes, and cars.
  • Easy to use: Haar cascades are relatively easy to implement and can be trained on small datasets.
  • Low memory requirements: They have low memory requirements and can be stored on small devices.
  • Small training dataset: a classifier can be trained on a relatively small dataset, because the features are already predefined and the training procedure only optimizes their weights and places them at the right positions along the cascade.

Cons:

  • Limited accuracy: Haar cascades can have limited accuracy for complex objects or scenes with cluttered backgrounds.
  • Limited flexibility: They are limited to detecting objects with specific features and may not generalize well to new objects or environments.
  • Limited scalability: Haar cascades may not perform well at different scales or angles, and may require separate models for different views.
  • Limited training options: Haar cascades have limited options for training, and can only be trained on positive and negative examples.

Overall, Haar cascades can be a fast and effective approach for simple object detection tasks, but may have limited accuracy and flexibility for more complex scenarios. The choice between Haar cascades and other approaches depends on the specific requirements of the task at hand.

Haar Cascades vs. Neural Networks

Haar cascades and neural networks are two different approaches for object detection and recognition tasks in computer vision.

Haar cascades are based on Haar-like features (inspired by the Haar wavelet) and are a type of machine learning algorithm used for object detection. They work by scanning an image or video frame at different scales and looking for regions of interest that match predefined patterns of features. These patterns are learned from a training set of positive and negative examples. Haar cascades are computationally efficient and can work in real-time on low-power devices.

On the other hand, neural networks are a type of machine learning algorithm inspired by the structure of the human brain. They consist of layers of interconnected nodes that learn to extract features from input data and make predictions based on those features. Neural networks are highly flexible and can be trained to perform a wide variety of tasks, including object detection and recognition. They can be computationally expensive to train and run, but can achieve very high accuracy levels.

In summary, Haar cascades are a simpler and faster approach for object detection, while neural networks offer more flexibility and accuracy, but can be computationally expensive. The choice between the two approaches depends on the specific requirements of the task at hand, such as accuracy, speed, and available computing resources.

Are Haar-Cascades still relevant in the Deep-Learning era?

Haar Cascades can still be relevant in the Deep Learning era for certain tasks where they perform well, such as object detection in images or video. However, Deep Learning-based approaches, such as Convolutional Neural Networks (CNNs), have largely replaced Haar Cascades in many computer vision tasks due to their superior performance on large and complex datasets.

CNNs have been shown to be highly effective in tasks such as object detection, recognition, and segmentation, and have achieved state-of-the-art performance in many computer vision tasks. CNNs can automatically learn features from the data, without requiring hand-crafted features like those used in Haar Cascades. This makes CNNs more flexible and capable of handling complex data with high variability.

However, Haar Cascades still have some advantages over deep learning-based approaches. They are lightweight and computationally efficient, making them suitable for resource-constrained environments. Also, Haar Cascades can be trained with fewer examples compared to deep learning-based approaches, which require large amounts of data.

Therefore, Haar Cascades can still be relevant in certain contexts, but deep learning-based approaches are generally more powerful and versatile for many computer vision tasks.

Haar-Cascade inference

OpenCV already contains many pre-trained classifiers for faces, eyes, smiles, etc. These XML files are stored in the opencv/data/haarcascades/ folder.
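
As a quick sanity check (a short sketch, assuming the opencv-python package is installed), the bundled classifiers can be listed directly from Python:

import os
import cv2

# cv2.data.haarcascades points to the folder of XML files shipped with opencv-python
for name in sorted(os.listdir(cv2.data.haarcascades)):
    if name.endswith('.xml'):
        print(name)    # e.g. haarcascade_frontalface_default.xml, haarcascade_eye.xml, ...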

The following example demonstrates face and eye detection with OpenCV’s pre-trained classifiers, using this demo input image:

% wget 'https://cdn.nba.com/headshots/nba/latest/1040x760/893.png'

The code below consists of the following parts:

  1. Load the XML classifiers
  2. Load the input-image and convert it to gray
  3. Apply the Face detector (classifier)
  4. Apply the Eye detector (classifier) for each detected face
  5. Plot the resulting detections

import cv2
from matplotlib import pyplot as plt

# 1. Load the pre-trained XML classifiers
face_detector = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
eye_detector = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_eye.xml')

# 2. Load the input image and convert it to grayscale
img = cv2.imread('893.png')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# 3. Apply the face detector (scaleFactor=1.3, minNeighbors=5)
faces = face_detector.detectMultiScale(gray, 1.3, 5)

# 4. Apply the eye detector within each detected face
for (x, y, w, h) in faces:
    img = cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)
    roi_gray = gray[y:y+h, x:x+w]
    roi_color = img[y:y+h, x:x+w]
    eyes = eye_detector.detectMultiScale(roi_gray)
    for (ex, ey, ew, eh) in eyes:
        cv2.rectangle(roi_color, (ex, ey), (ex+ew, ey+eh), (0, 255, 0), 2)

# 5. Plot the resulting detections (convert BGR to RGB for matplotlib)
fig, axes = plt.subplots(1, 1, figsize=(10, 10))
axes.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
axes.set_title('Haar/VJ Demo')
axes.set_axis_off()
plt.show()
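
The two positional arguments passed to detectMultiScale above are scaleFactor (1.3) and minNeighbors (5). The following sketch shows the same call with explicit keyword arguments and an extra minSize filter; the values are illustrative starting points, not universally optimal:

# Reuses face_detector and gray from the snippet above
faces = face_detector.detectMultiScale(
    gray,
    scaleFactor=1.1,    # smaller values = finer (but slower) scale search
    minNeighbors=5,     # higher values = fewer false positives, possibly more misses
    minSize=(30, 30),   # ignore candidate detections smaller than 30x30 pixels
)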

Following is the detection result:

Haar-Cascade Inference example

Haar-Cascade Training

Haar Cascade training relies on old OpenCV utilities which require Python 2.x and OpenCV 3.x (they were removed in OpenCV 4.x). Note that inference is still supported with modern Python (3.x) and OpenCV (4.x).

The following flow is suitable for an Ubuntu machine:

  1. Update and upgrade apt, followed by installing some required packages like CMake, Git, python3.8-dev, python3-numpy, etc.:
% sudo apt-get update && sudo apt-get upgrade
% sudo apt-get install -y build-essential cmake git unzip pkg-config make
% sudo apt-get install -y python3.8-dev python3-numpy libtbb2 libtbb-dev

2. OpenCV packages that aid in reading images and videos:

% sudo apt-get install -y  libjpeg-dev libpng-dev libtiff-dev libgtk2.0-dev libavcodec-dev libavformat-dev \
libswscale-dev libdc1394-22-dev libeigen3-dev libtheora-dev libvorbis-dev libxvidcore-dev libx264-dev \
sphinx-common libtbb-dev yasm libfaac-dev libopencore-amrnb-dev libopencore-amrwb-dev libopenexr-dev \
libgstreamer-plugins-base1.0-dev libavutil-dev libavfilter-dev libavresample-dev

3. Clone and Install OpenCV 3.x:

% mkdir ~/opencv_build && cd ~/opencv_build
% git clone https://github.com/opencv/opencv
% git clone https://github.com/opencv/opencv_contrib
% cd ~/opencv_build/opencv_contrib
% git checkout 3.4
% cd ~/opencv_build/opencv
% git checkout 3.4
% mkdir -p build && cd build
% cmake -D WITH_CUDA=OFF -D BUILD_TIFF=ON -D BUILD_opencv_java=OFF -D WITH_OPENGL=ON -D WITH_OPENCL=ON -D WITH_IPP=ON \
-D WITH_TBB=ON -D WITH_EIGEN=ON -D WITH_V4L=ON -D WITH_VTK=OFF -D BUILD_TESTS=OFF -D BUILD_PERF_TESTS=OFF \
-D CMAKE_BUILD_TYPE=RELEASE -D BUILD_opencv_python2=OFF -D CMAKE_INSTALL_PREFIX=/usr/local \
-D PYTHON3_INCLUDE_DIR=$(python3 -c "from distutils.sysconfig import get_python_inc; print(get_python_inc())") \
-D PYTHON3_PACKAGES_PATH=$(python3 -c "from distutils.sysconfig import get_python_lib; print(get_python_lib())") \
-D INSTALL_C_EXAMPLES=ON -D INSTALL_PYTHON_EXAMPLES=ON -D OPENCV_ENABLE_NONFREE=ON -D OPENCV_GENERATE_PKGCONFIG=ON \
-D PYTHON3_EXECUTABLE=$(which python3) -D PYTHON_DEFAULT_EXECUTABLE=$(which python3) \
-D OPENCV_EXTRA_MODULES_PATH=~/opencv_build/opencv_contrib/modules -D BUILD_EXAMPLES=ON ..
% make -j4
% sudo make install
% sudo ldconfig

4. Check / verify that OpenCV 3.x is installed correctly:

% python3 -c "import cv2; print(cv2.__version__)"

5. Haar Classifier Training:

5.1. Clone the mrnugget repository, which is very handy for this task (it nicely wraps the OpenCV flow):

% cd ~
% git clone https://github.com/mrnugget/opencv-haar-classifier-training.git
% cd opencv-haar-classifier-training

5.2. Populate the positive_images and negative_images folders with images of the custom object (“target”) and of other objects (“background”). Negative images can be retrieved from public sources, e.g. Kaggle, GitHub, etc.

5.3. Create sample sets for training:

% find ./positive_images -iname "*.jpg" > positives.txt
% find ./negative_images -iname "*.jpg" > negatives.txt

% perl bin/createsamples.pl positives.txt negatives.txt samples 1500 \
"opencv_createsamples -bgcolor 0 -bgthresh 0 -maxxangle 1.1 -maxyangle 1.1 maxzangle 0.5 -maxidev 40 -w 80 -h 40"

% python ./tools/mergevec.py -v samples/ -o samples.vec

5.4. Train the classifier (this typically takes a long time, e.g. a few days):

% opencv_traincascade -data classifier -vec samples.vec -bg negatives.txt \
-numStages 20 -minHitRate 0.999 -maxFalseAlarmRate 0.5 -numPos 1000 \
-numNeg 600 -w 80 -h 40 -mode ALL -precalcValBufSize 1024 -precalcIdxBufSize 1024

Note that step 5.4 can be run with an additional flag (-featureType LBP) to train with Local Binary Pattern (LBP) features instead of Haar features. LBP is several times faster, but typically 10–20% less accurate than Haar.

After starting, the training program prints back its parameters and then begins training. Each stage prints some analysis as it is trained, including the Hit Ratio (HR) and False Alarm ratio (FA) per feature. At the end of each stage the classifier is saved to a file, so the process can be stopped and restarted; this is useful when tweaking the machine or settings to optimize training speed.

The precision of the cascade classifier is determined by the AcceptanceRatio of the last stage, which should typically be around 10^-5. It is recommended to stop the training (i.e. reduce the number of stages) once the AcceptanceRatio saturates, in order to avoid overfitting.

If the training is killed in the middle (Ctrl+C), opencv_traincascade can be called again with the same launching command (same -data folder) but with a reduced -numStages (e.g. -numStages 6). The application will load the already-trained stages, realize that the required number of stages has been reached, write the resulting cascade to XML, and finish. Keep in mind that the last stage*.xml may be broken (partially saved due to the kill event) and can therefore cause an exception during the suggested conversion, since a broken XML file cannot be read.
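
Once training completes, the resulting classifier can be used exactly like the pre-trained classifiers from the inference section. A minimal sketch, assuming training wrote classifier/cascade.xml and that test.jpg is a placeholder test image:

import cv2

custom_detector = cv2.CascadeClassifier('classifier/cascade.xml')   # output of opencv_traincascade (-data classifier)
img = cv2.imread('test.jpg')                                         # placeholder test image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

detections = custom_detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in detections:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 255), 2)
cv2.imwrite('result.jpg', img)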

References

Paul Viola and Michael Jones, “Rapid Object Detection using a Boosted Cascade of Simple Features”, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2001.
