Training YOLOv4 on Google Colab
Using Google Colab GPUs to speed up YOLOv4 training.
AI for Paupers, Misers and Cheapskates
Make no mistake — AI is extremely energy intensive. Not only do the latest AI models require large amounts of electricity to run, they also require optimal hardware such as dedicated GPUs to run fast.
This puts paupers, misers and cheapskates who do not have access to a dedicated deep learning rig or a paid cloud service such as AWS at a disadvantage.
Thankfully though, Google has a free tier for its cloud services (nothing is truly free of course, and this is a story for another time!) which is accessible from Google Colab. While Google Colab’s main focus is running Python code using Jupyter notebooks, with magic commands non-Python code can also be executed. This opens up plenty of opportunities, as many AI models are not written in Python.
In this article we show how to use Google Colab to perform transfer learning on YOLO, a well-known deep learning computer vision model written in C and CUDA.
Setting up Google Colab’s GPU
Before doing anything else, Google Colab’s GPU needs to be enabled, as the CPU is used by default.
In order to use the GPU, go to “Change runtime type” under “Runtime” in the menu bar, and select “GPU” under “Hardware accelerator”. Don’t forget to “Save”!
In the free tier, only the T4 GPU is available. Paying for premium tiers will unlock more powerful GPUs such as the A100 or V100 GPU.
More information about GPUs available on Google Cloud can be found here.
Downloading Data Directly From Kaggle
In this demonstration we will download data directly from Kaggle into Google Colab to train the model. In order to do so you have to first create a Kaggle API token. The downloaded Kaggle API token file, kaggle.json, will need to be uploaded to Google Colab.
After downloading kaggle.json from your Kaggle account page, create a directory named .kaggle within Google Colab’s environment.
!mkdir ~/.kaggle
Upload the downloaded kaggle.json to Google Colab and copy it into .kaggle.
!cp ./kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
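The same token setup can also be sketched in Python, which may be convenient if you prefer to keep the notebook free of shell commands. The function name install_kaggle_token and its home_dir parameter are hypothetical names chosen for this illustration; the sketch assumes kaggle.json has already been uploaded.

```python
import os
import shutil
import stat
from pathlib import Path


def install_kaggle_token(token_path, home_dir=None):
    """Copy kaggle.json into <home>/.kaggle and restrict it to the owner.

    home_dir is a hypothetical override used for illustration; by default
    the current user's home directory is used.
    """
    home = Path(home_dir) if home_dir is not None else Path.home()
    kaggle_dir = home / ".kaggle"
    # Equivalent of `mkdir ~/.kaggle`, but safe to run more than once.
    kaggle_dir.mkdir(parents=True, exist_ok=True)
    target = kaggle_dir / "kaggle.json"
    # Equivalent of `cp ./kaggle.json ~/.kaggle/`.
    shutil.copyfile(token_path, target)
    # Equivalent of `chmod 600`: read and write for the owner only.
    os.chmod(target, stat.S_IRUSR | stat.S_IWUSR)
    return target
```

In a notebook cell this would be called as install_kaggle_token("kaggle.json").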
Datasets can now be downloaded directly from Kaggle. For example the hard hat dataset can be downloaded directly using the following command.
!kaggle datasets download -d andrewmvd/hard-hat-detection
Compiling Darknet
Next we need to compile darknet on Google Colab to train and use YOLO.
First, ensure that the GPU activated earlier can be accessed. As of writing, Google Colab uses CUDA 11.8 for the T4 GPU.
!/usr/local/cuda/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
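If later cells depend on the CUDA version, the release number can be pulled out of this output programmatically. The following is a minimal sketch; parse_cuda_release is a hypothetical helper, and it assumes the output contains a "release X.Y" fragment as in the listing above.

```python
import re


def parse_cuda_release(nvcc_output):
    """Return the CUDA release as a (major, minor) tuple, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# One line copied from the nvcc output shown above.
sample = "Cuda compilation tools, release 11.8, V11.8.89"
print(parse_cuda_release(sample))  # (11, 8)
```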
Next, clone the darknet repository.
!git clone https://github.com/AlexeyAB/darknet
Once darknet is cloned, we need to modify its Makefile to enable GPU usage by YOLO. The following commands change the directory to the cloned darknet repository and turn on the switches for OpenCV, GPU, CUDNN and CUDNN_HALF in Makefile.
%cd darknet
!sed -i 's/OPENCV=0/OPENCV=1/' Makefile
!sed -i 's/GPU=0/GPU=1/' Makefile
!sed -i 's/CUDNN=0/CUDNN=1/' Makefile
!sed -i 's/CUDNN_HALF=0/CUDNN_HALF=1/' Makefile
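For readers who would rather edit the file from Python, the same switches can be flipped with a regular expression. This is only a sketch: enable_make_flags is a hypothetical helper, and the sample text below is a made-up fragment rather than the real darknet Makefile. Unlike the sed patterns, it matches whole lines, so a flag name can never match inside a longer name.

```python
import re


def enable_make_flags(makefile_text, flags):
    """Turn every line of the form `FLAG=0` into `FLAG=1` for the given flags."""
    for flag in flags:
        # Anchor at line start and end so that "CUDNN" cannot match "CUDNN_HALF".
        pattern = rf"^{re.escape(flag)}=0[ \t]*$"
        makefile_text = re.sub(pattern, f"{flag}=1", makefile_text,
                               flags=re.MULTILINE)
    return makefile_text


# Illustrative fragment only; the real Makefile contains many more lines.
sample_makefile = "GPU=0\nCUDNN=0\nCUDNN_HALF=0\nOPENCV=0\nAVX=0\n"
patched_makefile = enable_make_flags(
    sample_makefile, ["OPENCV", "GPU", "CUDNN", "CUDNN_HALF"])
print(patched_makefile)
```

To apply it to the real file, read Makefile into a string, pass it through the function, and write the result back.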
Once Makefile is properly configured, compile darknet.
!make
This command will print a long list of logs and warning messages which can be safely ignored.
Downloading Pretrained Weights for Transfer Learning
In this demonstration, we use the tiny model which is significantly smaller than the original model and therefore faster to train.
For transfer learning, the partial weights for the first 29 layers of the tiny model can be downloaded using the command:
# Download partial yolo-tiny weights for transfer learning.
!wget https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4-tiny.conv.29
Preparing the Data for YOLO
Now that YOLO has been compiled and the weights downloaded, it is time to prepare the data for YOLO.
First, copy the downloaded zip file into darknet,
!cp ../hard-hat-detection.zip ./
and unzip the file.
!unzip hard-hat-detection.zip
For the hard hat detection data, two directories will be created: images and annotations.
Each image in images has a corresponding .xml file in annotations. We need to extract all individual bounding box annotations from each .xml file, reformat the bounding boxes, and then save the annotations to a .txt file, one for each image.
As the bounding box annotations for this particular dataset are in the Pascal VOC format, we need to modify them to the YOLO format for YOLO to ingest. We will use Python to help us do this.
import os
from google.colab import files
import cv2
import matplotlib.pyplot as plt
import numpy as np
from sklearn.model_selection import train_test_split
import xml.etree.ElementTree as ET

# Functions to convert Pascal VOC bounding boxes to YOLO format.
def xml_to_yolo_bbox(bndbox, image_size):
    # xml to yolo bounding box.
    # Note the order of the corners: xmin, xmax, ymin, ymax.
    xmin, xmax, ymin, ymax = bndbox
    # x and y mid points of the bounding box.
    x_mid = (xmin + xmax) * 0.5
    y_mid = (ymin + ymax) * 0.5
    # Height and width of the box.
    height = ymax - ymin
    width = xmax - xmin
    # Normalize everything by the image dimensions.
    image_width, image_height = image_size
    image_width_inv = 1 / image_width
    image_height_inv = 1 / image_height
    x_mid = x_mid * image_width_inv
    width = width * image_width_inv
    y_mid = y_mid * image_height_inv
    height = height * image_height_inv
    return [x_mid, y_mid, width, height]

def xml_to_yolo(xml_file_path, txt_file_path, wanted_classes):
    tree = ET.parse(xml_file_path)
    root = tree.getroot()
    size = root.find("size")
    img_width = int(size.find("width").text)
    img_height = int(size.find("height").text)
    output_string = ""
    for o in root.iter("object"):
        obj_class = o.find("name").text
        if obj_class in wanted_classes:
            obj_class_id = str(wanted_classes.index(obj_class))
            bndbox = o.find("bndbox")
            bndbox = [float(bndbox.find("xmin").text),
                      float(bndbox.find("xmax").text),
                      float(bndbox.find("ymin").text),
                      float(bndbox.find("ymax").text)]
            yolo_box = xml_to_yolo_bbox(bndbox, [img_width, img_height])
            output_string = output_string + obj_class_id + " "
            output_string = output_string + " ".join([str(b) for b in yolo_box])
            output_string = output_string + "\n"
    # Write all boxes of this image to a single .txt file.
    with open(txt_file_path, "w") as f:
        f.write(output_string)
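To make the conversion concrete, here is a small worked example with made-up numbers. It is a standalone sketch, not the code above: voc_to_yolo and yolo_to_voc are hypothetical helpers, and they take the corners in the order xmin, ymin, xmax, ymax, which differs from the argument order used by xml_to_yolo_bbox.

```python
def voc_to_yolo(xmin, ymin, xmax, ymax, image_width, image_height):
    """Pascal VOC corners in pixels -> normalized YOLO (x_mid, y_mid, w, h)."""
    x_mid = (xmin + xmax) / 2 / image_width
    y_mid = (ymin + ymax) / 2 / image_height
    width = (xmax - xmin) / image_width
    height = (ymax - ymin) / image_height
    return x_mid, y_mid, width, height


def yolo_to_voc(x_mid, y_mid, width, height, image_width, image_height):
    """Inverse mapping, useful for checking a converted annotation."""
    xmin = (x_mid - width / 2) * image_width
    ymin = (y_mid - height / 2) * image_height
    xmax = (x_mid + width / 2) * image_width
    ymax = (y_mid + height / 2) * image_height
    return xmin, ymin, xmax, ymax


# A 200 x 50 pixel box inside a 400 x 200 pixel image (illustrative values).
yolo_box = voc_to_yolo(100, 50, 300, 100, image_width=400, image_height=200)
print(yolo_box)  # (0.5, 0.375, 0.5, 0.25)
print(yolo_to_voc(*yolo_box, image_width=400, image_height=200))
# (100.0, 50.0, 300.0, 100.0)
```

All four YOLO values are fractions of the image size, which is why the same label stays valid when the image is resized.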
First, create a file named obj.names which contains the unique category names in the dataset. For the downloaded dataset, the classes we are interested in are “helmet” and “head”.
# Prepare the obj.names, obj.data and .cfg files required for Yolov4 training.
wanted_classes = ["helmet", "head"]
image_extension = ".png"

# 1. obj.names
# This file contains the categories of all the objects in the dataset.
# The index of the category serves as its numerical category.
# Therefore class 0 is helmet and class 1 is head.
obj_names_output_string = ""
for wc in wanted_classes:
    obj_names_output_string = obj_names_output_string + wc + "\n"
with open("obj.names", "w") as f:
    f.write(obj_names_output_string)
obj.names will have the following contents.
helmet
head
Next we create the .txt YOLO annotation files, one for each image.
# 2. .txt annotation files.
# YOLO requires that each image file has its own .txt annotation file with the
# bounding box format:
# category x_mid_point y_mid_point width height
ann_xml_file_list = os.listdir("annotations/")
image_file_list = []
# Get all image names.
for i in os.listdir("images/"):
    if i[-4:] == image_extension:
        image_file_list.append(i)
print("{} images.".format(len(image_file_list)))
# Create the .txt annotations from the .xml files.
# Each .txt file is saved next to its image in images/.
for a in ann_xml_file_list:
    if a[-4:] == ".xml":
        file_root_name = a[:-4]
        if file_root_name + image_extension in image_file_list:
            txt_name = file_root_name + ".txt"
            xml_to_yolo(os.path.join("annotations/", a),
                        os.path.join("images/", txt_name),
                        wanted_classes)
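Before training, it can be worth checking that the generated label files actually follow the format described above. The following is a minimal sketch of such a check; check_yolo_label_text is a hypothetical helper, and the two sample strings are invented for illustration.

```python
def check_yolo_label_text(label_text, num_classes):
    """Return a list of problems found in the text of one YOLO .txt label file."""
    problems = []
    for line_number, line in enumerate(label_text.splitlines(), start=1):
        if not line.strip():
            continue  # Blank lines carry no annotation.
        fields = line.split()
        if len(fields) != 5:
            problems.append(
                f"line {line_number}: expected 5 fields, got {len(fields)}")
            continue
        try:
            class_id = int(fields[0])
            values = [float(v) for v in fields[1:]]
        except ValueError:
            problems.append(f"line {line_number}: non-numeric field")
            continue
        if not 0 <= class_id < num_classes:
            problems.append(
                f"line {line_number}: class id {class_id} out of range")
        if any(not 0.0 <= v <= 1.0 for v in values):
            problems.append(f"line {line_number}: coordinate outside [0, 1]")
    return problems


good_label = "0 0.5 0.375 0.5 0.25\n1 0.1 0.2 0.05 0.05\n"
bad_label = "2 0.5 0.5 0.5 0.5\n0 1.2 0.5 0.5\n"
print(check_yolo_label_text(good_label, num_classes=2))  # []
print(check_yolo_label_text(bad_label, num_classes=2))
```

In the notebook, the function could be applied to the contents of every .txt file in images/.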
In addition to the individual .txt annotation files, we also create train.txt and test.txt, which explicitly specify the lists of training and validation images respectively.
# In addition to the annotation .txt files, we also need to explicitly specify
# the training images and test images as .txt files.
# Make the train test split.
train_image_list, test_image_list = train_test_split(image_file_list,
                                                     test_size = 0.2,
                                                     random_state = 42)
train_image_list.sort()
test_image_list.sort()
print("Train images: {}, test images: {}.".format(len(train_image_list),
                                                  len(test_image_list)))

# Make the train.txt file.
# This will contain a list of all training images.
train_txt_output_string = ""
for f in train_image_list:
    train_txt_output_string = train_txt_output_string + "images/" + f + "\n"
with open("train.txt", "w") as f:
    f.write(train_txt_output_string)

# Make the test.txt file.
# This will contain a list of all testing images.
test_txt_output_string = ""
for f in test_image_list:
    test_txt_output_string = test_txt_output_string + "images/" + f + "\n"
with open("test.txt", "w") as f:
    f.write(test_txt_output_string)
For example, train.txt will have the following contents.
images/hard_hat_workers0.png
images/hard_hat_workers1.png
images/hard_hat_workers10.png
images/hard_hat_workers100.png
images/hard_hat_workers1000.png
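A quick consistency check at this point is to confirm that every image listed in train.txt or test.txt has a label file. The sketch below assumes, as in the steps above, that each .txt label sits next to its image and shares its base name; missing_label_files is a hypothetical helper.

```python
import os


def missing_label_files(list_file_path):
    """Return image paths from a train/test list file that lack a .txt label."""
    missing = []
    with open(list_file_path) as f:
        for line in f:
            image_path = line.strip()
            if not image_path:
                continue  # Skip blank lines.
            # Same directory and base name as the image, with a .txt extension.
            label_path = os.path.splitext(image_path)[0] + ".txt"
            if not os.path.isfile(label_path):
                missing.append(image_path)
    return missing
```

Calling missing_label_files("train.txt") from inside the darknet directory should return an empty list if the annotation step covered every training image.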
Finally we create obj.data, which ties all the files created above together. This file also tells YOLO where to save the outputs. In this case we specify the backup location to be backup.
# 3. obj.data
# This file tells YOLO how many classes there are in the dataset, the paths to
# train.txt and test.txt, obj.names as well as the output location "backup".
obj_data_output_string = """classes = 2
train = train.txt
valid = test.txt
names = obj.names
backup = backup"""
with open("obj.data", "w") as f:
    f.write(obj_data_output_string)
obj.data will have the following contents.
classes = 2
train = train.txt
valid = test.txt
names = obj.names
backup = backup
Preparing the Configuration File
Next we need to prepare the YOLO configuration file. This file specifies the model architecture as well as the training hyperparameters. Instead of creating one from scratch, we will simply modify an existing version.
!cp cfg/yolov4-tiny-custom.cfg ./
We follow the instructions provided in the darknet GitHub repository to modify the copied .cfg file to our requirements. In particular, the downloaded dataset has 2 classes, so the number of filters in the last convolutional layer before each yolo layer, as well as the number of classes in each yolo layer, needs to be adjusted accordingly.
# Modify .cfg as outlined in https://github.com/AlexeyAB/darknet.
with open("yolov4-tiny-custom.cfg", "r") as f:
    cfg_string = f.read()
cfg_string = cfg_string.split("\n")
# The indices below are 0-based line positions in the copied .cfg file.
# change line subdivisions to subdivisions=16
cfg_string[6] = "subdivisions=16"
# change line max_batches to (classes*2000, but not less than number of training
# images and not less than 6000), f.e. max_batches=6000 if you train for 3
# classes
cfg_string[19] = "max_batches=6000"
# change line steps to 80% and 90% of max_batches, f.e. steps=4800,5400
cfg_string[21] = "steps=4800,5400"
# change [filters=255] to filters=(classes + 5)x3 in the 3 [convolutional]
# before each [yolo] layer, keep in mind that it only has to be the last
# [convolutional] before each of the [yolo] layers
# change line classes=80 to your number of objects in each of 3 [yolo]-layers
# For YOLOv4-tiny there are only 2 [yolo] layers
# With 2 classes, filters = (2 + 5) x 3 = 21.
cfg_string[211] = "filters=21"
cfg_string[219] = "classes=2"
cfg_string[262] = "filters=21"
cfg_string[268] = "classes=2"
cfg_string = "\n".join(cfg_string)
with open("yolov4-tiny-custom.cfg", "w") as f:
    f.write(cfg_string)
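Hard-coded line indices break as soon as the .cfg file changes, so it may be safer to derive the values from the rules quoted in the comments above and to locate the lines by section and key. The following is a sketch under those assumptions: training_params and patch_cfg are hypothetical helpers, the figure of 4000 training images is illustrative, and the sample configuration is a tiny made-up fragment rather than the real yolov4-tiny-custom.cfg.

```python
def training_params(num_classes, num_train_images):
    """Apply the rules quoted above to derive max_batches, steps and filters."""
    max_batches = max(num_classes * 2000, num_train_images, 6000)
    steps = (max_batches * 8 // 10, max_batches * 9 // 10)  # 80% and 90%
    filters = (num_classes + 5) * 3
    return {"max_batches": max_batches, "steps": steps, "filters": filters}


def patch_cfg(cfg_text, num_classes, params, subdivisions=16):
    """Update a darknet .cfg by section and key instead of by line number."""
    lines = cfg_text.split("\n")
    section = None
    last_conv_filters = None  # Index of filters= in the latest [convolutional].
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("["):
            section = stripped
            # The last [convolutional] before a [yolo] gets the new filters.
            if section == "[yolo]" and last_conv_filters is not None:
                lines[last_conv_filters] = f"filters={params['filters']}"
            continue
        key = stripped.split("=")[0].strip()
        if section == "[net]" and key == "subdivisions":
            lines[i] = f"subdivisions={subdivisions}"
        elif section == "[net]" and key == "max_batches":
            lines[i] = f"max_batches={params['max_batches']}"
        elif section == "[net]" and key == "steps":
            lines[i] = f"steps={params['steps'][0]},{params['steps'][1]}"
        elif section == "[convolutional]" and key == "filters":
            last_conv_filters = i
        elif section == "[yolo]" and key == "classes":
            lines[i] = f"classes={num_classes}"
    return "\n".join(lines)


# Made-up miniature configuration used only to exercise the function.
sample_cfg = "\n".join([
    "[net]", "batch=64", "subdivisions=1", "max_batches = 2000200",
    "steps=1600000,1800000",
    "[convolutional]", "filters=32",
    "[convolutional]", "filters=255",
    "[yolo]", "classes=80",
])
params = training_params(num_classes=2, num_train_images=4000)
print(params)
print(patch_cfg(sample_cfg, num_classes=2, params=params))
```

For 2 classes this yields max_batches=6000, steps=4800,5400 and filters=21, the same values that are written by hand above.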
YOLO Transfer Learning on Google Colab
That was quite a bit of work — however if everything has been done correctly, we are finally able to start the transfer learning process.
We execute the compiled darknet executable in train mode, and pass to it the data settings file obj.data, the model configuration file yolov4-tiny-custom.cfg and the partial pretrained weights yolov4-tiny.conv.29. We opt not to show the training history using -dont_show, as the visualizations are not supported by Google Colab. Finally, we opt to use the mean average precision to monitor the training progress using -map.
!./darknet detector train obj.data yolov4-tiny-custom.cfg yolov4-tiny.conv.29 -dont_show -map
Even when using GPUs, this training will take some time. In my case, training took almost 1.5 hours even for the tiny model.
At the end, the model displays the training metrics, as well as the names of the fine-tuned weight files.
Tensor Cores are used.
Last accuracy mAP@0.50 = 90.96 %, best = 90.96 %
6000: 0.609664, 0.596759 avg loss, 0.000026 rate, 0.849180 seconds, 384000 images, 0.025016 hours left
calculation mAP (mean average precision)...
Detection layer: 30 - type = 28
Detection layer: 37 - type = 28
1000
detections_count = 18411, unique_truth_count = 4822
class_id = 0, name = helmet, ap = 92.04% (TP = 3236, FP = 510)
class_id = 1, name = head, ap = 89.79% (TP = 1025, FP = 175)
for conf_thresh = 0.25, precision = 0.86, recall = 0.88, F1-score = 0.87
for conf_thresh = 0.25, TP = 4261, FP = 685, FN = 561, average IoU = 69.20 %
IoU threshold = 50 %, used Area-Under-Curve for each unique Recall
mean average precision (mAP@0.50) = 0.909143, or 90.91 %
Total Detection Time: 13 Seconds
Set -points flag:
`-points 101` for MS COCO
`-points 11` for PascalVOC 2007 (uncomment `difficult` in voc.data)
`-points 0` (AUC) for ImageNet, PascalVOC 2010-2012, your custom dataset
mean_average_precision (mAP@0.50) = 0.909143
Saving weights to backup/yolov4-tiny-custom_6000.weights
Saving weights to backup/yolov4-tiny-custom_last.weights
Saving weights to backup/yolov4-tiny-custom_final.weights
If you want to train from the beginning, then use flag in the end of training command: -clear
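The precision, recall and F1-score in this log follow directly from the reported TP, FP and FN counts. The short sketch below recomputes them using the standard definitions; detection_metrics is a hypothetical helper, and the counts are copied from the log above.

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall and F1-score from detection counts."""
    precision = tp / (tp + fp)  # Fraction of detections that are correct.
    recall = tp / (tp + fn)     # Fraction of ground-truth boxes that are found.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts reported for conf_thresh = 0.25 in the training log.
precision, recall, f1 = detection_metrics(tp=4261, fp=685, fn=561)
print(f"precision = {precision:.2f}, recall = {recall:.2f}, F1-score = {f1:.2f}")
# precision = 0.86, recall = 0.88, F1-score = 0.87
```

Note that TP + FN = 4261 + 561 = 4822, which matches the unique_truth_count in the log.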
YOLO Inference on Google Colab
Once the transfer learning process is complete, we can perform inference using the fine-tuned weights saved under backup.
We execute the compiled darknet executable in test mode, and pass to it the data settings file obj.data and the model configuration file yolov4-tiny-custom.cfg. This time round, we pass the fine-tuned weights backup/yolov4-tiny-custom_last.weights. We also pass an image for inference, images/hard_hat_workers10.png. Instead of showing the output, we opt to save it to a file named predictions.jpg.
!./darknet detector test obj.data yolov4-tiny-custom.cfg backup/yolov4-tiny-custom_last.weights images/hard_hat_workers10.png -dont_show -out_filename predictions.jpg
Downloading and opening predictions.jpg shows that the fine-tuned model was able to correctly predict the locations of hard hats in the input image!
Summary
In this demonstration we showed how to use Google Colab’s GPUs to train the deep learning model YOLO, a computer vision model written in C. We showed how to compile and prepare the model for training, and how to prepare the various data settings files. Finally, we showed how to perform the transfer learning process and what the eventual predictions look like.
The Jupyter notebook for the code above is available on GitHub.
References
- https://github.com/AlexeyAB/darknet
- https://github.com/theAIGuysCode/YOLOv4-Cloud-Tutorial/blob/master/YOLOv4_Training_Tutorial.ipynb
- https://albumentations.ai/docs/getting_started/bounding_boxes_augmentation/
- https://jonathan-hui.medium.com/map-mean-average-precision-for-object-detection-45c121a31173