Training YOLOv4 on Google Colab

Y. Natsume
10 min read · Jun 18, 2023

Using Google Colab GPUs to speed up YOLOv4 training.

Photo by Jason Ryan on Unsplash.

AI for Paupers, Misers and Cheapskates

Make no mistake: AI is extremely energy intensive. Not only do the latest AI models require large amounts of electricity to run, they also require specialized hardware such as dedicated GPUs to run fast.

This puts paupers, misers and cheapskates who do not have access to a dedicated deep learning rig or a paid cloud service such as AWS at a disadvantage.

Thankfully though, Google has a free tier for its cloud services (nothing is truly free of course, but that is a story for another time!) which is accessible from Google Colab. While Google Colab's main focus is running Python code in Jupyter notebooks, non-Python code can also be executed using magic commands. This opens up plenty of opportunities, as many AI models are not written in Python.
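For example, prefixing a notebook line with ! sends it to the underlying shell, while % invokes a notebook magic. Both are used throughout this article.

!ls /content     # run a shell command from a notebook cell
%cd /content     # a notebook magic that changes the working directory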

In this article we show how to use Google Colab to perform transfer learning on YOLO, a well known deep learning computer vision model written in C and CUDA.

Setting up Google Colab’s GPU

Before doing anything else, Google Colab's GPU needs to be enabled, as the CPU is used by default.

In order to use the GPU, go to “Change runtime type” under “Runtime” in the menu bar, and select “GPU” under “Hardware accelerator”. Don’t forget to “Save”!

Setting up Colab’s T4 GPU. Image created by the author.

In the free tier, only the T4 GPU is available. Paying for a premium tier unlocks more powerful GPUs such as the A100 or V100.
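To confirm that a GPU is actually attached to the runtime, query it with nvidia-smi from a notebook cell. On the free tier this should report a Tesla T4.

!nvidia-smi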

More information about the GPUs available on Google Cloud can be found in Google Cloud's GPU documentation.

Downloading Data Directly From Kaggle

In this demonstration we will download data directly from Kaggle into Google Colab to train the model. In order to do so, you first have to create a Kaggle API token. The downloaded kaggle.json token file will then need to be uploaded to Google Colab.

After downloading kaggle.json from your Kaggle account page, create a directory named .kaggle within Google Colab’s environment.

!mkdir ~/.kaggle

Upload the downloaded kaggle.json to Google Colab and copy it into .kaggle.
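One way to get the file into the Colab environment is the files helper from the google.colab module, which opens a file picker in the notebook.

# Open a file picker and upload kaggle.json into the current directory.
from google.colab import files
files.upload()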

!cp ./kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json

Datasets can now be downloaded directly from Kaggle. For example, the hard hat detection dataset can be downloaded using the following command.

!kaggle datasets download -d andrewmvd/hard-hat-detection

Compiling Darknet

Next we need to compile darknet on Google Colab to train and use YOLO.

First, ensure that the GPU activated earlier can be accessed. As of writing, Google Colab uses CUDA 11.8 for the T4 GPU.

!/usr/local/cuda/bin/nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

Next, clone the darknet repository.

!git clone https://github.com/AlexeyAB/darknet

Once darknet is cloned, we need to modify its Makefile in order to enable GPU usage by YOLO. The following commands change the directory to the downloaded darknet repository, and turn on the switches for OpenCV, GPU and cuDNN in the Makefile.

%cd darknet
!sed -i 's/OPENCV=0/OPENCV=1/' Makefile
!sed -i 's/GPU=0/GPU=1/' Makefile
!sed -i 's/CUDNN=0/CUDNN=1/' Makefile
!sed -i 's/CUDNN_HALF=0/CUDNN_HALF=1/' Makefile

Once Makefile is properly configured, compile darknet.

!make

This command will print a long list of logs and warning messages which can be safely ignored.
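A quick way to check that compilation succeeded is to invoke the freshly built executable without any arguments, which should print darknet's usage message instead of an error.

# If the build succeeded, this prints darknet's usage message.
!./darknet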

Downloading Pretrained Weights for Transfer Learning

In this demonstration, we use the YOLOv4-tiny model, which is significantly smaller than the full YOLOv4 model and therefore faster to train.

For transfer learning, the partial weights for the first 29 layers of the tiny model can be downloaded using the command:

# Download partial yolo-tiny weights for transfer learning.
!wget https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4-tiny.conv.29

Preparing the Data for YOLO

Now that YOLO has been compiled and the weights downloaded, it is time to prepare the data for YOLO.

First, copy the downloaded zip file into darknet,

!cp ../hard-hat-detection.zip ./

and unzip the file.

!unzip hard-hat-detection.zip

For the hard hat detection data, unzipping creates two directories: images and annotations.

Each image in images has a corresponding .xml file in annotations. We need to extract all individual bounding box annotations from each .xml file, reformat the bounding boxes, and then save the annotations to a .txt file, one for each image.

As the bounding box annotations for this particular dataset are in the Pascal VOC format, we need to modify them to the YOLO format for YOLO to ingest. We will use Python to help us do this.

import os

from google.colab import files

import cv2
import matplotlib.pyplot as plt
import numpy as np

from sklearn.model_selection import train_test_split

import xml.etree.ElementTree as ET


# Functions to convert Pascal VOC bounding boxes to YOLO format.
def xml_to_yolo_bbox(bndbox, image_size):
    # xml to yolo bounding box.
    xmin, xmax, ymin, ymax = bndbox
    # x and y mid points of the bounding box.
    x_mid = (xmin + xmax) * 0.5
    y_mid = (ymin + ymax) * 0.5
    # Height and width of the box.
    height = ymax - ymin
    width = xmax - xmin

    image_width, image_height = image_size
    image_width_inv = 1 / image_width
    image_height_inv = 1 / image_height

    x_mid = x_mid * image_width_inv
    width = width * image_width_inv
    y_mid = y_mid * image_height_inv
    height = height * image_height_inv
    return [x_mid, y_mid, width, height]


def xml_to_yolo(xml_file_path, txt_file_path, wanted_classes):
    tree = ET.parse(xml_file_path)
    root = tree.getroot()

    size = root.find("size")
    img_width = int(size.find("width").text)
    img_height = int(size.find("height").text)

    output_string = ""

    for o in root.iter("object"):
        obj_class = o.find("name").text
        if obj_class in wanted_classes:
            obj_class_id = str(wanted_classes.index(obj_class))
            bndbox = o.find("bndbox")
            bndbox = [float(bndbox.find("xmin").text),
                      float(bndbox.find("xmax").text),
                      float(bndbox.find("ymin").text),
                      float(bndbox.find("ymax").text)]

            yolo_box = xml_to_yolo_bbox(bndbox, [img_width, img_height])

            output_string = output_string + obj_class_id + " "
            output_string = output_string + " ".join([str(b) for b in yolo_box])
            output_string = output_string + "\n"

    with open(txt_file_path, "w") as f:
        f.write(output_string)
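As a quick sanity check of the conversion, consider a hypothetical 100 × 100 pixel box whose top left corner sits at (50, 100) inside a 400 × 400 pixel image. Note that xml_to_yolo_bbox expects the box as [xmin, xmax, ymin, ymax].

# Hypothetical box: xmin=50, xmax=150, ymin=100, ymax=200 in a 400x400 image.
print(xml_to_yolo_bbox([50.0, 150.0, 100.0, 200.0], [400, 400]))
# [0.25, 0.375, 0.25, 0.25]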

First, create a file named obj.names which contains the unique category names in the dataset. For the downloaded dataset, the classes we are interested in are "helmet" and "head".

# Prepare the obj.names, obj.data and .cfg files required for YOLOv4 training.
wanted_classes = ["helmet", "head"]
image_extension = ".png"

# 1. obj.names
# This file contains the categories of all the objects in the dataset.
# The index of the category serves as its numerical category.
# Therefore class 0 is helmet and class 1 is head.
obj_names_output_string = ""
for wc in wanted_classes:
    obj_names_output_string = obj_names_output_string + wc + "\n"

with open("obj.names", "w") as f:
    f.write(obj_names_output_string)

obj.names will have the following contents.

helmet
head

Next we create the .txt YOLO annotation files, one for each image.

# 2. .txt annotation files.
# YOLO requires that each image file has its own .txt annotation file with the
# bounding box format:
# category x_mid_point y_mid_point width height
ann_xml_file_list = os.listdir("annotations/")
image_file_list = []

# Get all image names.
for i in os.listdir("images/"):
    if i[-4:] == image_extension:
        image_file_list.append(i)

print("{} images.".format(len(image_file_list)))

# Create the .txt annotations from the .xml files.
for a in ann_xml_file_list:
    if a[-4:] == ".xml":
        file_root_name = a[:-4]
        if file_root_name + image_extension in image_file_list:
            txt_name = file_root_name + ".txt"
            xml_to_yolo(os.path.join("annotations/", a),
                        os.path.join("images/", txt_name),
                        wanted_classes)
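Each generated .txt file then contains one line per bounding box. For an image with a single helmet, a file might look something like this (the values are purely illustrative, not taken from the actual dataset):

0 0.476562 0.351562 0.109375 0.181641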

In addition to the individual .txt annotation files, we also create train.txt and test.txt, which explicitly specify the lists of training and validation images respectively.

# In addition to the annotation .txt files, we also need to explicitly specify
# the training images and test images as .txt files.

# Make the train test split.
# Make the train test split.
train_image_list, test_image_list = train_test_split(image_file_list,
                                                     test_size=0.2,
                                                     random_state=42)
train_image_list.sort()
test_image_list.sort()
print("Train images: {}, test images: {}.".format(len(train_image_list),
                                                  len(test_image_list)))

# Make the train.txt file.
# This will contain a list of all training images.
train_txt_output_string = ""
for f in train_image_list:
    train_txt_output_string = train_txt_output_string + "images/" + f + "\n"

with open("train.txt", "w") as f:
    f.write(train_txt_output_string)

# Make the test.txt file.
# This will contain a list of all testing images.
test_txt_output_string = ""
for f in test_image_list:
    test_txt_output_string = test_txt_output_string + "images/" + f + "\n"

with open("test.txt", "w") as f:
    f.write(test_txt_output_string)

For example, train.txt will have the following contents.

images/hard_hat_workers0.png
images/hard_hat_workers1.png
images/hard_hat_workers10.png
images/hard_hat_workers100.png
images/hard_hat_workers1000.png

Finally, we create obj.data, which ties together all the files created above. This file also tells YOLO where to save the outputs. In this case we specify the backup location to be the directory backup.

# 3. obj.data
# This file tells YOLO how many classes there are in the dataset, the paths to
# train.txt and test.txt, obj.names as well as the output location "backup".
obj_data_output_string = """classes = 2
train = train.txt
valid = test.txt
names = obj.names
backup = backup"""

with open("obj.data", "w") as f:
f.write(obj_data_output_string)

obj.data will have the following contents.

classes = 2
train = train.txt
valid = test.txt
names = obj.names
backup = backup

Preparing the Configuration File

Next we need to prepare the YOLO configuration file. This file specifies the model architecture as well as the training hyperparameters. Instead of creating one from scratch, we will simply modify an existing version.

!cp cfg/yolov4-tiny-custom.cfg ./

We follow the instructions provided in the darknet GitHub repository to modify the copied .cfg file to our requirements. In particular, the downloaded dataset has 2 classes, so the number of filters in the last convolutional layer before each [yolo] layer, as well as the number of classes, needs to be adjusted accordingly: with 2 classes, filters = (2 + 5) * 3 = 21.

# Modify .cfg as outlined in https://github.com/AlexeyAB/darknet.
with open("yolov4-tiny-custom.cfg", "r") as f:
cfg_string = f.read()

cfg_string = cfg_string.split("\n")

# change line subdivisions to subdivisions=16
cfg_string[6] = "subdivisions=16"
# change line max_batches to (classes*2000, but not less than number of training
# images and not less than 6000), f.e. max_batches=6000 if you train for 3
# classes
cfg_string[19] = "max_batches=6000"
# change line steps to 80% and 90% of max_batches, f.e. steps=4800,5400
cfg_string[21] = "steps=4800,5400"
# change [filters=255] to filters=(classes + 5)x3 in the 3 [convolutional]
# before each [yolo] layer, keep in mind that it only has to be the last
# [convolutional] before each of the [yolo] layers
# change line classes=80 to your number of objects in each of 3 [yolo]-layers
# For YOLO-Mini there are only 2 [yolo] layers
cfg_string[211] = "filters=21"
cfg_string[219] = "classes=2"
cfg_string[262] = "filters=21"
cfg_string[268] = "classes=2"

cfg_string = "\n".join(cfg_string)

with open("yolov4-tiny-custom.cfg", "w") as f:
f.write(cfg_string)
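Because the edits above rely on hard coded line indices into the .cfg file, a quick check that they landed on the intended lines is worthwhile. A minimal sketch:

# Print the edited lines to confirm the changes landed where we expect.
check_lines = cfg_string.split("\n")
for i in [6, 19, 21, 211, 219, 262, 268]:
    print(i, check_lines[i])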

YOLO Transfer Learning on Google Colab

That was quite a bit of work. However, if everything has been done correctly, we are finally able to start the transfer learning process.

We execute the compiled darknet executable in train mode, and pass it the data settings file obj.data, the model configuration file yolov4-tiny-custom.cfg and the partial pretrained weights yolov4-tiny.conv.29. The -dont_show flag suppresses the training chart, which Google Colab cannot display, and the -map flag monitors the mean average precision during training.

!./darknet detector train obj.data yolov4-tiny-custom.cfg yolov4-tiny.conv.29 -dont_show -map

Even with a GPU, this training will take some time. In my case, training took almost 1.5 hours even for the tiny model.
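Note that Colab's free tier may also disconnect the runtime before training finishes. Since darknet periodically saves checkpoints to backup, training can be resumed from the latest checkpoint rather than restarting from the pretrained weights, as described in the darknet repository.

# Resume training from the last saved checkpoint.
!./darknet detector train obj.data yolov4-tiny-custom.cfg backup/yolov4-tiny-custom_last.weights -dont_show -map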

At the end, the model displays the training metrics, as well as the names of the fine-tuned weight files.

 Tensor Cores are used.
Last accuracy mAP@0.50 = 90.96 %, best = 90.96 %
6000: 0.609664, 0.596759 avg loss, 0.000026 rate, 0.849180 seconds, 384000 images, 0.025016 hours left

calculation mAP (mean average precision)...
Detection layer: 30 - type = 28
Detection layer: 37 - type = 28
1000
detections_count = 18411, unique_truth_count = 4822
class_id = 0, name = helmet, ap = 92.04% (TP = 3236, FP = 510)
class_id = 1, name = head, ap = 89.79% (TP = 1025, FP = 175)

for conf_thresh = 0.25, precision = 0.86, recall = 0.88, F1-score = 0.87
for conf_thresh = 0.25, TP = 4261, FP = 685, FN = 561, average IoU = 69.20 %

IoU threshold = 50 %, used Area-Under-Curve for each unique Recall
mean average precision (mAP@0.50) = 0.909143, or 90.91 %
Total Detection Time: 13 Seconds

Set -points flag:
`-points 101` for MS COCO
`-points 11` for PascalVOC 2007 (uncomment `difficult` in voc.data)
`-points 0` (AUC) for ImageNet, PascalVOC 2010-2012, your custom dataset

mean_average_precision (mAP@0.50) = 0.909143
Saving weights to backup/yolov4-tiny-custom_6000.weights
Saving weights to backup/yolov4-tiny-custom_last.weights
Saving weights to backup/yolov4-tiny-custom_final.weights
If you want to train from the beginning, then use flag in the end of training command: -clear

YOLO Inference on Google Colab

Once the transfer learning process is complete, we can perform inference using the fine-tuned weights saved under backup.

We execute the compiled darknet executable in test mode, and pass it the data settings file obj.data and the model configuration file yolov4-tiny-custom.cfg. This time around, we pass the fine-tuned weights backup/yolov4-tiny-custom_last.weights, as well as an image for inference, images/hard_hat_workers10.png. Instead of showing the output, we opt to save it to a file named predictions.jpg.

!./darknet detector test obj.data yolov4-tiny-custom.cfg backup/yolov4-tiny-custom_last.weights images/hard_hat_workers10.png -dont_show -out_filename predictions.jpg

Downloading and opening predictions.jpg shows that the fine-tuned model was able to correctly predict the locations of hard hats in the input image!
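The output image can be downloaded programmatically with the same files helper from google.colab imported earlier.

# Download predictions.jpg to the local machine.
files.download("predictions.jpg")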

Hard hat predictions with the fine-tuned weights. Image created by the author.

Summary

In this demonstration we showed how to use Google Colab's GPUs to train YOLO, a deep learning computer vision model written in C. We showed how to compile darknet and prepare the model for training, and how to prepare the various data settings files. Finally, we showed how to perform the transfer learning process and what the eventual predictions look like.

The Jupyter notebook for the code above is available on GitHub.

References

  1. https://github.com/AlexeyAB/darknet
  2. https://github.com/theAIGuysCode/YOLOv4-Cloud-Tutorial/blob/master/YOLOv4_Training_Tutorial.ipynb
  3. https://albumentations.ai/docs/getting_started/bounding_boxes_augmentation/
  4. https://jonathan-hui.medium.com/map-mean-average-precision-for-object-detection-45c121a31173
