Train Your Own YoloV7 Object Detection Model

A guide to training a YoloV7 model on a custom dataset using Python

Gourav Bais
Heartbeat



Introduction

Deep Learning (DL) technologies are now widely adopted by organizations that want to improve their services quickly and with high accuracy. Models for tasks like image classification, voice recognition, and sentiment classification now perform remarkably well, significantly reducing the human effort these tasks once required.

Object detection is one of the most important concepts in the deep learning space. It is the process of identifying certain objects in an image and correctly classifying them into their respective classes. Models used for this task are called Object Detection Models (ODM); they draw a bounding box (a rectangular or square box) around each object in an image so that users can directly focus on the objects they are looking for.

In this article, we are going to focus on the YoloV7 model created by WongKinYiu and AlexeyAB, the authors and maintainers of several prominent Yolo implementations. YoloV7 is built on top of PyTorch and achieves state-of-the-art performance on the COCO dataset. The model is available in multiple variants: YoloV7, YoloV7-X, YoloV7-W6, YoloV7-E6, YoloV7-D6, and YoloV7-E6E. These models run at roughly 36 to 161 frames per second (with batch size 1) at an input image size of 640 or 1280. You can compare the performance of YoloV7 against other Yolo versions in the image below.

[Figure: YoloV7 performance compared with previous Yolo versions]

You can read the official research paper to find out more about the YoloV7 model’s architecture and performance.

In this article, you will see a step-by-step guide to train a YoloV7 model on a custom dataset in the simplest way possible.

Implementing YoloV7 Model

In this section, you will learn to train a custom object detection model, using the Drone Detection Dataset from Kaggle. This dataset comprises multiple images containing drones, along with annotation files (in txt format) that hold the object annotations for the detection task. But wait, what if you have your own images and want to create custom annotations yourself? Don't worry; we have got you covered. You will also see how to create custom annotations and train the YoloV7 model on that dataset.

For image annotation, you can use the LabelImg tool; Python 3.9 and Google Colab are used for development. The entire code you will develop in this article is based on the official YoloV7 repository.

Step 0: Creating a Virtual Environment — Optional

While working on machine learning and deep learning projects, creating a virtual environment is good practice. It gives you an isolated environment for each project you work on. Although it is not a compulsory step, doing so will help you work efficiently on any project.

Note: This step is only required when working on a local or cloud system; on Google Colab, you can skip it.

To create a virtual environment, you can use the virtualenv Python library, which can be installed using pip (the Python package installer).

$ pip install virtualenv

Once the package is installed, you can go ahead and create a virtual environment for training the model as follows:

$ virtualenv yolov7_training_env

Every time you train a YoloV7 model, you need to activate this virtual environment first.
Unix:

$ source yolov7_training_env/bin/activate

Windows:

$ yolov7_training_env\Scripts\activate

You are now ready to start working on the YoloV7 model (or multiple models, if you want).

Step 1: Clone Repository and Download Requirements

To begin with, you need to clone the official YoloV7 repository as follows:

$ git clone https://github.com/WongKinYiu/yolov7.git

Note: If you do not have Git installed in your system, then you can download and install it from here and then run the above command, or you can download the code in zip format from here.

Once the cloning is done, you need to install the library dependencies required to run the project. The repository contains a requirements.txt file that lists all of them. To install these requirements, navigate to the cloned yolov7 folder in the terminal and run the following command:

$ pip install -r requirements.txt

Note: If you are working on Google Colab, most dependencies may already be installed; you can check them using the pip freeze command and, if needed, install the rest with the above command.

Step 2: Create Data Annotations using LabelImg Tool

Now you need to proceed with the data preparation part, which for object detection use cases means image annotation. Image annotation is the process of creating bounding boxes (geometric shapes like rectangles, squares, polygons, etc.) over the objects or ROIs (Regions of Interest) we want to detect in an image. Several tools are available online for annotating images for object detection tasks, and LabelImg is one of them. This open-source tool is quite easy to use and has built-in support for creating annotations for any version of the Yolo model. It can be installed using pip:

$ pip install labelImg

Once LabelImg is installed, you can start the tool by running the labelImg command.

$ labelImg

This would open the tool, which would look something like this:

[Screenshot: the LabelImg annotation window]

Load all the images you want to annotate into this window using the Open Dir option on the left side. Once the images are loaded, make sure the annotation format is set to Yolo, then start annotating by clicking the Create RectBox button. Select the object in the image on which you want to train the model and give your annotation a name (drone for this article).

[Screenshot: drawing a bounding box around a drone with Create RectBox]

After annotating the required object in each image, click save to store the annotations in a txt file. In the end, there will be a txt file generated for each image containing its annotations, and that is all you need for an object detection dataset.
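For reference, each line in a Yolo-format label file describes one object: a class index followed by the box's normalized center coordinates, width, and height. A hypothetical label file for an image containing a single drone might look like this (the numbers are illustrative only):

# format: <class-id> <x_center> <y_center> <width> <height>
# all values except the class id are normalized to [0, 1]
0 0.512 0.430 0.210 0.165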


Step 3: Data Preprocessing

Now that you know how to create the data for any use case, let's jump to the data we have from Kaggle. Some of the files in this dataset do not contain annotations; we need to exclude those files and their images. For this, you can create a Jupyter Notebook with any name, e.g., Drone_Detection_Data_Preparation.ipynb. We start by importing all the necessary libraries and reading all the annotation files (txt extension).

# import dependencies 
import os
import glob
import shutil

# read all the annotation files
txt_files = glob.glob("dataset/*.txt")

Then you need to create two different folders, one for storing the images and another for storing the image labels.

# create a parent folder
if not os.path.exists('Data/'):
    os.mkdir('Data/')

# image files
if not os.path.exists('Data/Images/'):
    os.mkdir('Data/Images/')

# txt files (the folder name must match the destination used below)
if not os.path.exists('Data/Labels/'):
    os.mkdir('Data/Labels/')

Finally, you need to iterate over each label file (txt file) and check whether coordinates are present. If coordinates are there, you need to copy the image file associated with those coordinates to the Data/Images/ folder and labels to the Data/Labels/ folder. You can do so as follows:

for file in txt_files:
    filename = file.split('/')[1].split('.')[0]
    image_name = filename + '.jpeg'

    with open(file, 'r') as f:
        label_file = f.read()

    if len(label_file) > 0:
        img_src = 'dataset/' + image_name
        img_dst = 'Data/Images/' + image_name

        label_src = file
        label_dst = 'Data/Labels/' + filename + '.txt'

        shutil.copy(img_src, img_dst)
        shutil.copy(label_src, label_dst)

For any ML or DL use case, you need three sets of data: train, validation, and test. The train set is used to fit the model, the validation set is used to evaluate results during training and tune the model, and the test set reveals the model's real performance on unseen data. First, create three different folders as follows:

# images and their label files
images = glob.glob("Data/Images/*.jpeg")
labels = glob.glob("Data/Labels/*.txt")

# create train folder
if not os.path.exists('Data/Train/'):
    os.mkdir('Data/Train/')
    os.mkdir('Data/Train/images')
    os.mkdir('Data/Train/labels')

# create validation folder
if not os.path.exists('Data/Val/'):
    os.mkdir('Data/Val/')
    os.mkdir('Data/Val/images')
    os.mkdir('Data/Val/labels')

# create test folder
if not os.path.exists('Data/Test/'):
    os.mkdir('Data/Test/')
    os.mkdir('Data/Test/images')
    os.mkdir('Data/Test/labels')

We will use 70% of the overall data for training, 20% for validation, and 10% for testing.
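Note that glob.glob returns files in arbitrary order, so it is worth shuffling the label list first to make each split a random sample. A minimal sketch (the seed is arbitrary, added only for reproducibility):

# shuffle the labels so the splits below are random samples
import random

random.seed(42)
random.shuffle(labels)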

train_labels = labels[0:int((len(labels) * 70)/100)]
val_labels = labels[int((len(labels) * 70)/100):int((len(labels) * 90)/100)]
test_labels = labels[int((len(labels) * 90)/100):]

Finally, you need to iterate over all the labels and images and place them into respective folders.

# copy images and labels to the Train, Val and Test folders
splits = [
    ('Data/Train/', train_labels),
    ('Data/Val/', val_labels),
    ('Data/Test/', test_labels),
]

for folder, label_list in splits:
    for label in label_list:
        filename = label.split('/')[-1].split('.')[0]

        # images source and destination path
        img_src = 'Data/Images/' + filename + '.jpeg'
        img_dst = folder + 'images/' + filename + '.jpeg'
        # labels source and destination path
        label_src = label
        label_dst = folder + 'labels/' + filename + '.txt'

        shutil.copy(img_src, img_dst)
        shutil.copy(label_src, label_dst)

This is it; you have completed all the necessary data preprocessing steps. You only need to copy these Train, Val and Test folders inside the YoloV7/coco folder, as sketched below.
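A minimal way to do this from the notebook, assuming the repository was cloned into a yolov7/ folder next to Data/:

# copy the prepared splits into the cloned repository's coco folder
# (the 'yolov7/' location is an assumption based on the git clone step)
import shutil

for split in ['Train', 'Val', 'Test']:
    shutil.copytree('Data/' + split, 'yolov7/coco/' + split)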

Note: Make sure you follow the same folder structure for any custom dataset.

Step 4: Editing Config Files

When using Yolo or any other object detection model on custom data, you mostly just change configuration files to train the model. You may not have enough data and resources to train these models from scratch, so you use transfer learning instead. This lets the model apply what it learned earlier to your custom data without consuming much of your time or resources.

Now open the YoloV7/data/coco.yaml file and delete the first four lines. These lines download the original COCO dataset, which you don't need since you will be training the model on your custom data. Once deleted, provide the paths to the train and val datasets, the number of classes (objects that you have annotated), and the names of those objects. In this article, you are training a drone detection model, so there will be only one class, Drone. The content of coco.yaml should look something like this:

# train and val data as 1) directory: path/images/, 2) file: path/images.txt, or 3) list: [path1/images/, path2/images/]
train: ./coco/Train
val: ./coco/Val

# number of classes
nc: 1

# class names
names: ['Drone']

There is one more file you need to change: the model configuration file YoloV7/cfg/training/yolov7.yaml. In this file, you only need to change the variable nc, which stands for the number of classes. For the drone detection use case, the number of classes should be 1.

# parameters
nc: 1 # number of classes
depth_multiple: 1.0 # model depth multiple
width_multiple: 1.0 # layer channel multiple
# anchors
anchors:
- [12,16, 19,36, 40,28] # P3/8
- [36,75, 76,55, 72,146] # P4/16
- [142,110, 192,243, 459,401] # P5/32
# yolov7 backbone
...

These are all the changes that are required in the model configuration.

Note: You will see multiple files for the Yolo model configuration, for example, yolov7.yaml, yolov7-w6.yaml, yolov7-p6.yaml, etc. Depending on which model configuration you are using, you need to make changes to that file.

Step 5: Downloading Pre-trained Weights

Now that your data and model configuration are ready, it is time to download the pre-trained weights for the YoloV7 model. One thing to remember is that there are multiple variants of the YoloV7 model whose accuracy and execution speed vary, so you need to download the weights of whichever variant best suits your needs. Just head over to this link and download the weights of the selected model. Once downloaded, place the weights file (e.g. yolov7.pt) inside the weights/ folder (you need to create this folder first). For this article, download the yolov7.pt weights file.
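If you prefer to download the weights from the notebook, here is a minimal sketch; the URL is assumed from the YoloV7 GitHub releases page, so verify it before use:

# download the yolov7.pt pre-trained weights into a weights/ folder
# (release URL assumed from the official YoloV7 v0.1 release)
import os
import urllib.request

os.makedirs('weights', exist_ok=True)
urllib.request.urlretrieve(
    'https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7.pt',
    'weights/yolov7.pt')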

Step 6: YoloV7 Training

Now, if you want to use Google Colab for model training, you need to transfer the whole setup to your Google Drive. Once done, open Google Colab and set the runtime to GPU:

Runtime > Change runtime type > GPU.

Now you need to connect your Colab notebook to Google Drive and this can be done using the following code:

# connect to google drive
from google.colab import drive
drive.mount('/content/drive')

Once your Google Drive is connected, you need to head over to the YoloV7 folder where all your model and data reside.

# go to the Yolo model folder
%cd drive/MyDrive/Projects/Drone_Detection/YoloV7

Now you are ready to run the model; this can be done with the following command:

!python train.py --device 0 --batch-size 16 --epochs 25 --img 640 640 --data data/coco.yaml --hyp data/hyp.scratch.custom.yaml --name yolov7-custom --weights weights/yolov7.pt

Command Explanation:
train.py: Python file that contains the training code.
device: Specifies whether the model should run on CPU or GPU. The default is cpu; on Google Colab, set it to 0 to use the first GPU.
batch-size: Number of images processed per batch.
epochs: Number of complete passes over the training data.
img: Image size; 640 for YoloV7 and YoloV7-X, 1280 for the larger variants.
data: Path to the dataset configuration file, i.e. coco.yaml.
hyp: Path to the hyperparameter configuration file used for training.
name: Name of the run folder under which the final weights files will be stored.
weights: Path to the pre-trained weights file.


Once you run the above command, model training will start. At the end, two files, last.pt and best.pt, representing the last-epoch weights and the best weights, will be stored inside the YoloV7/runs/train/yolov7-custom/weights folder.
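If you want a quick look at how training went, the run folder also contains diagnostic plots such as results.png. A minimal sketch to view it in the notebook (the path assumes the --name yolov7-custom argument from the training command):

# display the training curves plotted by train.py
from IPython.display import Image, display

display(Image(filename='runs/train/yolov7-custom/results.png'))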

Step 7: YoloV7 Inference

Finally, it is time to test the model on images it has never seen before. First copy the best.pt file into the weights/ folder, then run the following command:

# testing the model
!python detect.py --weights weights/best.pt --conf 0.25 --img-size 640 --source coco/Test/images/ --no-trace

Command Explanation:
detect.py: Python file that contains the model inference code.
weights: Path to the weights file to use. Here, it is weights/best.pt.
conf: Confidence threshold for object detection.
img-size: Image size used for inference.
source: Path to a folder of test images or a single test image.
no-trace: Skip tracing the model (skips the TorchScript conversion normally applied before inference).


Once you run the above command, a new folder, YoloV7/runs/detect/exp/, will be created. It contains all the test images with the detected drones drawn on them.
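To eyeball a result directly in the notebook, here is a minimal sketch (the output path is detect.py's default; the .jpeg extension matches the dataset used earlier):

# display the first annotated test image produced by detect.py
import glob
from IPython.display import Image, display

results = glob.glob('runs/detect/exp/*.jpeg')
display(Image(filename=results[0]))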


That’s it, you have now trained your own custom YoloV7 model for drone detection! The entire code can be found here.

Conclusion

After reading this article, you know how to train your own YoloV7 model on custom data. You can use this approach to train YoloV7 on any kind of object detection data, for example, face detection, vehicle detection, object detection in industrial automation, text detection, etc.

YoloV7 is recommended for real-time and large-scale object detection, and it handles complex scenes with multiple objects well even with limited computational resources. There are alternatives to the YoloV7 model as well, including Faster R-CNN, RetinaNet, SSD (Single Shot Detector), and Mask R-CNN. The choice of algorithm depends on the specific requirements of your application, such as detection speed, accuracy, and object size.

You can follow me on LinkedIn or Twitter if you have any questions.

