End-to-End Deep Learning Project with PyTorch & Comet ML

A complete guide to building a deep learning project with PyTorch, tracking an Experiment with Comet ML, and deploying an app with Gradio on HuggingFace

Published in

Heartbeat

13 min read · Mar 28, 2023

AI tools such as ChatGPT, DALL-E, and Midjourney are increasingly becoming part of our daily lives. These tools were developed with deep learning techniques. Deep learning is a subfield of machine learning that trains multi-layer neural networks to learn patterns from data. Today, I’ll walk you through how to perform an end-to-end deep learning project using PyTorch, Comet ML, and Gradio. Here are the topics we’ll cover in this blog:

Loading a custom dataset with PyTorch
Building a CNN-based model from scratch
Tracking an experiment with Comet ML
Deploying an app on HuggingFace with Gradio

By the end of this article, you’ll have walked step by step through the life cycle of a deep learning project by building an image classifier on the cat vs. dog dataset. After finishing the project, our app will look like this:

Image Classification App (Video by Author)

Introduction

Before diving into the project, let me explain the libraries I’m going to use in this analysis. Let’s start with PyTorch:

PyTorch

Two frameworks are generally used for deep learning: TensorFlow and PyTorch. TensorFlow has traditionally been popular in industry, while PyTorch is widely used in academic research. I’m going to use PyTorch for this project because of its user-friendliness, flexibility, and robust community support. Let’s move on and take a look at another library.

Comet ML

When implementing deep learning projects, you’ll need to track your hyperparameters, visualize performance metrics, monitor models, and share experiments with others. This is when Comet ML comes into play.

Comet ML is a machine learning platform that allows you to manage, visualize, compare, and optimize models. We’re going to use Comet ML to track our hyperparameters and to monitor our model. Let’s move on and have a look at what Gradio is.

Gradio

Deep learning projects that are not moved to production are dead projects. Gradio is an open-source Python library that helps you build easy-to-use demos for your ML model that you can share with other people.

Beautiful! We briefly talked about the libraries we’ll use. Let’s go ahead and start loading our dataset.

Loading the Dataset

Beginners often start with clean datasets like the MNIST dataset to learn deep learning. It’s good to start with these datasets, but real-world datasets aren’t always clean. One of the challenges in deep learning is loading and working with a custom dataset. The dataset we’ll use is the cat and dog dataset, which contains images of cats and dogs.

Before loading this dataset, let’s launch an Experiment in Comet ML.

# Installing comet_ml
# !pip install comet_ml

# Importing the comet_ml library
import comet_ml
from comet_ml import Experiment

# Building an experiment with your API key
# (my_api_key is a string that holds your own Comet API key)
experiment = Experiment(
    api_key=my_api_key,
    workspace="tirendaz-academy",
    project_name="experiment-tracking")

# Setting hyperparameters
hyper_params = {"seed": 42, "batch_size": 32, "num_epochs": 20,
                "learning_rate": 1e-3, "image_size": 224}

# Logging hyperparameters
experiment.log_parameters(hyper_params)

If you don’t have a Comet ML account yet, you can create a free account here. To follow along with the code in this blog, you can access the notebook I used in this project here.
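The Experiment call above assumes that a variable named my_api_key already holds your key. One way to avoid pasting the key into the notebook is to read it from an environment variable. The sketch below is only an illustration: the helper name load_api_key is hypothetical (it is not part of comet_ml), and COMET_API_KEY is the variable name assumed here for storing the key.

```python
import os


def load_api_key(env_var: str = "COMET_API_KEY") -> str:
    """Return the API key stored in the given environment variable."""
    api_key = os.environ.get(env_var)
    if not api_key:
        # Failing early gives a clearer message than a failed login later
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return api_key


# Example usage (hypothetical helper, not part of comet_ml):
# my_api_key = load_api_key()
```

This keeps the key out of the notebook, which matters if you plan to share it publicly.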

Awesome! We started our experiment. Now let’s take a look at our version of PyTorch and check if we have access to CUDA (GPU). CUDA is a parallel computing platform developed by NVIDIA that lets PyTorch run computations on a GPU, which usually makes training much faster. You can use a GPU for free (in limited amounts) on Google Colab or Kaggle notebooks.

import torch
from torch import nn

# Make sure torch >= 1.10.0
print("The version of torch:", torch.__version__)

# Setting up CUDA
device = "cuda" if torch.cuda.is_available() else "cpu"
print("The type of device: ", device)

# Output:
The version of torch: 1.11.0
The type of device:  cuda

Now let’s create our training and test paths that we’ll use while loading data.

# Creating our paths
my_train_dir = "/kaggle/input/cat-and-dog/training_set/training_set"
my_test_dir = "/kaggle/input/cat-and-dog/test_set/test_set"

Excellent! Now let’s get a random image and look at the features of this image.

import random
from PIL import Image
import glob
from pathlib import Path

# Setting seed
random.seed(hyper_params["seed"]) 

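# Collecting the paths of all images in the dataset
# (dataset root inferred from the training and test paths above)
data_root = "/kaggle/input/cat-and-dog"
image_path_list = glob.glob(f"{data_root}/*/*/*/*.jpg")

# Getting a random path
random_image_path = random.choice(image_path_list)

# Getting the class name from the parent folder
image_class = Path(random_image_path).parent.stem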

# Let's open the image
image = Image.open(random_image_path)

# Let's print our metadata
print("Random image path: {}".format(random_image_path))
print("Image class: {}".format(image_class))
print("Image height: {}".format(image.height)) 
print("Image width: {}".format(image.width))
image

This is a cat image and the size of this image is 217*179. What a cute cat, right? I love cats. Let’s go ahead and create the necessary functions to load the dataset.
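Besides looking at a single image, it can help to check how many images each class contains. Here is a minimal sketch that counts the files in each class subfolder. The function name count_images_per_class is hypothetical, and the sketch assumes that each class has its own subfolder of .jpg files.

```python
from collections import Counter
from pathlib import Path


def count_images_per_class(root_dir: str, pattern: str = "*.jpg") -> Counter:
    """Count image files in each class subfolder of root_dir."""
    counts = Counter()
    for class_dir in sorted(Path(root_dir).iterdir()):
        if class_dir.is_dir():
            # The folder name is used as the class name
            counts[class_dir.name] = len(list(class_dir.glob(pattern)))
    return counts


# Example usage with the training folder defined earlier:
# print(count_images_per_class(my_train_dir))
```

If one class turns out to have far fewer images than the other, accuracy alone can be a misleading metric.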

Transforming data

So far, we created variables for the dataset paths and explored an image from the dataset. Note that images in a dataset may not all be the same size. In this case, we’ll need to do some data preprocessing to standardize the size, shape, and format of the pictures.

Transforming data, also known as preprocessing, helps you prepare quality data. Good preprocessing can improve the performance of your model and reduce the risk of bias. Let’s transform our dataset with torchvision.transforms in PyTorch.

from torchvision import transforms

# Setting our image size
IMAGE_SIZE=(hyper_params["image_size"], hyper_params["image_size"])

# Creating a transform for training using TrivialAugment
my_train_transform = transforms.Compose([
    transforms.Resize(IMAGE_SIZE),
    transforms.TrivialAugmentWide(),
    transforms.ToTensor()])

# Creating a transform for testing 
my_test_transform = transforms.Compose([
    transforms.Resize(IMAGE_SIZE),
    transforms.ToTensor()])

Here, we used the TrivialAugmentWide transform, which is a data augmentation technique. Let’s take a step back and talk about what data augmentation is. Data augmentation is a method used to artificially increase the diversity of your data by applying random modifications, such as flips, rotations, or color changes, to your existing samples. This technique is especially useful when the dataset is small. Note that we apply augmentation only to the training data, so the test images stay unchanged. I encourage you to examine the examples of the various transforms here.
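To make the idea concrete, here is a minimal sketch of one simple augmentation, a random horizontal flip, written with plain torch. It is not what TrivialAugmentWide does internally. The function name random_horizontal_flip is hypothetical and only illustrates the concept on a tiny made-up tensor.

```python
import torch


def random_horizontal_flip(image: torch.Tensor, p: float = 0.5) -> torch.Tensor:
    """Flip a (C, H, W) image tensor left-to-right with probability p."""
    if torch.rand(1).item() < p:
        # The last dimension is the width, so flipping it mirrors the image
        return torch.flip(image, dims=[-1])
    return image


# A tiny 1-channel, 2x3 "image" to make the effect visible
image = torch.tensor([[[1.0, 2.0, 3.0],
                       [4.0, 5.0, 6.0]]])

flipped = random_horizontal_flip(image, p=1.0)  # p=1.0 always flips
print(flipped)  # each row is reversed: [3, 2, 1] and [6, 5, 4]
```

Because the flip is random, the model sees slightly different versions of the same image across epochs.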

Nice, we determined how to transform our dataset. Now we’re ready to create our own custom torch Dataset class.

Creating a Custom `Dataset`

You can find many ready-made datasets such as MNIST and CIFAR100 in the torchvision.datasets module. But in most cases, you need to handle real-world datasets. If you want, you can create your own class to load the dataset in PyTorch. But the good news is that you can use the ImageFolder class if your dataset is organized with one subfolder per class, where each subfolder holds the images of that class.

We can use this class because our dataset follows this layout. Let's load images from the train and test folders into Datasets with ImageFolder.

# Importing the datasets module
from torchvision import datasets

# Converting our image folders into Datasets
my_train_data = datasets.ImageFolder(my_train_dir, transform=my_train_transform)
my_test_data = datasets.ImageFolder(my_test_dir, transform=my_test_transform)
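ImageFolder takes the class labels from the subfolder names, sorted alphabetically. The sketch below mimics that mapping with the standard library only, so it illustrates the idea rather than reproducing torchvision's code. The function name find_classes and the folder names cats and dogs are assumptions made for this example.

```python
import os
import tempfile


def find_classes(root_dir: str) -> dict:
    """Map each class subfolder name to an integer label, alphabetically."""
    class_names = sorted(
        entry.name for entry in os.scandir(root_dir) if entry.is_dir()
    )
    return {name: index for index, name in enumerate(class_names)}


# Build a throwaway folder tree to try the function on
with tempfile.TemporaryDirectory() as root:
    for class_name in ["dogs", "cats"]:  # assumed folder names
        os.makedirs(os.path.join(root, class_name))
    class_to_idx = find_classes(root)

print(class_to_idx)  # {'cats': 0, 'dogs': 1}
```

On the real dataset, my_train_data.class_to_idx shows the mapping that ImageFolder built.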

Note that PyTorch has two great classes for loading data: Dataset and DataLoader. A Dataset stores the samples and their corresponding labels. A DataLoader wraps an iterable around the Dataset for easy access to the samples in batches. We built our Datasets with ImageFolder. It’s time to turn these Datasets into DataLoaders. Show time!

import os
from torch.utils.data import DataLoader

# Setting some parameters
torch.manual_seed(hyper_params["seed"])
NUM_WORKERS = os.cpu_count()

# Creating a training DataLoader
my_train_dataloader = DataLoader(my_train_data,
                                 batch_size=hyper_params["batch_size"], 
                                 shuffle=True,
                                 num_workers=NUM_WORKERS)

# Creating a test DataLoader
my_test_dataloader = DataLoader(my_test_data,
                               batch_size=hyper_params["batch_size"], 
                               shuffle=False, 
                               num_workers=NUM_WORKERS)
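To see what a DataLoader yields, here is a small sketch on made-up data. It uses TensorDataset with ten blank "images" instead of the real cat and dog images, so the names fake_images, fake_labels, fake_dataset, and fake_dataloader are placeholders for illustration only.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Ten fake "images" (3 channels, 8x8 pixels) with made-up 0/1 labels
fake_images = torch.zeros(10, 3, 8, 8)
fake_labels = torch.tensor([0, 1] * 5)
fake_dataset = TensorDataset(fake_images, fake_labels)

# batch_size=4 splits 10 samples into batches of 4, 4, and 2
fake_dataloader = DataLoader(fake_dataset, batch_size=4, shuffle=False)

batch_shapes = [tuple(images.shape) for images, labels in fake_dataloader]
print(len(fake_dataloader))  # 3
print(batch_shapes)          # [(4, 3, 8, 8), (4, 3, 8, 8), (2, 3, 8, 8)]
```

Note that the last batch is smaller when the dataset size is not a multiple of the batch size.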

Awesome! We have prepared the necessary functions to load the dataset. We are now ready to build a CNN-based model.


Model Building

A convolutional neural network (CNN) is a deep learning architecture often used to extract patterns from visual data. A CNN typically consists of three types of layers: convolutional layers, pooling layers, and fully connected layers.

Various filters are used in the convolution layer (Source)

The convolutional layer is the fundamental building block of a CNN and is used to extract features such as edges from images. The pooling layer is added between successive convolutional layers to shrink the feature maps, which reduces the amount of computation and the number of parameters in later layers. Images pass through the convolution and pooling layers, and then classification occurs in the final fully connected layer.
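Here is a minimal sketch of these two operations on tiny made-up tensors. The filter below is written by hand to respond to vertical edges, whereas a real CNN learns its filter values during training. The variable names are chosen for this example only.

```python
import torch
import torch.nn.functional as F

# A 1-channel 3x5 image: dark (0) on the left, bright (1) on the right
image = torch.tensor([[0.0, 0.0, 0.0, 1.0, 1.0]] * 3).reshape(1, 1, 3, 5)

# A hand-written 3x3 filter that responds to vertical edges
vertical_edge_filter = torch.tensor([[-1.0, 0.0, 1.0]] * 3).reshape(1, 1, 3, 3)

# Sliding the filter over the image gives a feature map
feature_map = F.conv2d(image, vertical_edge_filter)
print(feature_map.flatten().tolist())  # [0.0, 3.0, 3.0]

# Max pooling keeps the largest value in each 2x2 window
values = torch.arange(16.0).reshape(1, 1, 4, 4)
pooled = F.max_pool2d(values, kernel_size=2)
print(pooled.flatten().tolist())  # [5.0, 7.0, 13.0, 15.0]
```

The feature map is zero over the flat dark region and positive where the filter window covers the dark-to-bright edge, while pooling halves the height and width of its input.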

You can build a CNN-based model with transfer learning. But I’m going to create a CNN model from scratch:

# Creating a CNN-based image classifier.
class ImageClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        
        # Creating our first convolutional layer
        self.conv_layer_1 = nn.Sequential(
          nn.Conv2d(3, 64, 3, padding=1),
          nn.ReLU(),
          nn.BatchNorm2d(64),
          nn.MaxPool2d(2))
         
        # Creating our second convolutional layer
        self.conv_layer_2 = nn.Sequential(
          nn.Conv2d(64, 512, 3, padding=1),
          nn.ReLU(),
          nn.BatchNorm2d(512),
          nn.MaxPool2d(2))

        # Creating our third convolutional layer
        self.conv_layer_3 = nn.Sequential(
          nn.Conv2d(512, 512, kernel_size=3, padding=1),
          nn.ReLU(),
          nn.BatchNorm2d(512),
          nn.MaxPool2d(2)) 

        # Creating our classifier
        self.classifier = nn.Sequential(
          nn.Flatten(),
          nn.Linear(in_features=512*3*3, out_features=2))

    # Defining the forward function to pass data
    def forward(self, x: torch.Tensor):
        x = self.conv_layer_1(x)
        x = self.conv_layer_2(x)
        x = self.conv_layer_3(x)
        x = self.conv_layer_3(x)
        x = self.conv_layer_3(x)
        x = self.conv_layer_3(x)
        x = self.classifier(x)
        return x

# Instantiating an object
my_model = ImageClassifier().to(device)

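Here, we first defined the layers in the __init__ method and then chained them in the forward function. Note that forward applies conv_layer_3 four times, so the same weights are reused in each of those passes. With a 224×224 input, the six pooling steps shrink the feature map to 3×3, which is why the classifier expects 512*3*3 input features. After building the model architecture, we instantiated an object from the ImageClassifier class.

Nice, we created a CNN-based model. Note that we used some hyperparameters such as filter size, number of filters, and activation function. You can tune these hyperparameters to build better models.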
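The in_features value of the final linear layer depends on how much the convolution and pooling layers shrink the image. The sketch below traces the spatial size through the six blocks that forward runs, using the standard output-size formula for convolutions. The helper names conv_output_size and pool_output_size are hypothetical.

```python
def conv_output_size(size: int, kernel: int = 3, padding: int = 1, stride: int = 1) -> int:
    """Spatial size after a convolution (standard output-size formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_output_size(size: int, kernel: int = 2) -> int:
    """Spatial size after max pooling with stride equal to the kernel size."""
    return size // kernel


size = 224  # image_size from hyper_params
sizes = [size]

# forward() runs six blocks: layer 1, layer 2, then layer 3 four times
for _ in range(6):
    size = pool_output_size(conv_output_size(size))
    sizes.append(size)

print(sizes)              # [224, 112, 56, 28, 14, 7, 3]
print(512 * size * size)  # 4608 input features for nn.Linear
```

If you change image_size or the number of blocks, the in_features value needs to be recomputed in the same way.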

Let’s see the architecture of the model with torchinfo and pass a dummy input through it as a sanity check.

# Installing torchinfo
# !pip install torchinfo

import torchinfo
from torchinfo import summary

# Testing with an example input size 
summary(my_model, input_size=[1, 3, hyper_params["image_size"], hyper_params["image_size"]])

The output of torchinfo.summary shows key information about our model, such as the input size, the total number and size of the parameters, and the estimated total size.
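If you only need the parameter count, you can also compute it directly from the model. This is a minimal sketch, shown on a tiny stand-in layer rather than on our classifier. The function name count_parameters is hypothetical.

```python
import torch
from torch import nn


def count_parameters(model: nn.Module) -> int:
    """Return the number of trainable parameters in a model."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


# A small stand-in layer: 4 inputs x 2 outputs = 8 weights, plus 2 biases
tiny_layer = nn.Linear(in_features=4, out_features=2)
print(count_parameters(tiny_layer))  # 10
```

Calling count_parameters(my_model) would give the trainable parameter count of the classifier.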

Awesome, our model worked without errors. Let’s move on to creating the train and test steps. Note that we build the model using the training data and evaluate the performance of the model on the test data. First, let’s create a function that we’ll use to train the model.

def my_train_step(model: torch.nn.Module,
                  dataloader: torch.utils.data.DataLoader,
                  loss_fn: torch.nn.Module,
                  optimizer: torch.optim.Optimizer,
                  epoch: int):

    # Setting train mode
    model.train()

    # Initializing train loss & train accuracy values
    train_loss = 0
    train_acc = 0

    # Looping through each batch of data in the dataloader
    for batch, (inp, out) in enumerate(dataloader):

        # Moving data to device
        inp, out = inp.to(device), out.to(device)

        # Predicting the input
        y_pred = model(inp)

        # Calculating & accumulating loss
        loss = loss_fn(y_pred, out)
        train_loss += loss.item()

        # Optimizer zero grad
        optimizer.zero_grad()

        # Loss backward
        loss.backward()

        # Optimizer step
        optimizer.step()

        # Calculating & accumulating the accuracy metric
        y_pred_label = torch.argmax(torch.softmax(y_pred, dim=1), dim=1)
        train_acc += (y_pred_label == out).sum().item()/len(y_pred)

    # Averaging the metrics over all batches
    train_loss = train_loss / len(dataloader)
    train_acc = train_acc / len(dataloader)

    # Logging train metrics for the current epoch
    experiment.log_metrics({"train_accuracy": train_acc, "train_loss": train_loss}, epoch=epoch)

    return train_loss, train_acc

Here, we used the experiment.log_metrics function to track the model metrics, passing the current epoch so that each value is logged against the right epoch. Note that if you haven’t instantiated a Comet Experiment (as we did at the very beginning of this article), this step will throw an error. This way, we can monitor these metrics on the Comet ML dashboard while the model is being trained.

Beautiful, we created a function for the training step. Let’s go ahead and similarly create the function that we’ll use to test the model.

def my_test_step(model: torch.nn.Module,
                 dataloader: torch.utils.data.DataLoader,
                 loss_fn: torch.nn.Module,
                 epoch: int):

    # Setting eval mode
    model.eval()

    # Initializing test loss & test accuracy values
    test_loss = 0
    test_acc = 0

    # Starting the inference mode
    with torch.inference_mode():

        # Looping through each batch of data in the dataloader
        for batch, (inp, out) in enumerate(dataloader):

            # Moving data to device
            inp, out = inp.to(device), out.to(device)

            # Forward pass
            test_pred_logits = model(inp)

            # Calculating & accumulating loss
            loss = loss_fn(test_pred_logits, out)
            test_loss += loss.item()

            # Calculating & accumulating the accuracy metric
            test_pred_labels = test_pred_logits.argmax(dim=1)
            test_acc += ((test_pred_labels == out).sum().item()/len(test_pred_labels))

    # Averaging the metrics over all batches
    test_loss = test_loss / len(dataloader)
    test_acc = test_acc / len(dataloader)

    # Logging test metrics for the current epoch
    experiment.log_metrics({"test_accuracy": test_acc, "test_loss": test_loss}, epoch=epoch)

    return test_loss, test_acc

Here, we called model.eval() to switch the model to evaluation mode and used torch.inference_mode() to turn off gradient tracking, as we’ll only use this function to evaluate the model. Now let’s define a function named my_train to combine the my_train_step and my_test_step functions.

from tqdm.auto import tqdm

# Combining the training and testing steps
def my_train(model: torch.nn.Module,
             train_dataloader: torch.utils.data.DataLoader,
             test_dataloader: torch.utils.data.DataLoader,
             optimizer: torch.optim.Optimizer,
             loss_fn: torch.nn.Module = nn.CrossEntropyLoss(),
             epochs: int = 5):

    # Creating a variable for metrics
    my_results = {"train_loss": [],
                  "train_acc": [],
                  "test_loss": [],
                  "test_acc": []}

    # Looping for training and testing steps
    for epoch in tqdm(range(epochs)):
        train_loss, train_acc = my_train_step(model=model,
                                              dataloader=train_dataloader,
                                              loss_fn=loss_fn,
                                              optimizer=optimizer,
                                              epoch=epoch + 1)
        test_loss, test_acc = my_test_step(model=model,
                                           dataloader=test_dataloader,
                                           loss_fn=loss_fn,
                                           epoch=epoch + 1)

        # Printing results
        print(
            f"Epoch: {epoch+1} | "
            f"train_loss: {train_loss:.4f} | "
            f"train_acc: {train_acc:.4f} | "
            f"test_loss: {test_loss:.4f} | "
            f"test_acc: {test_acc:.4f}")

        # Updating results
        my_results["train_loss"].append(train_loss)
        my_results["train_acc"].append(train_acc)
        my_results["test_loss"].append(test_loss)
        my_results["test_acc"].append(test_acc)

    # Returning results at the end of the epochs
    return my_results

Awesome, we have created our functions to train and test the model. Now we can start model training.
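One detail in these steps is worth a closer look: the training step applies softmax before argmax, while the test step calls argmax on the raw logits. Both give the same labels, because softmax does not change the order of the scores. The sketch below checks this on made-up logits. The function name batch_accuracy is hypothetical.

```python
import torch


def batch_accuracy(logits: torch.Tensor, labels: torch.Tensor) -> float:
    """Fraction of samples whose highest-scoring class matches the label."""
    predicted_labels = logits.argmax(dim=1)
    return (predicted_labels == labels).sum().item() / len(labels)


# Made-up logits for three images and two classes
logits = torch.tensor([[2.0, 1.0],
                       [0.0, 3.0],
                       [1.0, 0.0]])
labels = torch.tensor([0, 1, 1])

# Two of the three predictions match the labels
print(batch_accuracy(logits, labels))

# softmax keeps the ordering of the scores, so argmax gives the same labels
same_labels = torch.equal(logits.argmax(dim=1),
                          torch.softmax(logits, dim=1).argmax(dim=1))
print(same_labels)  # True
```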

Model Training

So far, we have created training and testing steps, and then we have prepared a function to combine these steps. We are ready to train the model using these steps. Show time!

# Setting seeds
torch.manual_seed(hyper_params["seed"]) 
torch.cuda.manual_seed(hyper_params["seed"])

# Creating loss function & optimizer
my_loss_fn = nn.CrossEntropyLoss()
my_optimizer = torch.optim.Adam(params=my_model.parameters(), lr=hyper_params["learning_rate"])

# Initializing the timer
from timeit import default_timer as timer 
my_start_time = timer()

# Training our model
my_model_results = my_train(model=my_model,
                            train_dataloader=my_train_dataloader,
                            test_dataloader=my_test_dataloader,
                            optimizer=my_optimizer,
                            loss_fn=my_loss_fn,
                            epochs=hyper_params["num_epochs"])

# Ending the timer 
my_end_time = timer()

# Printing the time
print(f"Total training time: {my_end_time-my_start_time:.3f} seconds")

Nice, our model was trained on the training data and evaluated on the test data. At the end of 20 epochs, the accuracy of our model is 0.91 on the training data and 0.91 on the test data. Note that we want these two values to be close to each other: a training accuracy far above the test accuracy would be a sign of overfitting. Now, let’s visualize the accuracy and loss metrics.

import matplotlib.pyplot as plt

def my_plot_loss_curves(results):

    # Getting the train & test loss values
    my_loss = results['train_loss']
    my_test_loss = results['test_loss']

    # Getting the train & test accuracy values
    my_accuracy = results['train_acc']
    my_test_accuracy = results['test_acc']

    # Calculating epochs
    my_epochs = range(len(results['train_loss']))

    # Let's set up a graph
    plt.figure(figsize=(15, 7))

    # Let's plot loss
    plt.subplot(1, 2, 1)
    plt.plot(my_epochs, my_loss, label='train_loss')
    plt.plot(my_epochs, my_test_loss, label='test_loss')
    plt.title('Loss')
    plt.xlabel('Epochs')
    plt.legend()

    # Let's plot accuracy
    plt.subplot(1, 2, 2)
    plt.plot(my_epochs, my_accuracy, label='train_accuracy')
    plt.plot(my_epochs, my_test_accuracy, label='test_accuracy')
    plt.title('Accuracy')
    plt.xlabel('Epochs')
    plt.legend()

# Let's plot the results
my_plot_loss_curves(my_model_results)

Accuracy and loss metrics on training and test sets (Image by Author)

The accuracy values of the model on the training and test data are not bad. Keep in mind that you can achieve better scores by tuning the hyperparameters.
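To put a number on how close the training and test curves are, you can compute the gap between the two accuracies for each epoch. This is a minimal sketch: the function name accuracy_gaps is hypothetical, and the numbers in example_results are made up for illustration, not results from this project.

```python
def accuracy_gaps(results: dict) -> list:
    """Per-epoch difference between training and test accuracy."""
    return [
        round(train_acc - test_acc, 4)
        for train_acc, test_acc in zip(results["train_acc"], results["test_acc"])
    ]


# Made-up numbers for illustration only, not results from this project
example_results = {"train_acc": [0.70, 0.85, 0.95],
                   "test_acc": [0.68, 0.80, 0.82]}

print(accuracy_gaps(example_results))  # [0.02, 0.05, 0.13]
```

A gap that keeps growing over the epochs, as in this made-up example, suggests that the model is starting to overfit.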

Since you can obtain different versions of models using different hyperparameters, it’s a good idea to track these with Comet. Let me show you.

# Saving our model
from comet_ml.integration.pytorch import log_model
log_model(experiment, my_model, model_name="My_Image_Classification_Model")

Beautiful, we built a CNN-based model and saw the performance of this model. While performing these steps, we logged the hyperparameters and metrics with Comet ML. Finally, we saved the model for versioning and monitoring. Now let’s end our experiment with the following command:

# Ending our experiment
experiment.end()

Experiment Tracking With Comet ML

Time to review the results we found in the Comet ML dashboard. Here are the results:

Building An App With Gradio

Projects that remain in the notebooks are dead projects. The ML lifecycle is an ongoing process from data preparation to deployment and monitoring of the model. With Gradio we can create an app and deploy it on Hugging Face for free. Let’s get started:

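import gradio as gr

# Placeholder texts for the interface; replace them with your own
title = "Cat vs. Dog Image Classifier"
description = "Upload an image to see whether the model predicts a cat or a dog."
article = "Built with PyTorch, Comet ML, and Gradio."

# Creating a function for prediction
# (my_model is the trained model from the previous sections)
def predict(inp):
    image_transform = transforms.Compose([transforms.Resize(size=(224, 224)), transforms.ToTensor()])
    labels = ['cat', 'dog']
    # Adding a batch dimension and moving the image to the model's device
    inp = image_transform(inp).unsqueeze(dim=0).to(device)
    # Using eval mode so that BatchNorm behaves correctly at inference time
    my_model.eval()
    with torch.no_grad():
        prediction = torch.nn.functional.softmax(my_model(inp), dim=1)
        confidences = {labels[i]: float(prediction.squeeze()[i]) for i in range(len(labels))}
    return confidences

# Building an interface
# (cat.jpg and dog.jpg must exist next to the app file)
gr.Interface(fn=predict,
             inputs=gr.Image(type="pil"),
             outputs=gr.Label(num_top_classes=2),
             title=title,
             description=description,
             article=article,
             examples=['cat.jpg', 'dog.jpg']).launch()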

Here, we first defined a function to predict the label of the image and then created an interface using this function. You can examine this app and access project files here.
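The predict function needs a trained model to be available inside the app. One common way to move a model from a notebook to an app is to save its weights with state_dict and load them again on the other side. The sketch below shows the idea on a small stand-in network instead of our ImageClassifier. The file name model_weights.pth and the variable names are assumptions made for this example.

```python
import os
import tempfile

import torch
from torch import nn

# A small stand-in network; in the project this would be ImageClassifier
trained_model = nn.Sequential(nn.Flatten(), nn.Linear(12, 2))

with tempfile.TemporaryDirectory() as tmp_dir:
    weights_path = os.path.join(tmp_dir, "model_weights.pth")  # hypothetical file name

    # Save only the learned weights, not the whole Python object
    torch.save(trained_model.state_dict(), weights_path)

    # In the app: rebuild the same architecture, then load the weights on CPU
    app_model = nn.Sequential(nn.Flatten(), nn.Linear(12, 2))
    app_model.load_state_dict(torch.load(weights_path, map_location="cpu"))

app_model.eval()  # switch layers such as BatchNorm to inference behavior

# Both models now give the same output for the same input
sample = torch.rand(1, 3, 2, 2)
with torch.no_grad():
    outputs_match = torch.allclose(trained_model(sample), app_model(sample))
print(outputs_match)  # True
```

Loading with map_location="cpu" is useful when the model was trained on a GPU but the app runs on a CPU-only machine.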

Conclusion

Congratulations! You learned how to perform an end-to-end deep learning project. Note that deep learning projects are a never-ending cycle. You collect the data, train the model with this data, then turn this model into an app, move this app to production, and finally monitor whether this app is working properly and iterate.

In this project, we used PyTorch for model building, Comet ML for experiment tracking, and Gradio to convert the model into an app. You can access the GitHub repo of this project here.

That’s it! Thanks for reading and I hope you enjoyed it. Please let me know if you have any feedback and feel free to connect with me on YouTube | Twitter | Instagram | Linkedin | Kaggle

