Benchmarking Computer Vision Models using PyTorch & Comet

Published in

Heartbeat

6 min read · Jul 17, 2023

Transfer learning with pre-trained models has become essential in modern computer vision applications. It involves adapting a pre-trained model to a new dataset and task. This process requires careful monitoring and benchmarking to ensure the model performs well. PyTorch provides a user-friendly interface for loading pre-trained models, modifying them, and fine-tuning them for specific tasks.

Monitoring the performance of fine-tuned models is also crucial for ensuring their quality and identifying areas for improvement. Comet, a cloud-based machine learning platform, offers a powerful solution for tracking, comparing, and benchmarking fine-tuned models, allowing users to easily analyze and visualize their performance. In this article, we will explore the process of fine-tuning computer vision models using PyTorch and monitoring the results using Comet.

Prerequisites

To follow along with this tutorial, make sure you:

Use a Google Colab notebook
Install these Python packages using pip: Comet ML, PyTorch, TorchVision, Torchmetrics, NumPy, and Kaggle

%pip install --upgrade "comet_ml>=3.10.0"
!pip install kaggle torchmetrics numpy torchvision torch

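Import the packages used in the rest of the notebook: comet_ml, os, torch (together with torch.nn as nn and torch.optim as optim), the datasets, models, and transforms modules from torchvision, tqdm, and the BinaryRecall, BinaryPrecision, and BinaryF1Score metrics from torchmetrics.classification.

Make sure that you import the Comet library before PyTorch to benefit from Comet's auto-logging features.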

Choosing Models for Classification

When choosing a computer vision model for a classification task, there are several factors to consider, such as accuracy, speed, and model size. Pre-trained models such as VGG, ResNet, SqueezeNet, AlexNet, DenseNet, and Inception are widely available and have achieved state-of-the-art results on various benchmark datasets. However, choosing the right pre-trained model can be challenging, as different models excel at different tasks and datasets.

For the purpose of this tutorial, you will benchmark only two models: VGG and ResNet. However, the lessons you’ll learn from this tutorial will help you benchmark more computer vision models.

Evaluation Metrics

Choosing the right evaluation metrics for a classification task is critical to accurately benchmark the performance of computer vision models. Common metrics for classification tasks include accuracy, precision, recall, F1 score, and confusion matrix. Comet allows ML engineers to track these metrics in real-time and visualize their performance using interactive dashboards. Moreover, Comet provides advanced features such as hyperparameter optimization and experiment comparison, making it easier to fine-tune the models and identify the best-performing model.
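As a quick reference, the scalar metrics listed above can all be derived from the four cells of a binary confusion matrix. The sketch below is a plain-Python illustration, not part of the tutorial's pipeline: the function name `binary_metrics` and the counts are hypothetical and chosen only to make the arithmetic easy to follow. In the tutorial itself, these values are computed by Torchmetrics.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: of everything predicted positive, how much was correct
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything actually positive, how much was found
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts, for illustration only
metrics = binary_metrics(tp=40, fp=10, fn=5, tn=45)
print(metrics)
```

Precision and recall can diverge sharply when one class is rare, which is why tracking F1 alongside accuracy is a common choice.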

Below, initialize the Comet ML library, create your experiment, define some parameters, and log them to Comet:

comet_ml.init()
experiment = comet_ml.Experiment(
    project_name="Benchmarking-classifiers", 
)

model_names = ["resnet", "vgg"]
hyper_params = {"batch_size": 20, "num_epochs": 15, "learning_rate": 0.01}
num_classes = 2
input_size = 224

experiment.log_parameters(hyper_params)

Data Preparation

You will use the Ants and Bees classification dataset available on Kaggle. To download it, you will use the Kaggle package. Create an API token on the Settings page of your Kaggle account; this downloads a JSON file (kaggle.json). Open it, copy the username and key, and set the environment variables as shown below. This will allow you to download the dataset:

%env KAGGLE_USERNAME=<YOUR_USERNAME>
%env KAGGLE_KEY=<YOUR_KEY>
!kaggle datasets download gauravduttakiit/ants-bees --unzip
data_dir = "./hymenoptera_data"
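Before building data loaders, it can help to confirm that the download produced the folder layout the rest of the tutorial relies on: one sub-folder per split (`train`, `val`), each containing one sub-folder per class. The helper below is a minimal sketch using only the standard library. The function name `count_images`, the extension list, and the `ants`/`bees` folder names in the demo are assumptions for illustration, not something the dataset page guarantees.

```python
import os
import tempfile

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")  # assumed extensions


def count_images(data_dir, splits=("train", "val")):
    """Count image files per class folder for each split under data_dir."""
    summary = {}
    for split in splits:
        split_dir = os.path.join(data_dir, split)
        if not os.path.isdir(split_dir):
            raise FileNotFoundError(f"Missing split folder: {split_dir}")
        summary[split] = {}
        for class_name in sorted(os.listdir(split_dir)):
            class_dir = os.path.join(split_dir, class_name)
            if os.path.isdir(class_dir):
                files = [f for f in os.listdir(class_dir)
                         if f.lower().endswith(IMAGE_EXTENSIONS)]
                summary[split][class_name] = len(files)
    return summary


# Demo on a throwaway directory that mimics the expected layout
with tempfile.TemporaryDirectory() as demo_dir:
    for split, n_images in (("train", 3), ("val", 2)):
        for class_name in ("ants", "bees"):
            class_dir = os.path.join(demo_dir, split, class_name)
            os.makedirs(class_dir)
            for i in range(n_images):
                open(os.path.join(class_dir, f"{i}.jpg"), "w").close()
    demo_summary = count_images(demo_dir)

print(demo_summary)
```

In the notebook, calling `count_images(data_dir)` right after the download should report a non-zero count for every class in both splits; a `FileNotFoundError` suggests the archive unpacked to a different path than `data_dir`.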

Next, create the data augmentation pipelines using TorchVision transforms:

data_transforms = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(input_size),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize(input_size),
        transforms.CenterCrop(input_size),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}
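The `Normalize` step in both pipelines subtracts a per-channel mean and divides by a per-channel standard deviation; the values used here are the ImageNet statistics that the pre-trained models were trained with. The sketch below reproduces that arithmetic for a single pixel in plain Python. The helper `normalize_pixel` is a hypothetical name introduced for illustration; the real transform does the same operation on whole tensors.

```python
# Channel statistics used in the transforms above (R, G, B)
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb):
    """Apply (value - mean) / std to one RGB pixel already scaled to [0, 1]."""
    return [(value - m) / s for value, m, s in zip(rgb, MEAN, STD)]


# A mid-gray pixel (128 out of 255 in every channel), scaled to [0, 1]
# the way ToTensor scales 8-bit images
gray = [128 / 255] * 3
normalized_gray = normalize_pixel(gray)
print([round(v, 3) for v in normalized_gray])

# A pixel equal to the channel means maps to zero in every channel
zero_pixel = normalize_pixel(MEAN)
print(zero_pixel)
```

Because the same statistics are applied in the `train` and `val` pipelines, the model sees inputs on the same scale in both phases.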

Using the transformation pipelines, create data loaders for your training and validation datasets:

# Create training and validation datasets
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x]) for x in ['train', 'val']}
# Create training and validation dataloaders
dataloaders_dict = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=hyper_params["batch_size"], shuffle=True, num_workers=4) for x in ['train', 'val']}

# Detect if we have a GPU available
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

Transfer Learning

In this section, you will create two main functions: one that runs the training loop for a model, and another that initializes the two models described above (VGG and ResNet). The training function handles both the training and validation phases, collecting key metrics and logging them to Comet.

def train_model(model, model_name, dataloaders, criterion, optimizer, num_epochs=25):
    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()  # enable dropout and batch-norm updates
            else:
                model.eval()   # use inference behavior for dropout and batch norm

            # Fresh metrics for each phase, on the same device as the predictions
            recall = BinaryRecall().to(device)
            precision = BinaryPrecision().to(device)
            f1 = BinaryF1Score().to(device)

            running_loss = 0.0
            running_correct = 0

            # Iterate over data.
            for inputs, labels in tqdm(dataloaders[phase]):
                inputs = inputs.to(device)
                labels = labels.to(device)

                optimizer.zero_grad()
                # Track gradients only during the training phase
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    loss = criterion(outputs, labels)
                    _, preds = torch.max(outputs, 1)

                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # The loss is a batch mean, so weight it by the batch size
                batch_correct = torch.sum(preds == labels).item()
                running_loss += loss.item() * inputs.size(0)
                running_correct += batch_correct

                experiment.log_metric(f"{phase}_batch_acc_{model_name}", batch_correct / labels.size(0))
                experiment.log_confusion_matrix(labels.cpu().tolist(), preds.cpu().tolist(), title=f"cm_{model_name}")
                recall(preds, labels)
                precision(preds, labels)
                f1(preds, labels)

            epoch_loss = running_loss / len(dataloaders[phase].dataset)
            epoch_acc = running_correct / len(dataloaders[phase].dataset)

            experiment.log_metrics({
                f"{phase}_acc_{model_name}": epoch_acc,
                f"{phase}_loss_{model_name}": epoch_loss
            }, epoch=epoch)

            # experiment.log_curve(f"{phase}_pr-curve-{model}", recall.compute(), precision.compute())
            experiment.log_metric(f"{phase}-f1-{model_name}", f1.compute().item())

            print('{} Loss: {:.4f} Acc: {:.4f}'.format(phase, epoch_loss, epoch_acc))
    return model
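One detail in the loop is worth spelling out: each batch loss is multiplied by `inputs.size(0)` before being summed, and the total is divided by the dataset size. With the default settings, `nn.CrossEntropyLoss` returns the mean loss over a batch, so this weighting yields the true per-sample average even when the last batch is smaller than the others. The plain-Python sketch below uses made-up batch losses (the numbers and the helper name `epoch_average` are hypothetical) to show how a naive average of batch means would differ.

```python
def epoch_average(batch_means, batch_sizes):
    """Average per-batch mean losses, weighting each by its batch size."""
    total = sum(mean * size for mean, size in zip(batch_means, batch_sizes))
    return total / sum(batch_sizes)


# Hypothetical epoch: two full batches of 20 samples and a final batch of 4
batch_means = [0.50, 0.40, 1.00]
batch_sizes = [20, 20, 4]

weighted = epoch_average(batch_means, batch_sizes)
unweighted = sum(batch_means) / len(batch_means)

print(f"weighted={weighted:.4f} unweighted={unweighted:.4f}")
```

The unweighted figure lets the four samples in the last batch count as much as twenty samples elsewhere, which is why the loop accumulates sample-weighted sums instead.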

Next, you need to initialize the two models discussed above. In transfer learning, you freeze the pre-trained layers and replace the final layer with one that matches your task. The shape of the new layer depends on your data. For this use case, the model needs to output one score for each of the 2 classes (is it an ant or a bee?).

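def set_parameter_requires_grad(model):
    # Freeze every existing parameter so only layers added afterwards are trained
    for param in model.parameters():
        param.requires_grad = False

def initialize_model(model_name, num_classes, use_pretrained=True):
    # "DEFAULT" selects torchvision's default pre-trained weights (torchvision >= 0.13);
    # None leaves the network randomly initialized
    weights = "DEFAULT" if use_pretrained else None

    if model_name == "resnet":
        model_ft = models.resnet18(weights=weights)
        set_parameter_requires_grad(model_ft)
        # Replace the final fully connected layer; new layers are trainable by default
        num_ftrs = model_ft.fc.in_features
        model_ft.fc = nn.Linear(num_ftrs, num_classes)

    elif model_name == "vgg":
        model_ft = models.vgg11_bn(weights=weights)
        set_parameter_requires_grad(model_ft)
        # Replace the last layer of the VGG classifier head
        num_ftrs = model_ft.classifier[6].in_features
        model_ft.classifier[6] = nn.Linear(num_ftrs, num_classes)

    else:
        raise ValueError(f"Invalid model name: {model_name}")

    return model_ft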

Finally, you need to define the optimizer and the loss function. Cross-entropy loss suits this classification task: nn.CrossEntropyLoss takes the model's raw scores (logits) for the two classes, applies a softmax internally, and penalizes low probability assigned to the correct class. Run the training process and monitor the metrics being logged in Comet for each model.

for model_name in model_names:
  # Initialize the model for this run
  model_ft = initialize_model(model_name, num_classes, use_pretrained=True)
  model_ft = model_ft.to(device)
  # Optimize only the parameters left trainable (the new final layer)
  params_to_update = [p for p in model_ft.parameters() if p.requires_grad]
  optimizer_ft = optim.SGD(params_to_update, lr=hyper_params["learning_rate"], momentum=0.9)
  criterion = nn.CrossEntropyLoss()
  model_ft = train_model(model_ft, model_name, dataloaders_dict, criterion, optimizer_ft, num_epochs=hyper_params["num_epochs"])
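To make the loss concrete, the sketch below reproduces in plain Python what cross-entropy computes for a single sample: a softmax over the model's raw scores, followed by the negative log of the probability assigned to the true class. The function name `cross_entropy` and the logits are hypothetical; `nn.CrossEntropyLoss` performs the equivalent computation on batched tensors and, by default, averages over the batch.

```python
import math


def cross_entropy(logits, target):
    """Return (-log softmax(logits)[target], softmax probabilities)."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    denom = sum(exps)
    probs = [e / denom for e in exps]
    return -math.log(probs[target]), probs


# Hypothetical two-class logits that favor class 0
logits = [2.0, 0.0]

loss_right, probs = cross_entropy(logits, target=0)  # true class is 0
loss_wrong, _ = cross_entropy(logits, target=1)      # true class is 1

print(f"probs={[round(p, 4) for p in probs]}")
print(f"loss if class 0 is correct: {loss_right:.4f}")
print(f"loss if class 1 is correct: {loss_wrong:.4f}")
```

The loss is small when the model puts most of its probability on the true class and grows without an upper bound as that probability approaches zero, so it should be read as a penalty rather than as a probability.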

Evaluation

You can create panels in Comet that combine and visualize the metrics you collected. For example, the image below compares accuracy and loss values for both ResNet and VGG; note that VGG converges in fewer training steps.

You can also view the confusion matrix, a vital tool for debugging classification models. The diagram below shows high counts of true positives and true negatives, while false positives and false negatives are very low, which indicates that the model separates the two classes well.
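For readers who want to see how such a matrix is assembled, here is a minimal plain-Python sketch that tallies true labels against predictions. The labels, predictions, and the function name `confusion_matrix` are hypothetical, and placing true classes on the rows is a choice made for this sketch rather than a statement about how Comet draws its panel.

```python
def confusion_matrix(y_true, y_pred, num_classes=2):
    """Build a matrix with true classes on the rows and predictions on the columns."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[true_label][predicted_label] += 1
    return matrix


# Hypothetical labels and predictions for eight samples
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 0, 1, 1, 1, 0, 1, 0]

cm = confusion_matrix(y_true, y_pred)
for row in cm:
    print(row)

# Correct predictions sit on the diagonal
correct = sum(cm[i][i] for i in range(len(cm)))
print(f"accuracy={correct / len(y_true):.2f}")
```

Off-diagonal cells show which class is being mistaken for which, which is often more actionable than a single accuracy number.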

Conclusion

Congratulations on finishing this tutorial on how to benchmark computer vision models. You've learned how to load datasets and create the data loaders needed for training PyTorch models. You've also learned about several computer vision models, how to perform transfer learning, and how to calculate metrics that are crucial for evaluating and monitoring the success of the models. Even though we only used VGG and ResNet, you can apply this knowledge to evaluate other computer vision models.

Editor’s Note: Heartbeat is a contributor-driven online publication and community dedicated to providing premier educational resources for data science, machine learning, and deep learning practitioners. We’re committed to supporting and inspiring developers and engineers from all walks of life.

Editorially independent, Heartbeat is sponsored and published by Comet, an MLOps platform that enables data scientists & ML teams to track, compare, explain, & optimize their experiments. We pay our contributors, and we don’t sell ads.