Introduction to Human Action Recognition (HAR)

Shittu Olumide Ayodeji · Published in Heartbeat · Apr 18, 2023

Image source: Unsplash.

In this article, we will build a human action recognition system. We will make use of the VGG model, also known as VGGNet, and we will log our model, artifacts, and metrics to Comet.

This is going to be a hands-on tutorial, so I urge you to read and code along, and I will add the link to the code at the end of the article.

What is Human Action Recognition (HAR)?

Human Action Recognition (HAR) is the process of identifying and categorizing human actions from videos or image sequences. It is a challenging task in computer vision with many practical applications, such as video surveillance, human-computer interaction, sports analysis, and medical diagnosis.

The goal of HAR is to automatically recognize the actions performed by humans based on their movement patterns and appearance features, such as body shape, motion trajectories, and joint angles. HAR systems typically use machine learning algorithms to learn and classify human actions based on the visual features extracted from the input data.

Human movement Gif

Why is HAR important?

Here are some of the reasons why HAR is important:

  1. Healthcare: HAR can be used to monitor and analyze human movements for physical therapy, rehabilitation, and medical diagnosis.
  2. Surveillance and Security: HAR can be used in video surveillance and security applications to identify and track suspicious behavior or criminal activity.
  3. Sports and Fitness: HAR can be used to track and analyze the movements of athletes and fitness enthusiasts to improve performance and prevent injuries.
  4. Robotics: HAR can be used in robotics to enable machines to recognize and respond to human actions, making them more useful and user-friendly.
  5. Entertainment: HAR can be used in the entertainment industry for gesture recognition in games, virtual reality, and other interactive applications.

The VGG model

The VGG (Visual Geometry Group) model is a deep convolutional neural network architecture for image recognition tasks. It was introduced in 2014 by researchers Karen Simonyan and Andrew Zisserman from the University of Oxford.

The VGG model is known for its simplicity and effectiveness in image classification tasks. The model architecture consists of a series of convolutional layers, followed by max-pooling layers and finally, fully connected layers.

VGG Neural Network Architecture — Source

The convolutional layers are responsible for extracting features from the input image, while the fully connected layers are responsible for classifying the image into different categories. The VGG model has achieved state-of-the-art results on several benchmark image recognition datasets, including 92.7% top-5 test accuracy on ImageNet.
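As a quick illustration of that layer stack, Keras ships a VGG16 pretrained on ImageNet whose architecture we can inspect directly. This is a minimal sketch, assuming TensorFlow/Keras is installed; it is separate from the model we train later in the tutorial:

from tensorflow.keras.applications import VGG16

# Download VGG16 pretrained on ImageNet and print its layer stack:
# five blocks of convolution + max-pooling, then three fully connected layers.
model = VGG16(weights="imagenet")
model.summary()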

Import the library
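Here is a minimal set of imports for the rest of the tutorial, assuming a TensorFlow/Keras environment. Note that comet_ml is imported before TensorFlow so Comet can automatically log the Keras training run:

import comet_ml  # import before TensorFlow so Comet can auto-log training

import pandas as pd
import plotly.express as px
import tensorflow as tf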

Dataset

This article’s dataset includes 15 distinct categories of human activity. It contains about 12,000+ labelled images, including the validation images; each image depicts exactly one human activity and is saved in a separate folder for its labelled class.
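Before touching the data, we create a Comet experiment to track the run. This is a minimal sketch; the API key, project name, and workspace below are placeholders for your own:

from comet_ml import Experiment

# Create a Comet experiment; replace the placeholders with your own credentials.
experiment = Experiment(
    api_key="YOUR_API_KEY",
    project_name="human-action-recognition",
    workspace="YOUR_WORKSPACE",
)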

From the code above, we bring Comet into our project. Comet is a cloud-based machine learning platform that allows data science teams to track, compare, explain, and optimize their experiments and models across the complete ML lifecycle, from training runs to production monitoring.


We logged our dataset as an Artifact to Comet using the .log_artifact() method. We did this for reproducibility: Comet Artifacts capture the environment, code, data, and results of a machine learning experiment, making it easier to reproduce the experiment and verify the results. To learn more about Artifacts, check them out here.
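Here is a sketch of how the dataset folder can be logged as an Artifact; the artifact name and local path are placeholder assumptions:

from comet_ml import Artifact

# Wrap the dataset folder in an Artifact and attach it to the experiment.
artifact = Artifact(name="human-action-dataset", artifact_type="dataset")
artifact.add("data/train")  # placeholder path to the labelled images
experiment.log_artifact(artifact)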

Here are our logged Artifacts:

Logged artifacts.

We can have a quick look at our dataset by visualizing the different human actions, such as sitting, hugging, eating, and running. We will make use of the Plotly library for the visualization.
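A minimal Plotly sketch of that visualization, assuming the labels live in a pandas DataFrame df with a label column (the variable names here are assumptions):

import plotly.express as px

# Count how many training images fall under each action label
# and render the distribution as a pie chart.
label_counts = df["label"].value_counts().reset_index()
label_counts.columns = ["label", "count"]

fig = px.pie(label_counts, names="label", values="count",
             title="Distribution of human actions in the training set")
fig.show()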

Here is the pie chart:

Pie chart showing human actions in the dataset.

From the visualization, we can see the distribution of the different human activities, such as sitting, hugging, using a laptop, eating, and running, within the training dataset. All the activities have the same share of the data: 6.67% each.

Feature selection

Selecting the best and most relevant features from the dataset helps us build a more accurate and efficient model. There are two columns in the dataset: filename, which contains the image files that will be used to train the model, and label, which contains the corresponding human action for each image.
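A sketch of loading those two columns, assuming the dataset ships with a CSV that maps each image file to its label (the file name Training_set.csv is an assumption):

import pandas as pd

# Load the CSV that maps each image file to its action label.
df = pd.read_csv("Training_set.csv")  # placeholder file name

# The two features we care about: the image file and its label.
print(df[["filename", "label"]].head())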

Feature selection.

Let’s create a function DisplayImage() that picks a random image and displays it with its respective label.
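Here is a minimal sketch of such a function, assuming the images live in a data/train/ folder (a placeholder path) and using Matplotlib for display:

import matplotlib.pyplot as plt
from PIL import Image

def DisplayImage():
    # Pick a random row, open the corresponding image, and show it with its label.
    row = df.sample(n=1).iloc[0]
    image = Image.open("data/train/" + row["filename"])  # placeholder path
    plt.imshow(image)
    plt.title(row["label"])
    plt.axis("off")
    plt.show()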

To view this, call the function:
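# Show a random training image together with its label
DisplayImage()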

Display random images.

Data processing

In this section, we prepare the image data, build our model on top of a pretrained VGG16 base, and fit it with adequate parameters.
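Here is a condensed sketch of that pipeline. It streams the images with Keras’ ImageDataGenerator and uses VGG16 as a frozen feature extractor; the image size, layer sizes, epoch count, and validation split below are illustrative assumptions, not the article’s exact settings:

from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Stream images from disk, rescaling pixels to [0, 1] and holding out 20% for validation.
datagen = ImageDataGenerator(rescale=1.0 / 255, validation_split=0.2)
train_gen = datagen.flow_from_dataframe(
    df, directory="data/train", x_col="filename", y_col="label",
    target_size=(160, 160), class_mode="categorical", subset="training")
val_gen = datagen.flow_from_dataframe(
    df, directory="data/train", x_col="filename", y_col="label",
    target_size=(160, 160), class_mode="categorical", subset="validation")

# Use VGG16 pretrained on ImageNet as a frozen feature extractor.
base = VGG16(weights="imagenet", include_top=False, input_shape=(160, 160, 3))
base.trainable = False

# Add a small classification head for the 15 action categories.
vgg_model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(15, activation="softmax"),
])
vgg_model.summary()

vgg_model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
history = vgg_model.fit(train_gen, validation_data=val_gen, epochs=10)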

Here is our model summary:

Model summary.

Save model

Let’s save the model and also log accuracy, hyperparameters and loss to Comet.
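A sketch of doing this with Comet’s log_metric, log_parameters, and log_model methods; the hyperparameter values mirror the illustrative choices made above:

# Save the trained weights to disk.
vgg_model.save_weights("HumanActionModel.h5")

# Log the final training accuracy and loss to Comet.
experiment.log_metric("accuracy", history.history["accuracy"][-1])
experiment.log_metric("loss", history.history["loss"][-1])

# Log the hyperparameters used for this run.
experiment.log_parameters({"epochs": 10,
                           "optimizer": "adam",
                           "loss": "categorical_crossentropy"})

# Attach the saved weights file to the experiment.
experiment.log_model("HumanActionModel", "HumanActionModel.h5")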

The vgg_model.save_weights("HumanActionModel.h5") call saves the weights of all the layers of the VGG model to a file called HumanActionModel.h5 in the current directory. Note that this saves only the weights, not the architecture itself.

A trained model’s weights can be saved so that we can reuse the model later without having to start over from scratch. This can be helpful if we want to use the model for further testing or to deploy it to a production environment.
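Because only the weights were saved, reusing the model later means rebuilding the same architecture first and then restoring the weights. A minimal sketch, assuming the same definitions as above:

# Rebuild the exact same architecture, then restore the trained weights.
restored_model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(15, activation="softmax"),
])
restored_model.load_weights("HumanActionModel.h5")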

We can easily register our model in Comet: head over to the Assets & Artifacts section, click on the saved model, and register it.

Register the model on Comet.

Before we end this tutorial, one last thing to do is end the experiment.

#end the experiment
experiment.end()

Conclusion

In conclusion, Human Action Recognition (HAR) is an important task for many applications, including video surveillance, human-computer interaction, sports analysis, and medical diagnosis. VGG models, a family of deep convolutional architectures, have attained state-of-the-art performance in various image recognition tasks, including HAR.

We worked hands-on through how to use the VGG model to build a HAR system. The goal of the model is to detect human actions given a picture, and we used the VGG16 model to achieve this goal.

Human Activity Recognition (HAR) is an active research area, and there are several ongoing efforts to improve its accuracy and effectiveness. Some of the directions being explored include deep learning architectures, sensor fusion, transfer learning, and context-aware recognition.

Feel free to check out the full notebook here.

Editor’s Note: Heartbeat is a contributor-driven online publication and community dedicated to providing premier educational resources for data science, machine learning, and deep learning practitioners. We’re committed to supporting and inspiring developers and engineers from all walks of life.

Editorially independent, Heartbeat is sponsored and published by Comet, an MLOps platform that enables data scientists & ML teams to track, compare, explain, & optimize their experiments. We pay our contributors, and we don’t sell ads.

If you’d like to contribute, head on over to our call for contributors. You can also sign up to receive our weekly newsletter (Deep Learning Weekly), check out the Comet blog, join us on Slack, and follow Comet on Twitter and LinkedIn for resources, events, and much more that will help you build better ML models, faster.


Hello 👋 my name is Shittu Olumide. I am a skilled software developer and technical writer, passionate about the community and its members.