Image Visualization with Kangas

Applying built-in functions from Kangas UI to Hugging Face DataGrids

Felix Gutierrez
Heartbeat


Image from https://unsplash.com/photos/EXgCBYk4wCc

In a previous article, we explored the basic features of the Kangas API to construct our own DataGrids and visualize them in the Kangas server. For your reference, here is that tutorial:

This time, we will keep exploring features of the Kangas library by importing already-built image classification datasets into the Kangas UI as DataGrids. These datasets are available for public use on the Hugging Face Hub.

What Is Hugging Face?

According to its website, Hugging Face is a “platform where Users can build, benchmark, share, version and deploy Repositories, which may include Models, Datasets and Machine Learning Applications.”

All the Hugging Face open-source projects are available on their GitHub page, and they include Transformers, Datasets and Tokenizers.

In order to access Hugging Face datasets (which we will load into Kangas as DataGrids), we first need to install the datasets library in our Python environment. If you use conda, the recommendation is to create an isolated environment and install both kangas and the datasets library there:

pip install kangas

and

pip install datasets

Then we will perform all the analysis in a Jupyter Notebook with that environment activated. In my particular case, I created a conda env named flask-app with Python 3.9 installed.

Reading DataGrids with Kangas

In order to start exploring the DataGrid with Kangas, we'll import some basic packages that will load the dataset from Hugging Face and put Kangas into action:
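A minimal version of those imports might look like this (assuming the kangas and datasets packages installed above):

from datasets import load_dataset
import kangas as kg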

The datasets library's load_dataset() function accepts several parameters and can load a dataset from a local file, from in-memory data, or from "The Hub." We can get a complete list of all the datasets available on the Hugging Face Hub by calling the list_datasets() function or by reviewing them on the web page:
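For example, a quick sketch of listing them programmatically (note that in recent versions of the datasets library this helper has been moved to huggingface_hub.list_datasets()):

from datasets import list_datasets

available = list_datasets()
print(len(available))   # thousands of public datasets
print(available[:5])    # peek at the first few names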

Image from Author
Image from https://huggingface.co/datasets

After that, we can proceed to load the dataset in the notebook by passing some parameters to load_dataset(), such as split:

split (Split or str) — Which split of the data to load. If None, will return a dict with all splits (typically datasets.Split.TRAIN and datasets.Split.TEST)

You can also look up all the parameters that the datasets library accepts here:

We will be working with the train split of the beans dataset, and we will take all of its records, considering it is a relatively small one. The data and metadata of this dataset will be stored in your C:/Users/{user_name}/.cache/ folder.
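Loading the train split then looks like this; printing the dataset shows its features and number of rows:

dataset = load_dataset("beans", split="train")
print(dataset)   # shows the dataset's features and its row count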

Image from Author

You may be wondering why the dataset is stored in the .arrow format, and what exactly that is. Here is the answer to that.

Learn how to use Kangas with the Hugging Face Hub by watching this quick video.

Once the dataset is downloaded to your .cache folder, you can go ahead and start working with it in the notebook, for example with the info() function:
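A minimal sketch of wrapping the Hugging Face dataset in a Kangas DataGrid and calling info() could look like the following (the column names, and the use of the image and labels features, are illustrative assumptions):

# Build a DataGrid from the Hugging Face dataset (column names are illustrative)
dg = kg.DataGrid(name="beans-train", columns=["Image", "Label"])

for example in dataset:
    # kg.Image wraps the PIL image so Kangas can render it in the UI
    dg.append([kg.Image(example["image"]), example["labels"]])

dg.info()   # summary of the DataGrid: rows, columns, and data types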

Image from Author

We can also use some other functions of the DataGrid class before saving and visualizing it, such as get_columns(), dg.head(), dg.tail(), dg.shape(), and dg.info().
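For instance:

dg.get_columns()   # names of the DataGrid columns
dg.head()          # first few rows
dg.tail()          # last few rows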

Image from Author

Once we have explored the elements that comprise the DataGrid, we can save it. For the sake of simplicity, I will save it in my Temp folder, and then I will explore the schema of my DataGrid.

Image from Author

Through get_schema(), as shown in the above image, we can get information about how the data and metadata of our DataGrid are organized, as well as the data type of each column.
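In code, saving the DataGrid and inspecting its schema is a short sketch like this (save() writes a .datagrid file in the working directory):

dg.save()          # writes beans-train.datagrid to disk
dg.get_schema()    # column names, data types, and metadata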

You can also iterate over all the rows of the DataGrid and get, for example, the asset_id of each image:

Image from Author

Once the DataGrid is saved, we can start visualizing it in the Kangas server. To do so, go to the directory where the DataGrid was saved and start the server from there:
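From a terminal in that directory, the server can be started with the Kangas CLI (the UI is typically served at http://localhost:4000):

kangas server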

Image from Author

Open your browser and type the URL where the server is running: there you have your DataGrid, and you can start applying filters, sorting, and grouping by columns:

Image from Author

Of course, this is a very basic example of what you can do with Kangas, and I encourage you to look for more complex examples in the official Kangas GitHub repo. There are some good examples there to inspire you to create your own object detection models based on Kangas and PyTorch.

Summary

In this article, we have learned how to load datasets that we can use to start an analysis from scratch, selecting from a huge public repository of models and datasets oriented toward computer vision, NLP, and audio recognition. We have also explored some other features of the Kangas API for DataGrid analysis and classification.

Remember, you can follow me here on Medium and also on LinkedIn. The code used for this article is available on GitHub; feel free to clone the repo and use it for practical purposes, or even suggest improvements:

Editor’s Note: Heartbeat is a contributor-driven online publication and community dedicated to providing premier educational resources for data science, machine learning, and deep learning practitioners. We’re committed to supporting and inspiring developers and engineers from all walks of life.

Editorially independent, Heartbeat is sponsored and published by Comet, an MLOps platform that enables data scientists & ML teams to track, compare, explain, & optimize their experiments. We pay our contributors, and we don’t sell ads.

If you’d like to contribute, head on over to our call for contributors. You can also sign up to receive our weekly newsletter (Deep Learning Weekly), check out the Comet blog, join us on Slack, and follow Comet on Twitter and LinkedIn for resources, events, and much more that will help you build better ML models, faster.
