Constructing and Visualizing Datagrids in Kangas

A comprehensive introductory tutorial on how to create your own DataGrids, then manipulate, classify, and visualize them in the Kangas UI

Felix Gutierrez
Heartbeat

Image from https://github.com/comet-ml/kangas

Introduction

Kangas is an open-source tool developed by Comet. It is still in beta, but it is free for everyone to use. According to its GitHub page, it is a tool for exploring, analyzing, and visualizing large-scale multimedia data, and its key features include:

- Scalability. Kangas DataGrid, the fundamental class for representing datasets, can easily store millions of rows of data.

- Performance. Group, sort, and filter across millions of data points in seconds with a simple, fast UI.

- Interoperability. Any data, any environment. Kangas can run in a notebook or as a standalone app, both locally and remotely.

- Integrated computer vision support. Visualize and filter bounding boxes, labels, and metadata without any extra setup.

In this tutorial, I will show you how to get started with this new computer vision tool by generating our own DataGrids and analyzing previously created ones. We will also compare DataGrids with pandas DataFrames.

What is a DataGrid?

First, let’s see what the world wide web, our usual source of truth, has to say. Googling the term “DataGrid” does not turn up a precise definition; the most concise one is in the Kangas documentation:

“The DataGrid instance can be imagined as a two-dimensional list of lists. The first dimension is the row, and the second dimension is the column.”

As we will see later in the article, a DataGrid can contain not only plain data such as strings and integers, but also images.

The DataGrid instance has the following attributes:

  • Columns: a list of column names, or a dictionary mapping column names to column types.
  • Data: a list of lists, where each inner list is a row of data.
  • Name: the name of the tabular data.
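To make the “list of lists” picture concrete, here is a plain-Python sketch (not the Kangas class itself) of how the three attributes relate. The variable names mirror the attributes above, the helper `cell()` is hypothetical, and the values are made up for illustration.

```python
# Plain-Python model of the three DataGrid attributes (not the Kangas class).
name = "Example 1"

# Columns: a list of names. Kangas also accepts a dict mapping names to types.
columns = ["Category", "Loss", "Fitness"]

# Data: a list of lists; each inner list is one row, ordered like `columns`.
data = [
    ["category 1", 2.34, 0.9],
    ["category 2", 1.10, 0.7],
    ["category 1", 0.85, 0.6],
]


def cell(data, columns, row, column_name):
    """Hypothetical helper: the first index picks the row, the second the column."""
    return data[row][columns.index(column_name)]


print(cell(data, columns, 1, "Loss"))  # 1.1
print(len(data), "rows x", len(columns), "columns")  # 3 rows x 3 columns
```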

The methods for exploring data are very similar to those in the pandas library:

  • dg.info(): shows information about the rows, columns, and data types
  • dg.head(): shows the first few rows of a DataGrid
  • dg.tail(): shows the last few rows of a DataGrid
  • dg.show(): opens an IFrame (in a Jupyter Notebook) or a web browser page showing the DataGrid UI

Let’s put Kangas into action and work on some examples. But first, as with any other Python library, you need to install it, either in your existing environment or in a brand-new virtual environment (venv).

If you are working in a notebook:

%pip install kangas

Or, from the command line:

pip install kangas

Once that’s done, just import it:

import kangas as kg

Read the remainder of this article on Comet.com.

Editor’s Note: Heartbeat is a contributor-driven online publication and community dedicated to providing premier educational resources for data science, machine learning, and deep learning practitioners. We’re committed to supporting and inspiring developers and engineers from all walks of life.

Editorially independent, Heartbeat is sponsored and published by Comet, an MLOps platform that enables data scientists & ML teams to track, compare, explain, & optimize their experiments. We pay our contributors, and we don’t sell ads.

Data Engineer. I write about, and learn by doing, different topics related to data.