Journey using CVAT semi-automatic annotation with a partially trained model to tag additional images (for further model tuning)

Stephen Cow Chau
6 min read · Jun 4, 2023

Background

I have had 2 different object detection projects on hand lately, and the manual annotation of data from scratch has been a frustrating task.

One platform I used is MakeSense, a very good project where image annotation runs directly in the browser.

The UI of MakeSense

The problem is the tons of clicks required to tag the whole dataset. The platform does include AI-assisted tagging: you can either use a model on Roboflow, or provide your own model (which requires a TensorFlow.js model complying with YOLOv5's specific requirements on model output — something that easily gets messed up through the PyTorch > ONNX > TensorFlow > TensorFlow.js conversion chain).

Computer Vision Annotation Tool (CVAT)

CVAT was built by Intel for computer vision annotation, putting together OpenCV and OpenVINO (to speed up CPU inference). There is a cloud platform (cvat.ai) as well as an on-premise installation; the advantage of the on-premise installation is that it is container-based, so running a single "docker compose up" command already puts the whole stack up.

the UI for annotation, image ref: https://opencv.github.io/cvat/docs/manual/basics/shape-mode-basics/
The base containers that run when we put the CVAT stack up (not included auto annotation)

(Semi) automated annotation

CVAT's (semi) automated annotation allows users to use something called Nuclio, a tool aimed at assisting automated data science through serverless deployment.

The way CVAT integrates your custom model is to wrap the model inside a Docker container, and it requires us to implement specific functions to handle the input data arriving through an API call.

For a local deployment, we can treat our own Docker environment (the one hosting the CVAT stack) as the Nuclio serverless target and deploy the Nuclio functions alongside CVAT.

To support serverless, we would run the following command instead (which includes an additional Docker Compose file defining a container called nuclio):

docker compose -f docker-compose.yml -f components/serverless/docker-compose.serverless.yml up -d

Explaining Nuclio

According to the examples page, it supports different programming languages, including Go, Python, NodeJS, .NET, Java and shell script.

As said, a Nuclio function is wrapped into a container; there is a YAML file that defines how to set up the container, as well as some supporting code that runs the inference. Let's look at a Nuclio function implemented inside CVAT:

a YOLOv7 nuclio implementation under the CVAT git clone root > serverless > onnx > WongKinYiu > yolov7 > nuclio folder
The function.yaml defines a CPU Nuclio function, in which we can see traces of a Dockerfile, like the installation of packages through apt install and the download of the model's ONNX file.
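An abridged sketch of what such a function.yaml looks like — the field values below are illustrative, not copied from the CVAT repository:

```yaml
metadata:
  name: onnx-yolov7          # the nuclio function name
  namespace: cvat
  annotations:
    type: detector            # tells CVAT this is a detection model
    spec: |
      [ { "id": 0, "name": "person" }, { "id": 1, "name": "bicycle" } ]

spec:
  runtime: 'python:3.8'
  handler: main:handler       # module:function that serves each request
  build:
    baseImage: ubuntu:22.04
    directives:               # Dockerfile-like build steps
      preCopy:
        - kind: RUN
          value: apt update && apt install -y python3 python3-pip wget
        - kind: RUN
          value: wget <model onnx url>   # download the ONNX model file
```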

The actual function lives in main.py, with supporting functions in model_handler.py.

The main.py has 2 functions: the handler function, which takes a context and an event, and the init_context function, which takes a context.

According to the Nuclio best-practice documentation, the init_context function is used for provisioning variables that are reused across API calls (e.g. loading the model into memory once, so it can serve each incoming request).

Note that the init_context is implemented in each of the runtime in nuclio code (see Python runtime code below):

https://github.com/nuclio/nuclio/blob/master/pkg/processor/runtime/python/py/_nuclio_wrapper.py

As for the handler, I believe it is tied to the function.yaml spec, so we could likely name it differently.

in the function.yaml spec, we specify the handler as the main:handler function

The handler function is to run the model (prepared in init_context) and return the model inferred result wrapped as API response.

And the result is returned in JSON format.
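Based on CVAT's detector interface, the response is a list of detections, roughly shaped like the following (the values are illustrative):

```json
[
  {
    "confidence": "0.91",
    "label": "person",
    "points": [10, 20, 110, 220],
    "type": "rectangle"
  }
]
```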

What I would try

I am going to try 2 things:

  1. Local model inference
  2. Remote model inference (by remotely serving an API that conforms to what CVAT expects; the Nuclio function passes the image as an API parameter and gets the result from the remote execution)

Custom YOLOv7 model

As I have my own YOLOv7 model (see my previous post) with a different output compared to the original, there are some changes I need to make. Throughout that journey, I found I was lacking an effective debugging approach and had to rely on print statements.

Using the YOLOv7 provided by CVAT as base, here is my setup:

The major changes are:

(1) In function.yaml (and function-gpu.yaml), we need to change the classes (as well as some other metadata, like the name of the Nuclio function)

(2) update the _infer function according to the changes needed for my custom model

a lot of prints to debug the shapes needed, as well as changes to transform the result of the model output (the "detections" variable)
also in the load_network function, to see the expected input and output of the ONNX model

(3) build a custom deploy script, because when following the CVAT-provided script, the Nuclio deploy always complained that the function has no name
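The transformation in step (2) can be sketched roughly as follows. The assumed row layout [x1, y1, x2, y2, confidence, class_id] is an illustration only — the actual output shape depends on the exported ONNX model, which is exactly what all the debug prints were for:

```python
def to_cvat_detections(raw, labels, threshold=0.5):
    """Convert raw YOLO-style output rows into the dict format CVAT expects.

    Assumes each row is [x1, y1, x2, y2, confidence, class_id];
    adapt the unpacking to your own model's output layout.
    """
    detections = []
    for x1, y1, x2, y2, conf, cls_id in raw:
        if conf < threshold:
            continue  # drop low-confidence boxes
        detections.append({
            "confidence": str(float(conf)),
            "label": labels[int(cls_id)],
            "points": [float(x1), float(y1), float(x2), float(y2)],
            "type": "rectangle",
        })
    return detections
```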

Deploy the nuclio

At the folder <cvat root>/serverless/custom, run:

# bash deploy.sh <the subfolder that hold the nuclio files> <project name>
bash deploy.sh anw_yolov7 cvat

After running, we can see the Nuclio functions that are now running (which one can also check with the command "nuctl get function --platform local"):

And the Docker container is created and running:

the debug prints we added show up in the container's log, which helped me a lot

And we can use the model to run the auto-tagging.

Remote model inference

Again, this is not about deploying the Nuclio function (with a model inside) on a remote serverless platform (like AWS or other cloud platforms); instead, I would like to run the model inference on a remote Mac Mini M1.

The idea is to wrap the model inference into an API call, so my local Nuclio function just makes an API call.
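A minimal Python sketch of this relay idea — REMOTE_URL, the payload shape, and the forward_to_remote helper are all hypothetical assumptions, not taken from actual code:

```python
import base64
import json
import urllib.request

# Hypothetical address of the inference server on the remote machine
REMOTE_URL = "http://192.168.1.50:8000/infer"

def forward_to_remote(image_b64, threshold=0.5, post=None):
    """Relay an annotation request to the remote inference server.

    `post` is injectable so the HTTP call can be swapped out in tests;
    by default it uses urllib from the standard library.
    """
    payload = json.dumps({"image": image_b64,
                          "threshold": threshold}).encode()
    if post is None:
        def post(url, body):
            req = urllib.request.Request(
                url, data=body,
                headers={"Content-Type": "application/json"})
            with urllib.request.urlopen(req) as resp:
                return resp.read()
    return json.loads(post(REMOTE_URL, payload))

def handler(context, event):
    # the local nuclio function just relays the image and returns
    # the remote result in the JSON shape CVAT expects
    detections = forward_to_remote(
        event.body["image"],
        float(event.body.get("threshold", 0.5)))
    return json.dumps(detections)
```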

I have not completed the remote model inference, but I have set up sample code that makes an API call from Nuclio (calling an IP-address lookup API). I am using NodeJS for this one, with axios as the network request package.

Conclusion

So far, it looks promising to use CVAT to perform image tagging with the support of a partially trained model, to further enrich the dataset and then retrain. Sure, Roboflow or other cloud platforms could be better options, but I am offering a poor man's option.
