Journey using CVAT semi-automatic annotation with a partially trained model to tag additional images (for further model tuning)
Background
I have had two different object detection projects on hand lately, and manually annotating data from scratch has been a frustrating task.
One platform I used is MakeSense, a very good project that runs image annotation directly in the browser.
The problem is the tons of clicks required to tag a whole dataset. The platform does include AI-assisted tagging: you can either use a model on Roboflow or provide your own model, but the latter requires a TensorFlow.js model complying with YOLOv5's specific output format, which is easily messed up through the PyTorch > ONNX > TensorFlow > TensorFlow.js conversion chain.
Computer Vision Annotation Tool (CVAT)
CVAT is built by Intel for computer vision annotation, putting together OpenCV and OpenVINO (to speed up CPU inference). There is a cloud platform (cvat.ai) as well as an on-premise installation. The advantage of the on-premise installation is that it is container based, so running a single “docker compose up” command puts the whole stack up.
(Semi) automated annotation
CVAT's (semi) automated annotation allows the user to use something called nuclio, a tool aimed at assisting automated data science through serverless deployment.
The way CVAT integrates your custom model is to wrap the model inside a Docker container, and it requires us to implement specific function calls so input data can be handled through API calls.
For a local deployment, we can treat our own Docker environment (hosting the CVAT stack) as the nuclio serverless target and deploy the nuclio functions alongside CVAT.
To support serverless, we run the following command instead (which includes an additional Docker Compose file that adds a container called nuclio):
docker compose -f docker-compose.yml -f components/serverless/docker-compose.serverless.yml up -d
Explaining nuclio
According to the examples page, it supports different programming languages, including Go, Python, NodeJS, .NET, Java, and shell script.
As said, a nuclio function is wrapped into a container: there is a YAML file defining how to set up the container, as well as supporting code that runs the inference. Let's look at the nuclio functions implemented inside CVAT.
The actual function lives in main.py, with supporting functions in model_handler.py.
main.py has two functions: the handler function, which takes in a context and an event, and the init_context function, which takes in a context.
According to nuclio's best practices document, the init_context function is used for provisioning variables used across API calls (e.g. loading the model into memory once so it can be reused for each incoming request).
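A minimal sketch of that pattern, assuming a model handler along the lines of CVAT's model_handler.py (the class map and the stub inference here are placeholders, not CVAT's actual code):

```python
class ModelHandler:
    """Stand-in for CVAT's model_handler.py; a real one would load YOLOv7 weights."""
    def __init__(self):
        # Hypothetical class map; a real function derives this from the YAML spec
        self.labels = {0: "person", 1: "car"}

    def infer(self, image, threshold):
        # A real implementation runs the network; this stub returns one fixed box
        return [{"confidence": "0.90", "label": self.labels[0],
                 "points": [10, 20, 110, 220], "type": "rectangle"}]

def init_context(context):
    # Called once when the container starts: cache the model on context.user_data
    # so every subsequent handler invocation reuses it instead of reloading.
    context.logger.info("Init context...  0%")
    context.user_data.model = ModelHandler()
    context.logger.info("Init context...100%")
```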
Note that init_context is implemented in each of the runtimes in the nuclio code (see nuclio's Python runtime code).
As for the handler, I believe its name is tied to the function.yaml spec, so we could likely name it differently.
The handler function runs the model (prepared in init_context) and returns the inference result wrapped as an API response.
And the result is returned as JSON.
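A hedged sketch of such a handler, assuming the base64 image field CVAT's detector functions use and the stub model cached above (not CVAT's verbatim code):

```python
import base64
import io
import json

def handler(context, event):
    # CVAT sends the image base64-encoded in the request body
    data = event.body
    image_bytes = io.BytesIO(base64.b64decode(data["image"]))
    threshold = float(data.get("threshold", 0.5))
    # Run the model cached by init_context and wrap the detections as JSON
    results = context.user_data.model.infer(image_bytes, threshold)
    return context.Response(body=json.dumps(results), headers={},
                            content_type="application/json", status_code=200)
```

Each element of the returned list is a shape such as `{"confidence": "0.90", "label": "person", "points": [10, 20, 110, 220], "type": "rectangle"}`, which is what CVAT turns into annotations.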
What I would try
I am going to try 2 things:
- Local model inference
- Remote model inference (serving an API that conforms to what CVAT expects on a remote machine; the nuclio function passes the image as an API parameter and gets the result back from the remote execution)
Custom YOLOv7 model
As I have my own YOLOv7 model (see my previous post) with a different output compared to the original, there were some changes I needed to make. Throughout that journey I found I was lacking an effective debugging approach and had to rely on print statements.
Using the YOLOv7 function provided by CVAT as a base, the major changes in my setup are:
(1) In function.yaml (and function-gpu.yaml), change the classes (as well as some other metadata, like the name of the nuclio function)
(2) Update the _infer function according to the changes needed for my custom model
(3) Build a custom deploy script, as when following the instructions of the CVAT script, the nuclio deploy always complained that the function has no name
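For change (1), the label list lives in the function's metadata; here is a hedged fragment of function.yaml with hypothetical class names (the metadata.name field is presumably also what the "function has no name" complaint relates to), the rest of the file staying as in CVAT's YOLOv7 example:

```yaml
metadata:
  name: custom-anw-yolov7          # hypothetical function name
  namespace: cvat
  annotations:
    name: ANW YOLOv7               # display name in CVAT's models list
    type: detector
    framework: pytorch
    spec: |
      [
        { "id": 0, "name": "person" },
        { "id": 1, "name": "car" }
      ]
```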
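For change (2), a hedged sketch of an adapted _infer, assuming the custom export already emits rows of (x1, y1, x2, y2, confidence, class_id); the class and attribute names are placeholders rather than CVAT's actual code:

```python
class CustomYoloHandler:
    """Hedged sketch; model and labels would be set up as in model_handler.py."""
    def __init__(self, model, labels):
        self.model = model      # callable returning rows of (x1, y1, x2, y2, conf, cls)
        self.labels = labels    # class-id -> class-name map

    def _infer(self, image, threshold=0.5):
        """Map raw custom-YOLOv7 output rows to the shape the handler serializes."""
        results = []
        for x1, y1, x2, y2, conf, cls in self.model(image):
            if conf < threshold:
                continue  # drop low-confidence boxes before they reach the annotator
            results.append({
                "confidence": str(round(float(conf), 2)),
                "label": self.labels[int(cls)],
                "points": [int(x1), int(y1), int(x2), int(y2)],
                "type": "rectangle",
            })
        return results
```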
Deploy the nuclio
At the folder <cvat root>/serverless/custom, run:
# bash deploy.sh <the subfolder that hold the nuclio files> <project name>
bash deploy.sh anw_yolov7 cvat
After running, we can see the nuclio functions that are running (which one can also check with the command “nuctl get function --platform local”).
The corresponding Docker container is created and running as well.
And we can use the model to run the auto tagging.
Remote model inference
Again, this is not about deploying the nuclio function (with a model inside) on a remote serverless platform (like AWS or other clouds); instead, I would like to run the model inference on a remote Mac Mini M1.
The idea is to wrap the model inference into an API call, so my local nuclio function just makes an API call.
I have not completed the remote model inference, but I have set up sample code to run an API call in nuclio (which calls an IP-address-lookup API). I am using NodeJS for this part, with axios as the network request package.
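The proxy idea itself is language-agnostic; a hedged Python sketch using only the standard library (the remote URL and the payload shape are assumptions about my own remote service, not any CVAT or nuclio API):

```python
import json
import urllib.request

REMOTE_URL = "http://192.168.0.10:8000/infer"  # hypothetical Mac Mini endpoint

def build_payload(image_b64, threshold=0.5):
    """Package the base64 image CVAT hands us into the remote API's request body."""
    return json.dumps({"image": image_b64, "threshold": threshold}).encode()

def handler(context, event):
    # The local nuclio function does no inference itself: it forwards the image
    # to the remote machine and relays the (already CVAT-shaped) JSON back.
    payload = build_payload(event.body["image"],
                            float(event.body.get("threshold", 0.5)))
    req = urllib.request.Request(REMOTE_URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        detections = resp.read().decode()
    return context.Response(body=detections, headers={},
                            content_type="application/json", status_code=200)
```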
Conclusion
So far it looks promising to use CVAT to perform image tagging with the support of a partially trained model, enriching the dataset further and then retraining. Sure, Roboflow or other cloud platforms could be better options, but I am offering a poor man's option here.