Deployment of PyTorch Model Using NCNN for Mobile Devices — Part 1

Huili Yu
8 min read · May 4, 2023

An introductory example of deploying a pretrained PyTorch model into a C++ app using NCNN for mobile devices.

Deployment of a deep neural network on a mobile phone. (a) image by author, (b) image by author, (c) image from “Attention is All You Need,” Vaswani et al. [1], (d) image by Shiwa ID on Unsplash.

Introduction

Many deep learning models, such as CNNs and Transformers, have been developed in recent years, and Large Language Models (LLMs) such as ChatGPT have greatly raised the level of model intelligence. To make these models more useful in real life, it is important to explore how to deploy them on mobile devices, such as smartphones and AR glasses. This article provides an introductory example of deploying a pretrained PyTorch model into a C++ app using NCNN, which can be further integrated into Android for mobile devices.

In particular, I would like to discuss the following topics.

  • What is NCNN?
  • Pipeline of deploying a pretrained PyTorch model into a C++ app using NCNN.
  • An introductory example of deploying the pretrained PyTorch model into the C++ app using NCNN.

What is NCNN?

NCNN is a high-performance neural network inference framework developed by Tencent, which is designed specifically for mobile and embedded devices [2]. NCNN is lightweight and efficient, making it an ideal choice for deploying deep learning models on devices with limited computational resources.

An NCNN model consists of two files: .param and .bin. The .param file contains the model structure and parameter information, including layer types, layer names, layer parameters, and the input/output blobs of each layer. It is typically a lightweight text file that can be easily parsed and loaded by the NCNN framework.

The .bin file contains the trained model weights in binary format. These weights are learned during the training process and are used to make predictions during inference. The .bin file is usually much larger than the .param file, as it contains the actual numerical values of the learned parameters.
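To give a feel for the format, below is a hypothetical .param excerpt for a small CNN; the layer names, counts, and key=value parameters are illustrative assumptions rather than the actual contents of this article's model.

7767517
4 4
Input            input  0 1 input
Convolution      conv1  1 1 input conv1 0=6 1=5 5=1 6=450
ReLU             relu1  1 1 conv1 relu1
Pooling          pool1  1 1 relu1 pool1 0=0 1=2 2=2

The first line is NCNN's magic number, the second line gives the layer and blob counts, and each remaining line lists a layer's type, name, input/output blobs, and layer-specific parameters.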

Pipeline of deploying a pretrained PyTorch model into a C++ app using NCNN

Pipeline of deploying a pretrained PyTorch model into C++ app using NCNN — image by author.

For the deployment, the pretrained PyTorch model (.pth/.pt) is first converted to an ONNX file (.onnx) in PyTorch. The .onnx file is then converted by the NCNN converter into the NCNN model, consisting of a .param file and a .bin file. Finally, the NCNN model files are loaded by C++ code that performs inference using the NCNN library. The C++ code can be further integrated into Android to perform inference on mobile devices. In this post, I focus on deploying the NCNN model on a Linux (Ubuntu) computer; I will cover the integration of the C++ inference code into Android in my next post.

Conversion to ONNX file

In my previous post, Deploying PyTorch Model into a C++ Application Using ONNX Runtime, I discussed how to convert a pretrained PyTorch model to an ONNX file using the torch.onnx.export function in PyTorch. For more detailed information about ONNX and the conversion, please refer to that post.

import torch

# Assumed context from the training script: net is the trained model and
# x is a dummy input with the model's expected shape, e.g. for CIFAR10:
# x = torch.randn(1, 3, 32, 32)
torch.onnx.export(net,                      # model being run
                  x,                        # model input
                  "model.onnx",             # where to save the model
                  export_params=True,       # store the trained weights
                  opset_version=11,         # the ONNX opset version
                  do_constant_folding=True, # fold constant ops ahead of time
                  input_names=['input'],    # set model input names
                  output_names=['output'])  # set model output names

An end-to-end example of deploying the pretrained PyTorch model into a C++ app using NCNN

I again use the image classification example for the CIFAR10 dataset from the PyTorch tutorial [3], discussed in my previous post, to explain how to use NCNN for the deployment. My explanation starts from the point where image_classifier.onnx has been generated by PyTorch. The next step is to convert image_classifier.onnx to the model format required by NCNN.

To use NCNN, I first install and set it up on Ubuntu. Below are the steps for installing NCNN.

git clone https://github.com/Tencent/ncnn.git
cd ncnn
mkdir -p build
cd build
cmake -DCMAKE_BUILD_TYPE=Release -DNCNN_VULKAN=OFF \
-DNCNN_SYSTEM_GLSLANG=ON -DNCNN_BUILD_EXAMPLES=ON \
-DNCNN_SHARED_LIB=ON ..
make
make install

After NCNN has been installed, two useful tools, onnx2ncnn and ncnnoptimize, are generated in ncnn_root/build/tools/onnx/ and ncnn_root/build/tools/, respectively, where ncnn_root is the root directory of NCNN.

For this example, I create the following file tree and put all the model files in the models subdirectory.

File tree of the root directory of the image classification example — image by author.
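In text form, the layout looks roughly as follows; this is reconstructed from the paths used in the commands and code below, and the top-level directory name is an assumption.

image_classification/
├── CMakeLists.txt
├── image_classifier.cpp
├── onnx2ncnn
├── ncnnoptimize
├── images/
│   └── horse.png
└── models/
    ├── image_classifier.onnx
    ├── image_classifier_sim.onnx
    ├── image_classifier.param
    ├── image_classifier.bin
    ├── image_classifier_opt.param
    └── image_classifier_opt.bin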

I copy onnx2ncnn and ncnnoptimize into the root directory of this example.
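A sketch of this setup is below; the NCNN checkout path is a placeholder to adapt, and onnxsim, which the first conversion command relies on, is installed from PyPI.

cp /path/to/ncnn/build/tools/onnx/onnx2ncnn .
cp /path/to/ncnn/build/tools/ncnnoptimize .
pip install onnxsim

In the root directory, I then convert the ONNX model to the NCNN model by executing the following commands.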

python3 -m onnxsim models/image_classifier.onnx \
models/image_classifier_sim.onnx
./onnx2ncnn models/image_classifier_sim.onnx \
models/image_classifier.param models/image_classifier.bin
./ncnnoptimize models/image_classifier.param models/image_classifier.bin \
models/image_classifier_opt.param models/image_classifier_opt.bin 65536

In the above commands, the ONNX model is first simplified by onnxsim, which removes redundant operators and produces a model with faster execution time and lower memory usage. The simplified ONNX model is then converted to the NCNN format by onnx2ncnn. Finally, the NCNN model is further optimized by ncnnoptimize to reduce size and complexity while maintaining accuracy for deployment on mobile and embedded devices. The final argument, 65536, tells ncnnoptimize to store the weights as fp16, roughly halving the model size; passing 0 keeps them as fp32.

With the optimized NCNN model, I write the C++ inference code that performs image classification. I first create the CMakeLists.txt file for compiling the project. To use NCNN, I specify NCNN's include and link directories. Since OpenCV is used to load the input image and visualize results, I also set up OpenCV in CMakeLists.txt (assuming OpenCV has been installed).

CMakeLists.txt — image by author.
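Here is a minimal sketch along the lines of the CMakeLists.txt shown in the image; the NCNN install path is a placeholder you must adapt to where make install placed the files on your machine.

cmake_minimum_required(VERSION 3.10)
project(image_classification)

set(CMAKE_CXX_STANDARD 11)

# Path to the NCNN install tree produced by "make install" above
# (a placeholder; adjust to your actual location).
set(NCNN_INSTALL_DIR "/path/to/ncnn/build/install")
include_directories(${NCNN_INSTALL_DIR}/include/ncnn)
link_directories(${NCNN_INSTALL_DIR}/lib)

# OpenCV is used for image loading and visualization.
find_package(OpenCV REQUIRED)
include_directories(${OpenCV_INCLUDE_DIRS})

add_executable(image_classification image_classifier.cpp)
target_link_libraries(image_classification ncnn ${OpenCV_LIBS})

Depending on how NCNN was built, you may also need to link OpenMP or pthread; with the shared-library build used above, the configuration as written is usually sufficient.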

Now, it is time to write image_classifier.cpp.

#include <iostream>
#include <fstream>
#include <stdio.h>
#include <algorithm>
#include <vector>
#include <string>

#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>

#include "net.h" // must include this file in order to use NCNN net.

int main(int argc, char** argv) {
  // Load the input image
  std::string imagepath("../images/horse.png");
  cv::Mat img = cv::imread(imagepath, cv::IMREAD_COLOR);
  if (img.empty()) {
    std::cerr << "Unable to read image file " << imagepath << std::endl;
    return -1;
  }

  // Specify the names of all classes for image classification
  std::vector<std::string> classes = {"plane", "car", "bird", "cat",
                                      "deer", "dog", "frog", "horse",
                                      "ship", "truck"};

  // Load the NCNN model; both loaders return 0 on success
  ncnn::Net net;
  int ret = net.load_param("../models/image_classifier_opt.param");
  if (ret) { std::cerr << "Failed to load model parameters" << std::endl; return -1; }
  ret = net.load_model("../models/image_classifier_opt.bin");
  if (ret) { std::cerr << "Failed to load model weights" << std::endl; return -1; }

  // Convert the image data to NCNN format.
  // The OpenCV image is in BGR order and the model here expects BGR input.
  // (horse.png is assumed to already match the model's 32x32 input size.)
  ncnn::Mat input = ncnn::Mat::from_pixels(img.data, ncnn::Mat::PIXEL_BGR,
                                           img.cols, img.rows);

  // Preprocess the image data: (X - mean) * norm per BGR channel
  const float mean_vals[3] = {0.5f * 255.f, 0.5f * 255.f, 0.5f * 255.f};
  const float norm_vals[3] = {1 / 0.5f / 255.f, 1 / 0.5f / 255.f, 1 / 0.5f / 255.f};
  input.substract_mean_normalize(mean_vals, norm_vals);

  // Inference
  ncnn::Extractor extractor = net.create_extractor();
  extractor.input("input", input);
  ncnn::Mat output;
  extractor.extract("output", output);

  // Flatten the output into a vector of class scores
  ncnn::Mat out_flattened = output.reshape(output.w * output.h * output.c);
  std::vector<float> scores;
  scores.resize(out_flattened.w);
  for (int j = 0; j < out_flattened.w; j++) {
    scores[j] = out_flattened[j];
  }

  // Predict the class with the maximum score
  std::string pred_class =
      classes[std::max_element(scores.begin(),
                               scores.end()) - scores.begin()];

  std::cout << "The predicted class is " << pred_class << "." << std::endl;

  // Save and visualize the input image
  cv::imwrite("../images/out_horse.png", img);
  cv::namedWindow("Input_image", cv::WINDOW_NORMAL);
  cv::imshow("Input_image", img);
  cv::waitKey(0);
  std::cout << "Completed" << std::endl;
  return 0;
}

In image_classifier.cpp, the input image is first loaded by OpenCV. Then the NCNN model files, .param and .bin, are loaded. Note that if net.load_param and net.load_model run successfully, they return 0; otherwise, they return non-zero values, in which case the code above prints an error and exits.

ncnn::Net net;
int ret = net.load_param("../models/image_classifier_opt.param");
if (ret) { std::cerr << "Failed to load model parameters" << std::endl; return -1; }
ret = net.load_model("../models/image_classifier_opt.bin");
if (ret) { std::cerr << "Failed to load model weights" << std::endl; return -1; }

Next, the OpenCV image needs to be converted to the format NCNN requires. Since the OpenCV image is in BGR format and the model here takes a BGR image as input, I use ncnn::Mat::PIXEL_BGR. (If your model was instead trained on RGB images, as in the standard torchvision pipeline, use ncnn::Mat::PIXEL_BGR2RGB so that the channel order is swapped during the conversion.)

ncnn::Mat input = ncnn::Mat::from_pixels(img.data, ncnn::Mat::PIXEL_BGR, 
img.cols, img.rows);

In order to perform inference correctly, the image needs to be normalized in the same way as in the training phase. substract_mean_normalize (note NCNN's spelling of the method name) takes image pixels in [0, 255] and normalizes each pixel as (X - mean) * norm, where X is the pixel intensity of a BGR channel, mean = 0.5 * 255 per channel, and norm = 1 / (0.5 * 255) per channel. These values correspond to the mean of 0.5 and standard deviation of 0.5 used to normalize [0, 1] pixels for the CIFAR10 dataset during training. As a sanity check, a pixel value of 255 maps to (255 - 127.5) / 127.5 = 1.0, exactly what the training pipeline produces for a [0, 1] pixel of 1.0 normalized with mean 0.5 and std 0.5. Therefore, the input image is normalized in C++ as follows.

const float mean_vals[3] = {0.5f*255.f, 0.5f*255.f, 0.5f*255.f};
const float norm_vals[3] = {1/0.5f/255.f, 1/0.5f/255.f, 1/0.5f/255.f};
input.substract_mean_normalize(mean_vals, norm_vals);

The inference is performed by ncnn::Extractor. The names of the input node and the output node need to be provided in order to feed the input and extract the result. Since I specified the names of the input and output nodes as "input" and "output" when converting the PyTorch model to ONNX, I provide those names to the extractor. (If you are unsure of the node names, you can look them up in the .param file.)

ncnn::Extractor extractor = net.create_extractor();
extractor.input("input", input);
ncnn::Mat output;
extractor.extract("output", output);

The inference results are then flattened to get prediction scores over the classes of the CIFAR10 dataset. The predicted class is the class with the maximum score.

// Flatten
ncnn::Mat out_flattened = output.reshape(output.w * output.h * output.c);
std::vector<float> scores;
scores.resize(out_flattened.w);
for (int j = 0; j < out_flattened.w; j++) {
  scores[j] = out_flattened[j];
}

// Prediction based on scores
std::string pred_class =
classes[std::max_element(scores.begin(), scores.end()) - scores.begin()];
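Note that these scores are the raw outputs of the network (logits in the tutorial model), and the argmax of the logits equals the argmax of the softmax. If a confidence value is also desired, a softmax over the scores yields normalized probabilities. Below is a minimal sketch reusing the scores vector above; it additionally requires <cmath>.

// Numerically stable softmax over the class scores:
// subtract the max logit before exponentiating to avoid overflow.
float max_score = *std::max_element(scores.begin(), scores.end());
float sum = 0.f;
std::vector<float> probs(scores.size());
for (size_t j = 0; j < scores.size(); j++) {
  probs[j] = std::exp(scores[j] - max_score);
  sum += probs[j];
}
for (size_t j = 0; j < scores.size(); j++) {
  probs[j] /= sum;
}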

At this point, we are ready to compile and run the code to see the image classification result.

mkdir build && cd build
cmake ..
make
./image_classification

Image classification result using NCNN.

We see that the horse image is correctly classified as a horse.

The complete implementation of the deployment discussed above is available on GitHub, covering everything from training the model to deploying it into the C++ image classification app using NCNN.

Conclusions

In this post, I gave an introductory example of deploying a pretrained PyTorch model into a C++ app using NCNN. In particular, I discussed (a) what NCNN is, (b) the pipeline of deploying a pretrained PyTorch model into a C++ app using NCNN, and (c) a concrete image classification example showing the deployment of a pretrained PyTorch model into a C++ app using NCNN. The correct image classification result on an Ubuntu computer shows the success of the deployment.

In my next article, I will talk about how to integrate the C++ image classification code into Android for mobile device deployment. Thank you for reading.

References

[1] Vaswani et al., Attention Is All You Need, NeurIPS, Dec. 2017.

[2] Tencent, ncnn: a high-performance neural network inference framework optimized for mobile platforms, https://github.com/Tencent/ncnn, 2019.

[3] PyTorch, Training a Classifier (CIFAR10 tutorial), https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html, 2023.
