Equivalent of predict_proba of scikit-learn for ONNX C++ API

I have trained a classification model and I use the ONNX export of that model in C++ to predict values as follows:

// Wrap the input data in an ONNX Runtime tensor
auto inputOnnxTensor = Ort::Value::CreateTensor<float>(
    memoryInfo, inputValues.data(), inputValues.size(),
    inputDims.data(), inputDims.size());

// Run inference: one input, one output
auto outputValues = session.Run(Ort::RunOptions{ nullptr },
    inputNames.data(), &inputOnnxTensor, 1, outputNames.data(), 1);

// Read the predicted class labels from the first output
auto* result = outputValues[0].GetTensorMutableData<int>();

In Python, scikit-learn's predict_proba lets us infer the class probabilities (i.e. the probability that a particular data point belongs to each of the underlying classes).

How can I obtain the same probability values as predict_proba() in C++ with the ONNX format? Is there an equivalent of predict_proba in the ONNX Runtime C++ API?

asked Nov 01 '25 by Pedram Hooshangitabrizi

1 Answer

When converting your classification model to ONNX (I assume you use skl2onnx), disable ZipMap. I'm not sure about the other options, but here is my working code:

import onnx
from skl2onnx import to_onnx

# Disabling ZipMap makes the probabilities come out as a plain float tensor
model = to_onnx(my_rfc_model, x_train,
                options={'zipmap': False, 'output_class_labels': False,
                         'raw_scores': False})
onnx.save_model(model, "model.onnx")

In my case (I use a RandomForestClassifier), the exported model has one input and two outputs by default: the first output provides the predicted class labels and the second provides the probabilities for each class. With ZipMap disabled, the probabilities are serialized sequentially as a flat float tensor. For example, if you have 3 possible classes and 2 samples with class probability distributions [[0.1, 0.2, 0.7], [0.3, 0.5, 0.2]], then, when predicting with ONNX Runtime in C++, the probabilities will be stored in output memory sequentially: [0.1, 0.2, 0.7, 0.3, 0.5, 0.2].
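To make that layout concrete, here is a minimal sketch of indexing such a flat row-major buffer; the helper name and parameters are mine, not part of any API:

#include <cstddef>
#include <vector>

// Flat, row-major layout: the probability for sample i, class j lives at
// index i * num_classes + j (hypothetical helper for illustration only).
inline float class_proba(const std::vector<float>& proba,
                         std::size_t sample, std::size_t cls,
                         std::size_t num_classes) {
    return proba[sample * num_classes + cls];
}
// With the example above: class_proba(proba, 1, 1, 3) == 0.5f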

To get the probabilities, use the correct output name (by default it is probabilities). You can find all output names using GetOutputCount() and GetOutputName(). See this example: https://github.com/leimao/ONNX-Runtime-Inference/blob/main/src/inference.cpp
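If it helps, here is a minimal sketch for listing a session's outputs. It assumes a recent ONNX Runtime (1.13+), where GetOutputNameAllocated replaced the older GetOutputName:

#include <iostream>
#include <onnxruntime_cxx_api.h>

// Print all output names of a session (ONNX Runtime 1.13+; older
// versions use session.GetOutputName(i, allocator) instead).
void print_output_names(Ort::Session& session) {
    Ort::AllocatorWithDefaultOptions allocator;
    for (size_t i = 0; i < session.GetOutputCount(); ++i) {
        auto name = session.GetOutputNameAllocated(i, allocator);
        std::cout << "output " << i << ": " << name.get() << '\n';
    }
}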

Create an output tensor with enough space to hold the probabilities for each class:

// Room for num_samples rows of 3 class probabilities each
std::vector<float> proba(3 * num_samples);
std::vector<Ort::Value> output_tensors;
output_tensors.push_back(Ort::Value::CreateTensor<float>(
    memoryInfo, proba.data(), 3 * num_samples,
    output_dims_.data(), output_dims_.size()));

Note that we provide room for 3 * num_samples floats.
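output_dims_ and num_samples are not defined in the snippets above; a plausible definition, purely my assumption, matching a [num_samples, num_classes] output shape, would be:

#include <cstdint>
#include <vector>

// Assumed shape of the probabilities output: {num_samples, num_classes}.
// Both names are mine; the original code does not show them.
int64_t num_samples = 2;  // e.g. two input rows
std::vector<int64_t> output_dims_{num_samples, 3};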

Run prediction:

session_->Run(Ort::RunOptions{nullptr}, input_names.data(), input_tensors.data(), 1, 
    output_names.data(), output_tensors.data(), 1);

In my case, output_names is declared as follows:

std::vector<const char*> output_names {"probabilities"};
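Since the output tensor wraps proba's memory, the values can be read directly after Run returns; here is a short usage sketch under the same assumptions as above (3 classes, num_samples rows):

#include <iostream>

// After Run returns, proba is filled in place; reading through the tensor
// with GetTensorMutableData<float>() is equivalent, as it wraps proba's memory.
const float* p = output_tensors[0].GetTensorMutableData<float>();
for (int64_t i = 0; i < num_samples; ++i) {
    std::cout << "sample " << i << ":";
    for (int64_t j = 0; j < 3; ++j)
        std::cout << ' ' << p[i * 3 + j];
    std::cout << '\n';
}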

Hope this helps.

answered Nov 03 '25 by Ordev Agens