Skip to content

Latest commit

 

History

History
207 lines (184 loc) · 6.49 KB

model_configuration-en.md

File metadata and controls

207 lines (184 loc) · 6.49 KB

English | 中文

Model Configuration

Each model in the model repository must contain a configuration that provides required and optional information about the model. The configuration information is generally written in ModelConfig protobuf format in file config.pbtxt.

Minimum Model General Configuration

Please see the official website for detailed general configuration: model_configuration. Minimum model configuration of Triton must include: attribute platform or backend, attribute max_batch_size and input and output of the model.

For example, the minimum configuration of a Paddle model should be (with two inputs input0 and input1, one output output0, both inputs and outputs are tensors of type float32, and the maximum batch is 8):

  backend: "fastdeploy"
  max_batch_size: 8
  input [
    {
      name: "input0"
      data_type: TYPE_FP32
      dims: [ 16 ]
    },
    {
      name: "input1"
      data_type: TYPE_FP32
      dims: [ 16 ]
    }
  ]
  output [
    {
      name: "output0"
      data_type: TYPE_FP32
      dims: [ 16 ]
    }
  ]

Configuration of CPU, GPU and Instances number

The attribute instance_group allows you to configure hardware resource and model inference instances number.

Here's an example of CPU deployment:

  instance_group [
    {
      # Create two CPU instances
      count: 2
      # Use CPU for deployment 
      kind: KIND_CPU
    }
  ]

Another example of deploying two instances on GPU 0, and one instance each on GPU1 and GPU:

  instance_group [
    {
      # Create tow GPU instances
      count: 2
      # Use GPU for inference
      kind: KIND_GPU
      # Deploy on GPU 0
      gpus: [ 0 ]
    },
    {
      count: 1
      kind: KIND_GPU
      # Deploy on GPU 1,2
      gpus: [ 1, 2 ]
    }
  ]

Name, Platform and Backend

The attribute name is optional. If the model is not specified in the configuration, then the name is the model's directory name. When the name is specified, it should match the directory name.

Set fastdeploy backend. You should not configure attribute platform, but please instead configure attribute backend to fastdeploy.

backend: "fastdeploy"

FastDeploy Backend Configuration

Currently FastDeploy backend supports cpu and gpu inference, with paddle, onnxruntime and openvino inference engines supported on cpu, and paddle, onnxruntime and tensorrt engines supported on gpu.

Paddle Engine Configuration

In addition to configuring Instance Groups, deciding whether the model runs on CPU or GPU, the Paddle engine can be configured as follows. You can see more specific examples in A PP-OCRv3 example for Runtime configuration.

optimization {
  execution_accelerators {
    # CPU inference configuration, used with KIND_CPU.
    cpu_execution_accelerator : [
      {
        name : "paddle"
        # Set parallel inference computing threads number to 4.
        parameters { key: "cpu_threads" value: "4" }
        # Set mkldnn acceleration on, or off when set to 0.
        parameters { key: "use_mkldnn" value: "1" }
      }
    ],
    # GPU inference configuration, used with KIND_GPU.
    gpu_execution_accelerator : [
      {
        name : "paddle"
        # Set parallel inference computing threads number to 4.
        parameters { key: "cpu_threads" value: "4" }
        # Set mkldnn acceleration on, or off when set to 0.
        parameters { key: "use_mkldnn" value: "1" }
      }
    ]
  }
}

ONNXRuntime Engine Configuration

In addition to configuring Instance Groups, deciding whether the model runs on CPU or GPU, the ONNXRuntime engine can be configured as follows. You can see more specific examples in A YOLOv5 example for Runtime configuration.

optimization {
  execution_accelerators {
    cpu_execution_accelerator : [
      {
        name : "onnxruntime"
        # Set parallel inference computing threads number to 4.
        parameters { key: "cpu_threads" value: "4" }
      }
    ],
    gpu_execution_accelerator : [
      {
        name : "onnxruntime"
      }
    ]
  }
}

OpenVINO Engine Configuration

The OpenVINO engine only supports inferring on CPU, which can be configured as:

optimization {
  execution_accelerators {
    cpu_execution_accelerator : [
      {
        name : "openvino"
        # Set parallel inference computing threads number to 4 (total number of threads for all instances).
        parameters { key: "cpu_threads" value: "4" }
        # Set num_streams in OpenVINO (usually the same as instances number)
        parameters { key: "num_streams" value: "1" }
      }
    ]
  }
}

TensorRT Engine Configuration

The TensorRT engine only supports inferring on GPU, which can be configured as:

optimization {
  execution_accelerators {
    gpu_execution_accelerator : [
      {
        name : "tensorrt"
        # Use FP16 inference in TensorRT. You can also choose: trt_fp32
        # If the loaded model is a quantized model, this precision will be int8 automatically
        parameters { key: "precision" value: "trt_fp16" }
      }
    ]
  }
}

You can configure the TensorRT dynamic shape in the following format, and refer to A PaddleCls example for Runtime configuration:

optimization {
  execution_accelerators {
  gpu_execution_accelerator : [ {
    # use TRT engine
    name: "tensorrt",
    # use fp16 on TRT engine
    parameters { key: "precision" value: "trt_fp16" }
  },
  {
    # Configure the minimum shape of dynamic shape
    name: "min_shape"
    # All input name and minimum shape
    parameters { key: "input1" value: "1 3 224 224" }
    parameters { key: "input2" value: "1 10" }
  },
  {
    # Configure the optimal shape of dynamic shape
    name: "opt_shape"
    # All input name and optimal shape
    parameters { key: "input1" value: "2 3 224 224" }
    parameters { key: "input2" value: "2 20" }
  },
  {
    # Configure the maximum shape of dynamic shape
    name: "max_shape"
    # All input name and maximum shape
    parameters { key: "input1" value: "8 3 224 224" }
    parameters { key: "input2" value: "8 30" }
  }
  ]
}}