tensorrt
- Version:
8.0.3.4, 7.0.0.11
- Category:
ai
- Cluster:
Loki
Description
NVIDIA TensorRT is a high-performance deep learning inference SDK for deploying AI models on NVIDIA GPUs. It supports model optimization, quantization, and deployment from popular frameworks such as TensorFlow, PyTorch, and ONNX.
TensorRT 8.0.3.4 features:
- Highly optimized INT8 and FP16 inference
- ONNX and native parser support
- Multi-stream execution
- Layer and kernel fusion
- CUDA 10.2 compatibility for legacy GPU environments
TensorRT accelerates models for image classification, segmentation, object detection, and language modeling.
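The INT8 mode mentioned above rests on mapping FP32 values onto 8-bit integers with a per-tensor scale. A minimal pure-Python sketch of symmetric INT8 quantization, illustrative only and not the TensorRT API:

```python
# Illustrative sketch of symmetric per-tensor INT8 quantization,
# the numeric idea behind TensorRT's INT8 inference mode.
# (Plain Python for clarity; TensorRT does this internally on the GPU.)

def quantize_int8(values):
    """Map FP32 values to INT8 codes using a symmetric per-tensor scale."""
    scale = max(abs(v) for v in values) / 127.0
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate FP32 values from the INT8 codes."""
    return [x * scale for x in q]

vals = [0.5, -1.25, 3.0, -0.75]
q, scale = quantize_int8(vals)
approx = dequantize(q, scale)
# Round-trip error is bounded by half a quantization step (scale / 2).
assert all(abs(a - b) <= scale / 2 for a, b in zip(vals, approx))
```

In practice TensorRT derives these scales from a calibration dataset rather than from a single tensor's maximum.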
Documentation
TensorRT is typically used via its Python or C++ APIs.
Python Example:
----------------
>>> import tensorrt as trt
>>> logger = trt.Logger(trt.Logger.WARNING)
>>> builder = trt.Builder(logger)
>>> print(trt.__version__)
'8.0.3'
Command-line tools:
-------------------
trtexec --onnx=model.onnx --explicitBatch --saveEngine=model.engine
polygraphy run model.onnx --onnxrt --trt
Help:
$ trtexec --help
$ polygraphy --help
Examples/Usage
Load the module:
$ module load tensorrt-cuda10.2/8.0.3.4
Run trtexec on an ONNX model:
$ trtexec --onnx=model.onnx --saveEngine=model.engine --explicitBatch
Python API usage:
import tensorrt as trt
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(TRT_LOGGER)
print("TensorRT version:", trt.__version__)
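The builder above can be extended to parse an ONNX model and produce a serialized engine. A sketch assuming the TensorRT 8.x Python API (guarded so it degrades gracefully where tensorrt is not importable; `model.onnx` is the same placeholder path used with trtexec above):

```python
# Sketch: build a serialized TensorRT engine from an ONNX file.
# Assumes the TensorRT 8.x Python API; guarded for machines without it.
try:
    import tensorrt as trt
    HAVE_TRT = True
except ImportError:
    HAVE_TRT = False

def build_engine(onnx_path):
    """Parse an ONNX model and return a serialized TensorRT engine,
    or None when the tensorrt module is unavailable."""
    if not HAVE_TRT:
        return None
    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    # Explicit-batch networks are required for the ONNX parser.
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, logger)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError(parser.get_error(0))
    config = builder.create_builder_config()
    config.max_workspace_size = 1 << 30  # 1 GiB scratch space
    return builder.build_serialized_network(network, config)

# Usage: serialized = build_engine("model.onnx")
```

The serialized bytes can be written to disk and later deserialized with a `trt.Runtime`, analogous to the `--saveEngine` output of trtexec.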
Inspect TensorRT tools:
$ which trtexec
$ trtexec --help
Unload the module:
$ module unload tensorrt-cuda10.2/8.0.3.4
Installation
Source code is obtained from the NVIDIA/TensorRT GitHub repository; binary releases are distributed via the NVIDIA TensorRT developer page.