tensorrt

Version:

8.0.3.4, 7.0.0.11

Category:

ai

Cluster:

Loki

Author / Distributor

https://developer.nvidia.com/tensorrt

Description

NVIDIA TensorRT is a high-performance deep learning inference SDK for deploying AI models on NVIDIA GPUs. It supports model optimization, quantization, and deployment from popular frameworks such as TensorFlow, PyTorch, and ONNX.

TensorRT 8.0.3.4 features:

  • Highly optimized INT8 and FP16 inference

  • ONNX and native parser support

  • Multi-stream execution

  • Layer and kernel fusion

  • CUDA 10.2 compatibility for legacy GPU environments

TensorRT accelerates models for image classification, segmentation, object detection, and language modeling.

Documentation

tensorrt is typically used via Python or C++ APIs.

Python Example:
----------------
>>> import tensorrt as trt
>>> logger = trt.Logger(trt.Logger.WARNING)
>>> builder = trt.Builder(logger)
>>> trt.__version__
'8.0.3'

Command-line tools:
-------------------
trtexec --onnx=model.onnx --explicitBatch --saveEngine=model.engine
polygraphy run model.onnx --onnxrt --trt

Help:
  $ trtexec --help
  $ polygraphy --help

Examples/Usage

  • Load the module:

$ module load tensorrt-cuda10.2/8.0.3.4

  • Run trtexec on an ONNX model:

$ trtexec --onnx=model.onnx --saveEngine=model.engine --explicitBatch

  • Python API usage:

import tensorrt as trt
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(TRT_LOGGER)
print("TensorRT version:", trt.__version__)

  • Inspect TensorRT tools:

$ which trtexec
$ trtexec --help

  • Unload the module:

$ module unload tensorrt-cuda10.2/8.0.3.4

Installation

TensorRT is obtained from NVIDIA: https://developer.nvidia.com/tensorrt