The TensorRT API (https://docs.nvidia.com/deeplearning/tensorrt/api/index.html) includes implementations for the most common deep learning layers.
You can also use the Plugin API (https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/classnvinfer1_1_1_i_plugin.html) to provide implementations for infrequently used or more innovative layers that are not supported natively by TensorRT.
Alternatives to using TensorRT include:
- Using the training framework itself to perform inference.
- Writing a custom application that is designed specifically to execute the network using low-level libraries and math operations.
Generally, the workflow for developing and deploying a deep learning model goes through three phases:
- Phase 1 is training.
- Phase 2 is developing a deployment solution.
- Phase 3 is the deployment of that solution.
TensorRT is generally not used during any part of the training phase. After the network is parsed, consider your optimization options: batch size, workspace size, mixed precision, and bounds on dynamic shapes. These options are chosen and specified as part of the TensorRT build step, where you build an optimized inference engine based on your network.
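As a rough C++ sketch of how these options can be specified, the fragment below configures the builder. It assumes a parsed network definition named network and a logger named gLogger already exist, and the input tensor name, dimensions, and sizes are placeholder assumptions rather than values from this guide.

    // Sketch: configuring build-time options (workspace, precision, dynamic-shape bounds).
    #include "NvInfer.h"

    nvinfer1::IBuilder* builder = nvinfer1::createInferBuilder(gLogger);
    nvinfer1::IBuilderConfig* config = builder->createBuilderConfig();

    // Workspace size: scratch memory the builder may use while timing kernels.
    config->setMaxWorkspaceSize(1ULL << 30);        // 1 GiB

    // Mixed precision: allow FP16 kernels where the hardware supports them.
    config->setFlag(nvinfer1::BuilderFlag::kFP16);

    // Bounds on dynamic shapes (including batch size) via an optimization profile.
    nvinfer1::IOptimizationProfile* profile = builder->createOptimizationProfile();
    profile->setDimensions("input", nvinfer1::OptProfileSelector::kMIN, nvinfer1::Dims4{1, 3, 224, 224});
    profile->setDimensions("input", nvinfer1::OptProfileSelector::kOPT, nvinfer1::Dims4{8, 3, 224, 224});
    profile->setDimensions("input", nvinfer1::OptProfileSelector::kMAX, nvinfer1::Dims4{32, 3, 224, 224});
    config->addOptimizationProfile(profile);

    // Build the optimized inference engine from the parsed network.
    nvinfer1::ICudaEngine* engine = builder->buildEngineWithConfig(*network, *config);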
To initialize the inference engine, the application first deserializes the model from the plan file into an inference engine. TensorRT is usually used asynchronously; therefore, when the input data arrives, the program calls an enqueue function with the input buffer and the buffer in which TensorRT should put the result.
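A minimal sketch of that deployment flow follows. It assumes the plan file contents have already been read into planData and planSize, and that inputDevice and outputDevice are GPU buffers sized for the engine's bindings; these names are illustrative assumptions.

    // Sketch: deserialize the plan, then run inference asynchronously on a CUDA stream.
    nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(gLogger);
    nvinfer1::ICudaEngine* engine = runtime->deserializeCudaEngine(planData, planSize);
    nvinfer1::IExecutionContext* context = engine->createExecutionContext();

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Bindings are device pointers ordered by binding index (input first, then output here).
    void* bindings[] = {inputDevice, outputDevice};
    context->enqueueV2(bindings, stream, nullptr);  // asynchronous launch
    cudaStreamSynchronize(stream);                  // wait until the result is ready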
To optimize your model for inference, TensorRT takes your network definition, performs network-specific and platform-specific optimizations, and generates the inference engine. This process is referred to as the build phase. The build phase can take considerable time, especially when running on embedded platforms. Therefore, a typical application builds an engine once and then serializes it as a plan file for later use.
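For example, once an engine has been built it can be serialized and written to a plan file roughly as follows; the file name is an assumption.

    // Sketch: serialize the engine once, then reuse the plan file in later runs.
    #include <fstream>

    nvinfer1::IHostMemory* plan = engine->serialize();
    std::ofstream planFile("model.plan", std::ios::binary);
    planFile.write(static_cast<const char*>(plan->data()), plan->size());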
The build phase optimizes the network graph by eliminating dead computations, folding constants, and reordering and combining operations to run more efficiently on the GPU. The builder can also be configured to reduce the precision of computations. It can automatically reduce 32-bit floating-point calculations to 16-bit and supports quantization of floating-point values so that calculations can be performed using 8-bit integers. Quantization requires dynamic range information, which can be provided by the application, or calculated by TensorRT using representative network inputs, a process called calibration. The build phase also runs multiple implementations of operators to find those which, when combined with any intermediate precision conversions and layout transformations, yield the fastest overall implementation of the network.
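As a sketch, reduced precision and calibration are requested through the builder configuration. Here calibrator stands for an assumed application-provided object implementing one of the IInt8Calibrator interfaces.

    // Sketch: request reduced-precision kernels; TensorRT keeps FP32 where needed.
    config->setFlag(nvinfer1::BuilderFlag::kFP16);  // 16-bit floating point
    config->setFlag(nvinfer1::BuilderFlag::kINT8);  // 8-bit integer quantization
    config->setInt8Calibrator(calibrator);          // calibrate with representative inputs
    // Alternatively, the application can supply dynamic range per tensor, e.g.:
    // tensor->setDynamicRange(-2.5f, 2.5f);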
The Network Definition interface provides methods for the application to define a network. Input and output tensors can be specified, and layers can be added and configured. In addition to built-in layer types, such as convolutional and recurrent layers, a Plugin layer type allows the application to implement functionality not natively supported by TensorRT. For more information, refer to the Network Definition API: https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/classnvinfer1_1_1_i_network_definition.html
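As an illustration, a tiny network could be defined directly through this interface. The tensor name, dimensions, and the weights convWeights and convBias below are placeholder assumptions.

    // Sketch: define a one-layer network; convWeights/convBias are assumed nvinfer1::Weights.
    const auto flags = 1U << static_cast<uint32_t>(nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
    nvinfer1::INetworkDefinition* network = builder->createNetworkV2(flags);

    nvinfer1::ITensor* input = network->addInput("input", nvinfer1::DataType::kFLOAT, nvinfer1::Dims4{1, 3, 224, 224});
    nvinfer1::IConvolutionLayer* conv = network->addConvolutionNd(*input, 32, nvinfer1::DimsHW{3, 3}, convWeights, convBias);
    conv->getOutput(0)->setName("output");
    network->markOutput(*conv->getOutput(0));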