Expo Talk Panel

Neural network models can be very large and compute-intensive, which can make them challenging to run on edge devices. Model quantization provides significant benefits in power consumption, memory footprint, and latency. However, quantizing a 32-bit floating-point model to an 8-bit or 4-bit integer model often results in accuracy loss. In this talk, we present the AI Model Efficiency Toolkit (AIMET), an open-source library that provides advanced quantization and compression techniques for trained neural network models. We also present its latest addition, AIMET Model Zoo. Alongside the models themselves, AIMET Model Zoo provides recipes for quantizing popular 32-bit floating-point (FP32) models to 8-bit integer (INT8) models with little loss in accuracy. Each tested and verified recipe includes a script that optimizes a TensorFlow or PyTorch model, covering categories that range from image classification, object detection, semantic segmentation, and pose estimation to super-resolution and speech recognition.
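To make the FP32-to-INT8 step concrete, the sketch below shows the basic round trip of uniform affine quantization on a PyTorch tensor: compute a scale and zero point from the observed value range, round to the integer grid, and map back to float. This is illustrative only and is not AIMET's implementation, which applies far more sophisticated range-setting and optimization; the function name and parameters here are hypothetical.

```python
import torch

def quantize_dequantize(x: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Simulate uniform affine (asymmetric) quantization of a float tensor.

    Illustrative sketch of the FP32 -> INT -> FP32 round trip; not AIMET's
    actual quantizer.
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    # Include 0.0 in the range so zero is exactly representable.
    x_min = min(x.min().item(), 0.0)
    x_max = max(x.max().item(), 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    if scale == 0.0:  # constant tensor; avoid division by zero
        scale = 1.0
    zero_point = round(qmin - x_min / scale)
    # Quantize: scale, shift, round, and clamp to the integer grid.
    q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax)
    # Dequantize back to float; the difference from x is quantization error.
    return (q - zero_point) * scale

# Example: measure quantization error for a random weight tensor.
w = torch.randn(64, 64)
w_q = quantize_dequantize(w, num_bits=8)
print(f"max abs error: {(w - w_q).abs().max():.6f}")
```

The gap between a tensor and its quantize-dequantize reconstruction is the per-layer error that accumulates into end-to-end accuracy loss, which is what the toolkit's quantization techniques and the Model Zoo recipes are designed to minimize.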
