Metadata-Version: 2.1
Name: aimet-torch
Version: 1.35.0
Summary: AIMET torch Package
Home-page: https://quic.github.io/aimet-pages/index.html
Author: Qualcomm Innovation Center, Inc.
Author-email: aimet.os@quicinc.com
License: NOTICE.txt
Platform: x86
Requires-Python: >=3.8, !=3.9.*, <3.11
Description-Content-Type: text/markdown
Requires-Dist: bokeh==3.2.2
Requires-Dist: cumm-cu120
Requires-Dist: cvxpy
Requires-Dist: cylp
Requires-Dist: dataclasses
Requires-Dist: h5py
Requires-Dist: holoviews==1.18.3
Requires-Dist: hvplot==0.9.2
Requires-Dist: Jinja2==3.0.3
Requires-Dist: jsonschema
Requires-Dist: matplotlib>=3
Requires-Dist: networkx
Requires-Dist: numpy<1.24,>=1.20.5
Requires-Dist: onnx==1.14.1
Requires-Dist: onnxscript
Requires-Dist: onnxsim
Requires-Dist: osqp
Requires-Dist: pandas==1.5.3
Requires-Dist: progressbar2
Requires-Dist: protobuf==3.20.2
Requires-Dist: psutil
Requires-Dist: pybind11
Requires-Dist: PyYAML
Requires-Dist: scikit-learn==1.1.3
Requires-Dist: scipy==1.8.1
Requires-Dist: setuptools
Requires-Dist: spconv-cu120
Requires-Dist: tensorboardX==2.4
Requires-Dist: torch==2.2.2
Requires-Dist: torchvision==0.17.2
Requires-Dist: tqdm

# AI Model Efficiency Toolkit (AIMET) for torch

[[**GitHub**](https://github.com/quic/aimet)]
[[**Docs**](https://quic.github.io/aimet-pages/releases/latest/user_guide/index.html)]
[[**Forums**](https://forums.quicinc.com)]

[AIMET](https://quic.github.io/aimet-pages/index.html) is a library that provides advanced model quantization and compression techniques for trained neural network models. It provides features that have been proven to improve run-time performance of deep learning neural network models with  lower compute and memory requirements and minimal impact to task accuracy. AIMET is designed to work with [PyTorch](https://pytorch.org), [TensorFlow](https://tensorflow.org) and [ONNX](https://onnx.ai) models.

## Table of Contents
- [Requirements](#requirements)
- [Why AIMET?](#why-aimet)
- [Supported features](#supported-features)
- [What's New](#whats-new)  
- [Results](#results)
- [Resources](#resources)
- [Contributions](#contributions)
- [Team](#team)
- [License](#license)

## Requirements
This package supports the following environment:
- 64-bit Intel x86-compatible processor
- Linux Ubuntu: (22.04 LTS with Python 3.10) or (20.04 LTS with Python 3.8)
- torch 2.2.2

## Why AIMET?
- **Supports advanced quantization techniques**: Inference using integer runtimes is significantly faster than using floating-point runtimes. For example, models run
5x-15x faster on the Qualcomm Hexagon DSP than on the Qualcomm Kyro CPU. In addition, 8-bit precision models have a 4x smaller footprint than 32-bit precision models. However, maintaining model accuracy when quantizing ML models is often challenging.  AIMET solves this using novel techniques like Data-Free Quantization that provide state-of-the-art INT8 results on several popular models. 
- **Supports advanced model compression techniques** that enable models to run faster at inference-time and require less memory
- **AIMET is designed to automate optimization** of neural networks avoiding time-consuming and tedious manual tweaking. AIMET also provides user-friendly APIs that allow users to make calls directly from their [TensorFlow](https://tensorflow.org) or [PyTorch](https://pytorch.org) or [ONNX](https://onnx.ai) pipelines.

Please visit the [AIMET on Github Pages](https://quic.github.io/aimet-pages/index.html) for more details.

## Supported Features
### Quantization
- *Cross-Layer Equalization*: Equalize weight tensors to reduce amplitude variation across channels
- *Bias Correction*: Corrects shift in layer outputs introduced due to quantization
- *Adaptive Rounding*: Learn the optimal rounding given unlabelled data
- *Quantization Simulation*: Simulate on-target quantized inference accuracy
- *Quantization-aware Training*: Use quantization simulation to train the model further to improve accuracy

### Model Compression
- *Spatial SVD*: Tensor decomposition technique to split a large layer into two smaller ones
- *Channel Pruning*: Removes redundant input channels from a layer and reconstructs layer weights
- *Per-layer compression-ratio selection*: Automatically selects how much to compress each layer in the model

### Visualization
- *Weight ranges*: Inspect visually if a model is a candidate for applying the Cross Layer Equalization technique. And the effect after applying the technique
- *Per-layer compression sensitivity*: Visually get feedback about the sensitivity of any given layer in the model to compression

## What's New
Some recently added features include
- Adaptive Rounding (AdaRound): Learn the optimal rounding given unlabelled data
- Quantization-aware Training (QAT) for recurrent models (including with RNNs, LSTMs and GRUs)

## Results
AIMET can quantize an existing 32-bit floating-point model to an 8-bit fixed-point model without sacrificing much accuracy and without model fine-tuning. 

### DFQ
The DFQ method applied to several popular networks, such as MobileNet-v2 and ResNet-50, result in less than 0.9% loss in accuracy all the way down to 8-bit quantization, in an automated way without any training data.

| Models | FP32 | INT8 Simulation |
| :----: | :--: | :-------------: |
| MobileNet v2 (top1) | 71.72 % | 71.08 % |
| ResNet 50 (top1) | 76.05 % | 75.45 % |
| DeepLab v3 (mIOU) | 72.65 % | 71.91 % |

### AdaRound (Adaptive Rounding)
#### ADAS Object Detect
For this example ADAS object detection model, which was challenging to quantize to 8-bit precision, AdaRound can recover the accuracy to within 1% of the FP32 accuracy.

| Configuration | mAP <br>(Mean Average Precision) |
| :-----------: | :--------------------------: |
| FP32 | 82.20 % |
| Nearest Rounding (INT8 weights, INT8 acts) | 49.85 % |
| AdaRound (INT8 weights, INT8 acts) | 81.21 % |

#### DeepLabv3 Semantic Segmentation
For some models like the DeepLabv3 semantic segmentation model, AdaRound can even quantize the model weights to 4-bit precision without a significant drop in accuracy.

| Configuration | mIOU <br>(Mean intersection over union) |
| :-----------: | :--------------------------: |
| FP32 | 72.94 % |
| Nearest Rounding (INT8 weights, INT8 acts) | 6.09 % |
| AdaRound (INT8 weights, INT8 acts) | 70.86 % |

### Quantization for Recurrent Models
AIMET supports quantization simulation and quantization-aware training (QAT) for recurrent models (RNN, LSTM, GRU). Using QAT feature in AIMET, a DeepSpeech2 model with bi-directional LSTMs can be quantized to 8-bit precision with minimal drop in accuracy.

| DeepSpeech2<br>(using bi-directional LSTMs) | Word Error Rate |
| :-----------: | :--------------------------: |
| FP32 | 9.92 % |
| INT8 | 10.22 % |

### Model Compression
AIMET can also significantly compress models. For popular models, such as Resnet-50 and Resnet-18, compression with spatial SVD plus channel pruning achieves 50% MAC (multiply-accumulate) reduction while retaining accuracy within approx. 1% of the original uncompressed model.

| Models | Uncompressed | 50% Compressed |
| :----: | :--: | :-------------: |
| ResNet18 (top1) | 69.76 % | 68.56 % |
| ResNet 50 (top1) | 76.05 % | 75.75 % |

## Resources
- [User Guide](https://quic.github.io/aimet-pages/releases/latest/user_guide/index.html)
- [API Docs](https://quic.github.io/aimet-pages/releases/latest/api_docs/index.html)
- [Discussion Forums](https://forums.quicinc.com/)
- [Tutorial Videos](https://quic.github.io/aimet-pages/index.html#video)
- [Example Code](https://github.com/quic/aimet/blob/develop/Examples/README.md)

## Contributions
Thanks for your interest in contributing to AIMET! Please read our [Contributions Page](https://github.com/quic/aimet/blob/develop/CONTRIBUTING.md) for more information on contributing features or bug fixes. We look forward to your participation!

## Team
AIMET aims to be a community-driven project maintained by Qualcomm Innovation Center, Inc.

## License
AIMET is licensed under the BSD 3-clause "New" or "Revised" License. Check out the [LICENSE](https://github.com/quic/aimet/blob/develop/LICENSE) for more details.


