
PyTorch 2.0 TTNN Compiler

This project allows running PyTorch code on Tenstorrent hardware.

Supported Models

The table below summarizes the results of running various ML models through our TTNN compiler. For each model, we track the conversion status, the number of operations before and after conversion, the number of to_device and from_device operations, run time, and accuracy.

| Model | Status | Torch Ops Before (Unique Ops) | Torch Ops Remain (Unique Ops) | To/From Device Ops | Original Run Time (ms) | Compiled Run Time for 5th Iteration (ms) | Accuracy (%) |
|---|---|---|---|---|---|---|---|
| Autoencoder (linear) | ✅ | 22 (3) | 0 (0) | 0 | 1362.91 | 17.72 | 100.0 |
| BERT | ✅ | 1393 (21) | 0 (0) | 0 | 88060.8 | 4492.54 | 99.69 |
| Bloom | ✅ | 1403 (26) | 0 (0) | 0 | 76121.1 | 7958.45 | 39.93 |
| DPR | ✅ | 720 (22) | 0 (0) | 3 | 6668.54 | 1047.03 | 99.29 |
| Llama | ✅ | 40 (11) | 0 (0) | 2 | 325905 | 130298.08 | 100.0 |
| MLPMixer | ✅ | 253 (11) | 0 (0) | 0 | 5584.04 | 526.77 | 99.97 |
| Mnist | ✅ | 14 (8) | 0 (0) | 1 | 3960.43 | 32.8 | 99.03 |
| MobileNetV2 | ✅ | 154 (9) | 0 (0) | 0 | 1134.72 | 1982.59 | 45.49 |
| OpenPose V2 | ✅ | 155 (7) | 0 (0) | 6 | 2826.56 | 1349.03 | 91.47 |
| Perceiver IO | ✅ | 1531 (20) | 0 (0) | 1 | 53274.9 | 4240.45 | 99.95 |
| ResNet18 | ✅ | 70 (9) | 0 (0) | 1 | 1969.81 | 627.22 | 30.71 |
| ResNet50 | ✅ | 176 (9) | 0 (0) | 1 | 5222.89 | 2185.4 | 4.56 |
| RoBERTa | ✅ | 719 (21) | 0 (0) | 3 | 20092.4 | 3763.15 | 98.64 |
| SqueezeBERT | ✅ | 16 (9) | 0 (0) | 3 | 3154.31 | 229.66 | 100.0 |
| U-Net | ✅ | 68 (6) | 0 (0) | 12 | 63746 | 975.63 | 100.0 |
| Unet-brain | ✅ | 68 (6) | 0 (0) | 12 | 63148.6 | 976.54 | N/A |
| Unet-carvana | ✅ | 67 (5) | 0 (0) | 12 | 81166.7 | 2020.56 | 99.69 |
| YOLOv5 | ✅ | 3 (3) | 0 (0) | 0 | 23825.5 | 16419.1 | 100.0 |
| albert/albert-base-v2 | ✅ | 791 (21) | 0 (0) | 3 | 2746.83 | 493.73 | 68.8 |
| albert/albert-base-v2-classification | ✅ | 779 (21) | 0 (0) | 3 | 2155.79 | 432.62 | 99.96 |
| albert/albert-large-v2 | ✅ | 1547 (21) | 0 (0) | 3 | 4336.42 | 938.41 | 24.89 |
| albert/albert-xlarge-v2 | ✅ | 1547 (21) | 0 (0) | 3 | 14475.1 | 1459.91 | 52.29 |
| distilbert-base-uncased | ✅ | 361 (16) | 0 (0) | 2 | 8761.99 | 520.33 | 99.7 |
| dla34.in1k | ✅ | 135 (9) | 0 (0) | 23 | 6424.16 | 1130.72 | 8.94 |
| ghostnet_100.in1k | ✅ | 515 (14) | 0 (0) | 64 | 983.16 | 2067.01 | 23.36 |
| mobilenet_v2 | ✅ | 154 (9) | 0 (0) | 0 | 827.86 | 2020.07 | 96.81 |
| mobilenet_v3_large | ✅ | 188 (11) | 0 (0) | 0 | 774.52 | 2454.29 | -9.92 |
| mobilenet_v3_small | ✅ | 158 (11) | 0 (0) | 0 | 490.71 | 1179.29 | 9.17 |
| mobilenetv1_100.ra4_e3600_r224_in1k | ✅ | 85 (7) | 0 (0) | 0 | 1402.92 | 1348.58 | -6.46 |
| regnet_x_16gf | ✅ | 235 (8) | 0 (0) | 0 | 14156.4 | 6031.61 | 19.64 |
| regnet_x_1_6gf | ✅ | 195 (8) | 0 (0) | 0 | 1849.69 | 2789.38 | 14.69 |
| regnet_x_32gf | ✅ | 245 (8) | 0 (0) | 0 | 27546.1 | 11726.18 | 12.48 |
| regnet_x_3_2gf | ✅ | 265 (8) | 0 (0) | 0 | 3222.18 | 4106.16 | 5.15 |
| regnet_x_400mf | ✅ | 235 (8) | 0 (0) | 0 | 1857.15 | 1758.48 | 0.4 |
| regnet_x_800mf | ✅ | 175 (8) | 0 (0) | 0 | 1167.94 | 1806.26 | 2.32 |
| regnet_x_8gf | ✅ | 245 (8) | 0 (0) | 0 | 7381.06 | 7316.37 | -0.21 |
| regnet_y_1_6gf | ✅ | 447 (10) | 0 (0) | 0 | 2379.09 | 3599.33 | 3.4 |
| regnet_y_32gf | ✅ | 335 (10) | 0 (0) | 0 | 29560.6 | 16584.7 | 5.41 |
| regnet_y_3_2gf | ✅ | 351 (10) | 0 (0) | 0 | 4098.74 | 4412.48 | -3.85 |
| regnet_y_400mf | ✅ | 271 (10) | 0 (0) | 0 | 814.46 | 1777.09 | 15.81 |
| regnet_y_800mf | ✅ | 239 (10) | 0 (0) | 0 | 1381.68 | 1882.47 | 0.12 |
| regnet_y_8gf | ✅ | 287 (10) | 0 (0) | 0 | 9918.36 | 5962.37 | -2.99 |
| resnet101 | ✅ | 346 (9) | 0 (0) | 1 | 7590.73 | 3882.36 | 5.82 |
| resnet152 | ✅ | 516 (9) | 0 (0) | 1 | 13632.6 | 5774.42 | -0.28 |
| resnet18 | ✅ | 70 (9) | 0 (0) | 1 | 2243.38 | 622.99 | 14.68 |
| resnet34 | ✅ | 126 (9) | 0 (0) | 1 | 4083 | 1061.87 | 21.92 |
| resnet50 | ✅ | 176 (9) | 0 (0) | 1 | 4346.98 | 2189.19 | 4.56 |
| resnext101_32x8d | ✅ | 346 (9) | 0 (0) | 1 | 18790.2 | 13666.47 | 0.38 |
| resnext101_64x4d | ✅ | 346 (9) | 0 (0) | 1 | 17742.2 | 12701.12 | 5.13 |
| resnext50_32x4d | ✅ | 176 (9) | 0 (0) | 1 | 5253.29 | 3301.45 | 7.63 |
| textattack/albert-base-v2-imdb | ✅ | 782 (22) | 0 (0) | 3 | 4385.91 | 443.44 | 100.0 |
| tf_efficientnet_lite0.in1k | ✅ | 149 (9) | 0 (0) | 5 | 1579.64 | 4138.09 | 20.18 |
| tf_efficientnet_lite1.in1k | ✅ | 194 (9) | 0 (0) | 5 | 1660.44 | 5199.11 | 54.38 |
| tf_efficientnet_lite2.in1k | ✅ | 194 (9) | 0 (0) | 5 | 2296.27 | 7283.13 | -0.18 |
| twmkn9/albert-base-v2-squad2 | ✅ | 783 (23) | 0 (0) | 3 | 3140.75 | 415.53 | 98.39 |
| vgg11 | ✅ | 33 (8) | 0 (0) | 5 | 11890.8 | 1386.13 | 99.8 |
| vgg11_bn | ✅ | 41 (9) | 0 (0) | 5 | 11423 | 1476.21 | 99.3 |
| vgg13 | ✅ | 37 (8) | 0 (0) | 5 | 18949.1 | 1517.77 | 99.88 |
| vgg13_bn | ✅ | 47 (9) | 0 (0) | 5 | 18631.3 | 1659.13 | 99.05 |
| vgg16 | ✅ | 43 (8) | 0 (0) | 5 | 21547.2 | 1567.67 | 99.7 |
| vgg16_bn | ✅ | 56 (9) | 0 (0) | 5 | 22853.5 | 1692.37 | 98.21 |
| vgg19 | ✅ | 49 (8) | 0 (0) | 5 | 26739.4 | 1690.22 | 99.52 |
| vgg19_bn | ✅ | 65 (9) | 0 (0) | 5 | 23445.6 | 1838.48 | 97.44 |
| wide_resnet101_2 | ✅ | 346 (9) | 0 (0) | 1 | 23664.7 | 6032.23 | -3.76 |
| wide_resnet50_2 | ✅ | 176 (9) | 0 (0) | 1 | 12635.1 | 3215.91 | 5.52 |
| xception71.tf_in1k | ✅ | 393 (9) | 0 (0) | 0 | 17377.4 | 15795.78 | 4.21 |
| Autoencoder (conv) | 🚧 | 9 (3) | 1 (1) | 1 | 1316.04 | 26.43 | 100.0 |
| Autoencoder (conv)-train | 🚧 | 24 (7) | 11 (4) | 0 | 2074.72 | 22.07 | 100.0 |
| Autoencoder (linear)-train | 🚧 | 104 (8) | 14 (2) | 0 | 2032 | 55.66 | 100.0 |
| CLIP | 🚧 | 1395 (29) | 7 (6) | 5 | 4796.97 | 1953.53 | 94.18 |
| DETR | 🚧 | 1646 (35) | 24 (3) | 3 | 121646 | 22410.9 | 46.57 |
| Falcon | 🚧 | 71 (6) | 1 (1) | 3 | 146624 | 35001.21 | 100.0 |
| GLPN-KITTI | 🚧 | 2959 (26) | 22 (2) | 6 | 124120 | 93818.75 | 99.74 |
| GPT-2 | 🚧 | 745 (29) | 15 (4) | 2 | 16702.3 | 984.09 | 100.0 |
| Hand Landmark | 🚧 | N/A | N/A | N/A | 7759.89 | 86.79 | N/A |
| HardNet | 🚧 | 245 (10) | 2 (1) | 122 | 5595.54 | 2087.59 | 5.37 |
| HardNet-train | 🚧 | 867 (21) | 480 (11) | 120 | 15238.1 | 12171.11 | 100.0 |
| MLPMixer-train | 🚧 | 616 (19) | 101 (6) | 0 | 15079.5 | 8534.53 | 100.0 |
| Mnist-train | 🚧 | 46 (15) | 10 (6) | 0 | 4153.51 | 78.42 | 100.0 |
| MobileNetSSD | 🚧 | 444 (26) | 5 (1) | 32 | 718.03 | 3436.48 | 23.55 |
| OpenPose V2-train | 🚧 | 523 (14) | 279 (8) | 6 | 9781.98 | 7505.37 | 100.0 |
| ResNet18-train | 🚧 | 241 (19) | 142 (11) | 0 | 5727.23 | 4421.45 | 100.0 |
| ResNet50-train | 🚧 | 616 (19) | 372 (11) | 0 | 14318.3 | 12208.67 | 100.0 |
| SegFormer | 🚧 | 676 (22) | 16 (1) | 4 | 39743.4 | 3870.21 | 99.49 |
| SegFormer-train | 🚧 | 1780 (35) | 157 (13) | 4 | 82780.1 | 40673.36 | 100.0 |
| U-Net-train | 🚧 | 236 (15) | 140 (9) | 8 | 113059 | 52131.34 | 100.0 |
| Unet-brain-train | 🚧 | 236 (15) | 140 (9) | 8 | 115081 | 50513.9 | 100.0 |
| Unet-carvana-train | 🚧 | 232 (13) | 139 (8) | 8 | 171230 | 90014.08 | 100.0 |
| ViLT | 🚧 | 42 (16) | 8 (6) | 3 | 27332.5 | 13857.09 | 87.8 |
| XGLM | 🚧 | 1432 (28) | 26 (3) | 2 | 20155.3 | 6394.59 | 95.48 |
| YOLOS | 🚧 | 952 (27) | 17 (2) | 6 | 14807.8 | 7531.91 | 97.52 |
| YOLOv3 | 🚧 | 250 (7) | 2 (1) | 4 | 231694 | 3835.24 | 98.74 |
| albert/albert-xxlarge-v2 | 🚧 | 791 (21) | 24 (1) | 3 | 36116.8 | 2198.47 | 22.25 |
| densenet121 | 🚧 | 432 (10) | 3 (1) | 594 | 2849.35 | 2691.69 | 18.16 |
| densenet161 | 🚧 | 572 (10) | 3 (1) | 1144 | 7308.25 | 5841.3 | 18.66 |
| densenet169 | 🚧 | 600 (10) | 3 (1) | 1238 | 3362.48 | 4396.65 | 16.89 |
| densenet201 | 🚧 | 712 (10) | 3 (1) | 1902 | 4936.39 | 6515.17 | 34.88 |
| dla34.in1k-train | 🚧 | 469 (18) | 268 (10) | 17 | 10133.3 | 6977.79 | 100.0 |
| ese_vovnet19b_dw.ra_in1k | 🚧 | 111 (12) | 3 (1) | 16 | 2523.23 | 1101.73 | 34.46 |
| ese_vovnet19b_dw.ra_in1k-train | 🚧 | 360 (25) | 181 (11) | 16 | 5986.76 | 4805.66 | 100.0 |
| facebook/deit-base-patch16-224 | 🚧 | 685 (17) | 1 (1) | 2 | 20860.6 | 2321.78 | 98.19 |
| facebook/deit-base-patch16-224-train | 🚧 | 1854 (27) | 127 (8) | 2 | 74307.7 | 6188.42 | 100.0 |
| ghostnet_100.in1k-train | 🚧 | 1468 (33) | 713 (16) | 64 | 4829.95 | 3139.52 | 100.0 |
| ghostnetv2_100.in1k | 🚧 | 683 (18) | 28 (2) | 64 | 1784.34 | 3894.49 | 4.61 |
| ghostnetv2_100.in1k-train | 🚧 | 2000 (39) | 1049 (21) | 64 | 5522.18 | 5390.98 | 100.0 |
| googlenet | 🚧 | 214 (15) | 13 (1) | 39 | 1875.24 | 1030.67 | 21.45 |
| hrnet_w18.ms_aug_in1k | 🚧 | 1209 (11) | 31 (1) | 0 | 5365.03 | 5466.61 | 8.39 |
| hrnet_w18.ms_aug_in1k-train | 🚧 | 3998 (21) | 2299 (11) | 0 | 12716.8 | 12285.57 | 100.0 |
| inception_v4.tf_in1k | 🚧 | 495 (11) | 15 (2) | 83 | 12739.5 | 5052.41 | 0.74 |
| inception_v4.tf_in1k-train | 🚧 | 1702 (24) | 933 (12) | 80 | 36812.1 | 26820.89 | 100.0 |
| microsoft/beit-base-patch16-224 | 🚧 | 793 (21) | 13 (2) | 2 | 13917.2 | 2783.18 | 98.95 |
| microsoft/beit-base-patch16-224-train | 🚧 | 2229 (34) | 165 (11) | 2 | 81761.1 | 7091.33 | 100.0 |
| microsoft/beit-large-patch16-224 | 🚧 | 1573 (21) | 25 (2) | 2 | 39886.8 | 7234.97 | 99.24 |
| microsoft/beit-large-patch16-224-train | 🚧 | 4437 (34) | 321 (11) | 2 | 411216 | 17842.89 | 100.0 |
| mixer_b16_224.goog_in21k | 🚧 | 356 (11) | 1 (1) | 0 | 18708.6 | 1547.18 | 45.52 |
| mixer_b16_224.goog_in21k-train | 🚧 | 959 (18) | 101 (6) | 0 | 61524.5 | 4190.5 | 100.0 |
| mobilenetv1_100.ra4_e3600_r224_in1k-train | 🚧 | 231 (15) | 165 (8) | 0 | 3537.68 | 3221.66 | 100.0 |
| regnet_y_128gf | 🚧 | 447 (10) | 3 (1) | 0 | 497505 | 132543.9 | 37.45 |
| regnet_y_16gf | 🚧 | 303 (10) | 1 (1) | 0 | 14198.3 | 11731.9 | 17.67 |
| retinanet_resnet50_fpn | 🚧 | 973 (24) | 42 (9) | 16 | 2699.67 | 14109.16 | N/A |
| retinanet_resnet50_fpn_v2 | 🚧 | 483 (25) | 82 (10) | 16 | 2707.95 | 16081.19 | N/A |
| speecht5-tts | 🚧 | 860 (20) | 1 (1) | 2 | 58700.9 | 42036.57 | N/A |
| ssd300_vgg16 | 🚧 | 248 (24) | 7 (3) | 36 | 3026.75 | 2749.17 | N/A |
| ssdlite320_mobilenet_v3_large | 🚧 | 444 (26) | 5 (1) | 32 | 547.52 | 2952.5 | 34.66 |
| swin_b | 🚧 | 1898 (30) | 29 (2) | 63 | 17901.1 | 4404.4 | 4.91 |
| swin_s | 🚧 | 1898 (30) | 29 (2) | 63 | 6758.61 | 3252.53 | 84.25 |
| swin_t | 🚧 | 968 (30) | 23 (2) | 39 | 3719.01 | 1806.33 | 89.92 |
| swin_v2_b | 🚧 | 2474 (37) | 125 (4) | 33 | 22671.2 | 5307.49 | 6.42 |
| swin_v2_s | 🚧 | 2474 (37) | 125 (4) | 33 | 13386.6 | 4144.94 | 2.62 |
| swin_v2_t | 🚧 | 1256 (37) | 71 (4) | 21 | 7795.03 | 2204.92 | 23.73 |
| tf_efficientnet_lite0.in1k-train | 🚧 | 403 (17) | 286 (9) | 5 | 3259.01 | 5532.3 | 100.0 |
| tf_efficientnet_lite1.in1k-train | 🚧 | 523 (17) | 371 (9) | 5 | 3890.74 | 7433.07 | 100.0 |
| tf_efficientnet_lite2.in1k-train | 🚧 | 523 (17) | 371 (9) | 5 | 4675.29 | 9884.23 | 100.0 |
| tf_efficientnet_lite3.in1k | 🚧 | 221 (9) | 5 (1) | 5 | 2792.58 | 11167.96 | 17.4 |
| tf_efficientnet_lite3.in1k-train | 🚧 | 595 (17) | 427 (10) | 5 | 7380.17 | 15814.03 | 100.0 |
| tf_efficientnet_lite4.in1k | 🚧 | 275 (9) | 6 (1) | 5 | 6154.57 | 22600.67 | 69.91 |
| tf_efficientnet_lite4.in1k-train | 🚧 | 739 (17) | 530 (10) | 5 | 14947 | 23781.89 | 100.0 |
| vit_b_16 | 🚧 | 552 (17) | 13 (2) | 2 | 12442.7 | 5886.0 | 98.97 |
| vit_b_32 | 🚧 | 552 (17) | 13 (2) | 2 | 4208.54 | 2092.33 | 98.45 |
| vit_h_14 | 🚧 | 1452 (17) | 33 (2) | 2 | 760726 | 181878.28 | 98.96 |
| vit_l_16 | 🚧 | 1092 (17) | 25 (2) | 2 | 42401.4 | 14518.52 | 99.69 |
| vit_l_32 | 🚧 | 1092 (17) | 25 (2) | 2 | 13708 | 5859.22 | 98.87 |
| xception71.tf_in1k-train | 🚧 | 1370 (18) | 945 (9) | 0 | 68078.9 | 65273.01 | 100.0 |
| CLIP-train | ❌ | 3942 (43) | N/A | N/A | 37584 | N/A | N/A |
| FLAN-T5 | ❌ | 20020 (38) | N/A | N/A | 6011.92 | N/A | N/A |
| GPTNeo | ❌ | 2725 (35) | N/A | N/A | 19014.5 | N/A | N/A |
| OPT | ❌ | 4001 (31) | N/A | N/A | 26360.3 | N/A | N/A |
| Stable Diffusion V2 | ❌ | 1870 (29) | N/A | N/A | 853896 | N/A | N/A |
| Whisper | ❌ | 4286 (17) | N/A | N/A | 281325 | N/A | N/A |
| codegen | ❌ | 9177 (36) | N/A | N/A | 12239.4 | N/A | N/A |
| t5-base | ❌ | 14681 (38) | N/A | N/A | 9601.65 | N/A | N/A |
| t5-large | ❌ | 22696 (38) | N/A | N/A | 26765.4 | N/A | N/A |
| t5-small | ❌ | 6118 (38) | N/A | N/A | 3992.77 | N/A | N/A |

Explanation of Metrics

Model: Name of the model.
Status: Indicates whether the model is ❌ traced / 🚧 compiled / ✅ E2E on device.
Torch Ops Before (Unique Ops): The total number of operations in the original torch implementation. The number in parentheses is the count of unique ops.
Torch Ops Remain (Unique Ops): The total number of torch operations remaining after conversion to TTNN. The number in parentheses is the count of unique ops.
To/From Device Ops: The number of to_device and from_device operations (data transfers to and from the device).
Original Run Time (ms): Execution time (in ms) of the model before conversion.
Compiled Run Time for 5th Iteration (ms): Execution time (in ms) of the converted model, measured at the 5th iteration.
Accuracy (%): Model accuracy on a predefined test dataset after conversion.
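
For context, here is a minimal sketch of how per-iteration timings like those in the table can be collected. The module and input are placeholders, and this is not the exact harness used to produce the numbers above; the first iterations of a compiled module include compilation and warm-up cost, which is why the 5th iteration is reported.

import time
import torch

def time_iterations(module, example_input, n=5):
    # Measure wall-clock time per call, in milliseconds.
    times_ms = []
    for _ in range(n):
        start = time.perf_counter()
        with torch.no_grad():
            module(example_input)
        times_ms.append((time.perf_counter() - start) * 1000)
    # times_ms[0] includes compile/warm-up cost; times_ms[4] is the
    # "5th iteration" figure reported in the table.
    return times_ms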


Quickstart

The torch_ttnn module has a backend function, which can be used with torch.compile().

import torch
import torch_ttnn
import ttnn

# A torch Module
class FooModule(torch.nn.Module):
    ...

# Create a module
module = FooModule()

# Open the device and compile the module with the ttnn backend
device = ttnn.open_device(device_id=0)
option = torch_ttnn.TorchTtnnOption(device=device)
ttnn_module = torch.compile(module, backend=torch_ttnn.backend, options=option)

# Run inference / training
ttnn_module(input_data)
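
When you are done with the device, it can be released; in the ttnn API this is typically done with:

ttnn.close_device(device)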

Tracer

The tracer dumps information about the fx graph, such as each node's op_name and shape.

For example, you can run this script to collect the information:

PYTHONPATH=$(pwd) python3 tools/stat_models.py --trace_orig --backward --profile
ls stat/raw

By default, the raw results are stored in stat/raw. You can then run this script to generate the report:

python3 tools/generate_report.py
ls stat/

The stat/ folder now contains these reports:

  • fw_node_count.csv
  • bw_node_count.csv
  • fw_total_input_size_dist/
  • bw_total_input_size_dist/
  • fw_total_output_size_dist/
  • bw_total_output_size_dist/
  • profile/

The *_node_count.csv reports show how many nodes of each op_type appear in the forward and backward fx graphs. They help analyze how frequently each op type occurs in the graph.
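
For instance, a quick way to inspect one of these reports (the exact column layout depends on what generate_report.py emitted, so treat this as a sketch):

import csv

# Print the first few rows of the forward-pass node counts to see
# the actual columns before doing any further analysis.
with open("stat/fw_node_count.csv") as f:
    for i, row in enumerate(csv.DictReader(f)):
        print(row)
        if i >= 4:
            break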

The *_total_*_size_dist/ reports give the distribution of each op_type's input/output sizes across all fx graphs recorded in stat/raw. They help analyze the memory footprint of each op_type during computation.

  • Note: the default input_shapes in tools/stat_torchvision.py is [1, 3, 224, 224], which the *_total_*_size_dist/ reports depend on.

  • Note: the op names follow the aten IR interface.

The profile/ directory contains traces produced by the PyTorch profiler; you can open them in Chrome via the URL chrome://tracing.
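
If you want to generate a comparable trace yourself, the standard torch.profiler API (independent of this repo) can export one; a minimal sketch:

import torch
from torch.profiler import profile, ProfilerActivity

model = torch.nn.Linear(32, 32)
x = torch.randn(8, 32)

# Record a short profile and export it in the chrome://tracing format.
with profile(activities=[ProfilerActivity.CPU]) as prof:
    model(x)
prof.export_chrome_trace("trace.json")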

For developers

Install torch-ttnn with editable mode

During development, you may want to use the torch-ttnn package for testing. To do so, install the torch-ttnn package in "editable" mode with

pip install -e .

Now, you can utilize torch_ttnn in your Python code. Any modifications you make to the torch_ttnn package will take effect immediately, eliminating the need for constant reinstallation via pip.
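
To check that the editable install resolves to your source tree (a quick sanity check, not a required step), you can run:

python -c "import torch_ttnn; print(torch_ttnn.__file__)"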

Build wheel file

Developers who want to deploy the package can build the wheel file with

python -m build

Then you can upload the .whl file to PyPI (the Python Package Index).

Run transformer models

To run a transformer model with the ttnn backend, run:

PYTHONPATH="$TT_METAL_HOME:$(pwd)" python3 tools/run_transformers.py --model "phiyodr/bert-large-finetuned-squad2" --backend torch_ttnn

You can also substitute the backend with torch_stat to run a reference comparison.
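For example:

PYTHONPATH="$TT_METAL_HOME:$(pwd)" python3 tools/run_transformers.py --model "phiyodr/bert-large-finetuned-squad2" --backend torch_stat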
