Running ONNX Model on FPGA with Gemmini SoC

[Image: systolic-array]

At Luffca, we implemented the Gemmini DNN accelerator and the Rocket RISC-V CPU on an FPGA board and successfully ran an ONNX model on it.

Related articles are available here.

Gemmini

Gemmini is one of the RTL generators included in Chipyard, an agile RISC-V SoC design framework. It generates systolic-array-based DNN accelerators.
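To make the term "systolic array" concrete, here is a minimal Python sketch of an output-stationary systolic array computing a matrix product cycle by cycle. It is only an illustration of the general technique and does not describe Gemmini's actual RTL or dataflow configuration. The function name `systolic_matmul` and the register names are hypothetical and chosen for this example.

```python
def systolic_matmul(A, B):
    """Simulate an output-stationary systolic array computing A @ B.

    Illustrative model only (not Gemmini's implementation):
    PE(i, j) keeps the running sum for C[i][j]. Rows of A enter from
    the left and columns of B enter from the top, each skewed by one
    cycle per row/column, and move one PE per cycle.
    """
    n, k, m = len(A), len(B), len(B[0])
    acc = [[0] * m for _ in range(n)]    # stationary accumulators
    a_reg = [[0] * m for _ in range(n)]  # operands moving left to right
    b_reg = [[0] * m for _ in range(n)]  # operands moving top to bottom

    for t in range(n + m + k - 2):       # cycles until the last PE finishes
        # Shift A operands one PE to the right and inject at the left edge.
        for i in range(n):
            for j in range(m - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
            s = t - i                    # row i is delayed by i cycles
            a_reg[i][0] = A[i][s] if 0 <= s < k else 0
        # Shift B operands one PE down and inject at the top edge.
        for j in range(m):
            for i in range(n - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
            s = t - j                    # column j is delayed by j cycles
            b_reg[0][j] = B[s][j] if 0 <= s < k else 0
        # Every PE performs one multiply-accumulate per cycle.
        for i in range(n):
            for j in range(m):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc


if __name__ == "__main__":
    print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

The skewed injection ensures that the operands meeting at PE(i, j) in any cycle share the same inner index, so each PE only ever talks to its neighbours.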

The figure below, taken from the Gemmini repository, gives an overview of the Gemmini system.

[Figure: gemmini-system]

ONNX

Open Neural Network Exchange (ONNX) is a format for neural network models. ONNX Runtime is software that runs inference on ONNX models (models in the ONNX format).

For this work, we used onnxruntime-riscv, a port of ONNX Runtime for Gemmini.

Running ONNX Model ResNet-50 on FPGA

We used the Gemmini system built on Digilent's Nexys Video FPGA board, which we introduced in a previous article.
We loaded the Gemmini SoC gateware onto the FPGA board and ran the ResNet-50 ONNX model.

The following is the console output from running ort_test with resnet50_opt_quant.onnx specified as the ONNX model.

# ./ort_test -m resnet50_opt_quant.onnx \
  -i images/dog.jpg \
  -p caffe2 -x 2 -O 99
Loaded runner program
Using systolic in mode 2
Using Onnxruntime C++ API
Number of inputs = 1
Input 0 : name=gpu_0/data_0, type=1, num_dims=4: [1, 3, 224, 224, ]
Number of outputs = 1
Output 0 : name=gpu_0/softmax_1, type=1, num_dims=2: [1, 1000, ]
Loading image
Image dimensions: 224 224 3
First few image values 130.061005 126.060997 123.060997
Called into systolic conv
Using systolic pooling
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic add
Element count 1000. Top 5 classes:
0.031456 giant schnauzer
0.075702 curly-coated retriever
0.087432 Great Dane
0.271946 Labrador retriever
0.361813 Rottweiler
Done! Inference took 495827001 cycles
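The console output above can be summarized with a short script. The following is a minimal sketch, assuming the log has been saved as text. The helper names `summarize_log` and `cycles_to_seconds` are hypothetical, and the clock frequency is a parameter you must supply yourself, since the log reports only a cycle count. Applied to the output above, it would count 53 conv calls, 16 add calls, and 1 pooling call, which matches the layer structure of ResNet-50.

```python
import re
from collections import Counter


def summarize_log(log_text):
    """Count systolic operator calls and extract the reported cycle count."""
    ops = Counter()
    cycles = None
    for raw in log_text.splitlines():
        line = raw.strip()
        called = re.match(r"Called into systolic (\w+)", line)
        if called:
            ops[called.group(1)] += 1
        elif line.startswith("Using systolic pooling"):
            ops["pooling"] += 1
        else:
            took = re.search(r"Inference took (\d+) cycles", line)
            if took:
                cycles = int(took.group(1))
    return ops, cycles


def cycles_to_seconds(cycles, clock_hz):
    """Convert a cycle count to seconds for a given (assumed) clock in Hz."""
    return cycles / clock_hz


if __name__ == "__main__":
    sample = """Using systolic in mode 2
Called into systolic conv
Using systolic pooling
Called into systolic conv
Called into systolic add
Done! Inference took 495827001 cycles"""
    ops, cycles = summarize_log(sample)
    print(dict(ops), cycles)
```

Passing the SoC clock frequency of your own build to `cycles_to_seconds` turns the reported cycle count into wall-clock inference time.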

Summary

At Luffca, we implemented Gemmini and Rocket on Digilent's Nexys Video and successfully ran the ResNet-50 ONNX model.