TensorFlow Lite for Microcontrollers on RISC-V Out-of-Order Core
We have successfully run Google’s TensorFlow Lite for Microcontrollers on an FPGA board implementing NaxRiscv, a RISC-V Out-of-Order core.
TensorFlow Lite for Microcontrollers
The TensorFlow Lite for Microcontrollers (hereafter, TFLite Micro) repository is described as follows.
TensorFlow Lite for Microcontrollers is a port of TensorFlow Lite designed to run machine learning models on DSPs, microcontrollers and other devices with limited memory.
Simply put, TFLite Micro is a bare-metal version of TensorFlow Lite that runs without an OS.
NaxRiscv
NaxRiscv is an out-of-order execution superscalar RISC-V core. NaxRiscv is integrated into LiteX, an SoC builder. For an overview of NaxRiscv, see the related article Benchmarks on RISC-V Out-of-Order Simulator.
This time, we used the 32-bit NaxRiscv gateware for Digilent’s Nexys Video introduced in the article Running 32-bit Linux on FPGAs with RISC-V Out-of-Order Core.
TFLite Micro on FPGA with NaxRiscv
We loaded the 32-bit NaxRiscv gateware onto the Nexys Video FPGA board and ran TFLite Micro's Keyword Spotting, Person Detection, and MobileNetV2 models.
The featured image and the table below show the results compared to VexRiscv, an in-order execution scalar RISC-V core.
ML model | VexRiscv (mega cycles) | NaxRiscv (mega cycles) | Speedup factor |
---|---|---|---|
Keyword Spotting | 87 | 33 | 2.64 |
Person Detection | 215 | 79 | 2.72 |
MobileNetV2 | 1079 | 413 | 2.61 |
NaxRiscv needs fewer cycles than VexRiscv on all three models. The speedup ranges from 2.61x to 2.72x, or roughly 2.6x.
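The speedup factors can be reproduced from the cycle counts with a few lines of Python. This is only a minimal sketch of the arithmetic: the cycle counts are the ones reported in the table, while the names `MEGA_CYCLES` and `speedup` and the geometric-mean summary are our own illustrative additions, not part of the measurement setup.

```python
from statistics import geometric_mean

# Cycle counts in millions (mega cycles), copied from the table.
# The dictionary layout is an illustrative choice, not part of the benchmark.
MEGA_CYCLES = {
    "Keyword Spotting": {"VexRiscv": 87, "NaxRiscv": 33},
    "Person Detection": {"VexRiscv": 215, "NaxRiscv": 79},
    "MobileNetV2": {"VexRiscv": 1079, "NaxRiscv": 413},
}


def speedup(baseline_cycles, cycles):
    """Return how many times fewer cycles were needed than the baseline."""
    return baseline_cycles / cycles


# Per-model speedup of NaxRiscv over VexRiscv.
speedups = {
    model: speedup(counts["VexRiscv"], counts["NaxRiscv"])
    for model, counts in MEGA_CYCLES.items()
}

# A geometric mean is a common way to summarize ratios (an assumption here;
# the article itself only quotes "roughly 2.6x").
overall = geometric_mean(list(speedups.values()))

for model, factor in speedups.items():
    print(f"{model}: {factor:.2f}x")
print(f"Geometric mean: {overall:.2f}x")
```

Note that these are ratios of cycle counts. They translate into the same ratio of execution time only if both cores run at the same clock frequency.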
Summary
We have successfully run Google’s TFLite Micro on an FPGA board implementing NaxRiscv, an out-of-order execution superscalar RISC-V core. Compared to VexRiscv, an in-order execution scalar RISC-V core, NaxRiscv needs roughly 2.6x fewer cycles across the three models tested.