Benchmarks in LiteX/Rocket on FPGA boards
We measured the performance of the multi-core 64-bit Rocket Chip SoCs introduced in the previous article using the CoreMark benchmark.
We ran CoreMark on two FPGA boards, a Qmtech Wukong board and a Digilent Nexys Video.
The SoCs for the Wukong board and the Nexys Video are dual-core and quad-core, respectively.
Both SoCs run Linux as the OS.
CoreMark on Wukong board
The SoC for the Wukong board is a dual-core 64-bit Rocket Chip.
The table below shows the benchmark results.
No. of threads | CoreMark | CoreMark/MHz | Speedup factor |
---|---|---|---|
1 | 107.0 | 2.14 | 1.00 |
2 | 208.8 | 4.18 | 1.95 |
The CoreMark/MHz for a single thread is 2.14.
Rocket's CoreMark/MHz is generally quoted as 2.32, so this single-threaded result (OS: Linux, -O2 option) reaches 92% of that figure (2.14 / 2.32).
Incidentally, changing the option from -O2 to -O3 -funroll-loops raises this to 95%.
With 2 threads, the CoreMark/MHz is 4.18.
The speedup factor of 1.95 over a single thread shows that the second core is being used effectively.
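The figures quoted in this section are simple ratios, and the following minimal Python sketch recomputes them from the table. The function names are illustrative only. The clock frequency is not stated in this article; the sketch merely derives the value implied by the table (CoreMark divided by CoreMark/MHz), so treat it as an inference rather than a specification.

```python
# Values copied from the Wukong table and the text above.
SINGLE_THREAD_SCORE = 107.0    # CoreMark, 1 thread
DUAL_THREAD_SCORE = 208.8      # CoreMark, 2 threads
SINGLE_THREAD_PER_MHZ = 2.14   # CoreMark/MHz, 1 thread
REFERENCE_PER_MHZ = 2.32       # Rocket figure quoted in the text


def speedup(multi_score: float, single_score: float) -> float:
    """Speedup factor: multi-threaded score relative to the single-threaded one."""
    return multi_score / single_score


def percent_of_reference(per_mhz: float, reference_per_mhz: float) -> float:
    """Measured CoreMark/MHz as a percentage of the reference figure."""
    return 100.0 * per_mhz / reference_per_mhz


def implied_clock_mhz(score: float, per_mhz: float) -> float:
    """Clock frequency implied by a score and its per-MHz value (an inference)."""
    return score / per_mhz


if __name__ == "__main__":
    print(f"speedup (2 threads): {speedup(DUAL_THREAD_SCORE, SINGLE_THREAD_SCORE):.2f}")
    print(f"percent of reference: {percent_of_reference(SINGLE_THREAD_PER_MHZ, REFERENCE_PER_MHZ):.0f}%")
    print(f"implied clock: {implied_clock_mhz(SINGLE_THREAD_SCORE, SINGLE_THREAD_PER_MHZ):.1f} MHz")
```

Run as a script, this prints a speedup of 1.95 and 92% of the reference, matching the values in the text.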
The following shows the output when executing CoreMark with 2 threads.
2K performance run parameters for coremark.
CoreMark Size : 666
Total ticks : 19151
Total time (secs): 19.151000
Iterations/Sec : 208.866378
Iterations : 4000
Compiler version : GCC10.2.0
Compiler flags : -O2 -DMULTITHREAD=2 -DUSE_PTHREAD -pthread -DPERFORMANCE_RUN=1 -lrt
Parallel PThreads : 2
Memory location : Please put data memory location here (e.g. code in flash, data on heap etc)
seedcrc : 0xe9f5
[0]crclist : 0xe714
[1]crclist : 0xe714
[0]crcmatrix : 0x1fd7
[1]crcmatrix : 0x1fd7
[0]crcstate : 0x8e3a
[1]crcstate : 0x8e3a
[0]crcfinal : 0x4983
[1]crcfinal : 0x4983
Correct operation validated. See readme.txt for run and reporting rules.
CoreMark 1.0 : 208.866378 / GCC10.2.0 -O2 -DMULTITHREAD=2 -DUSE_PTHREAD -pthread -DPERFORMANCE_RUN=1 -lrt / Heap / 2:PThreads
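The Iterations/Sec value in the output above is the iteration count divided by the elapsed time. The short sketch below reproduces that arithmetic from the reported numbers. The function name is hypothetical, and the default of 1000 ticks per second is an assumption inferred from this particular output (19151 ticks reported as 19.151 seconds), not a general property of CoreMark ports.

```python
def iterations_per_second(iterations: int, total_ticks: int,
                          ticks_per_second: int = 1000) -> float:
    """Recompute the CoreMark score from the raw counters in the report.

    ticks_per_second=1000 is inferred from the output above
    (19151 ticks == 19.151 s); other ports may use a different timer.
    """
    total_seconds = total_ticks / ticks_per_second
    return iterations / total_seconds


if __name__ == "__main__":
    # Counters taken from the 2-thread run on the Wukong board.
    score = iterations_per_second(iterations=4000, total_ticks=19151)
    print(f"Iterations/Sec: {score:.6f}")
```

The iteration count of 4000 is the total reported for the run with both threads, which is why the score roughly doubles relative to the single-threaded result.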
CoreMark on Nexys Video
The SoC for Nexys Video is a quad-core 64-bit Rocket Chip.
The featured image above and the table below show the benchmark results.
No. of threads | CoreMark | CoreMark/MHz | Speedup factor |
---|---|---|---|
1 | 106.2 | 2.12 | 1.00 |
2 | 210.1 | 4.20 | 1.98 |
3 | 311.6 | 6.23 | 2.94 |
4 | 415.4 | 8.31 | 3.92 |
The CoreMark/MHz for a single thread is 2.12, which is about the same as on the Wukong board.
With 4 threads, the CoreMark/MHz is 8.31, giving a speedup factor of 3.92.
This suggests that nothing becomes a significant bottleneck up to 4 threads.
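One way to make the scaling claim concrete is to look at parallel efficiency, that is, the speedup factor divided by the number of threads. The sketch below computes it from the speedup column of the table above. The function name is illustrative, and parallel efficiency is an additional metric introduced here for illustration, not one reported by CoreMark itself.

```python
# Speedup factors copied from the Nexys Video table (threads -> speedup).
SPEEDUP_BY_THREADS = {1: 1.00, 2: 1.98, 3: 2.94, 4: 3.92}


def parallel_efficiency(speedup: float, threads: int) -> float:
    """Fraction of ideal linear scaling achieved (1.0 means perfect scaling)."""
    return speedup / threads


for threads, factor in SPEEDUP_BY_THREADS.items():
    eff = parallel_efficiency(factor, threads)
    print(f"{threads} thread(s): efficiency {eff:.2f}")
```

With the table's values the efficiency stays at about 0.98 or higher for every thread count, which is consistent with the absence of an obvious bottleneck up to 4 threads.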
The following shows the output when executing CoreMark with 4 threads.
2K performance run parameters for coremark.
CoreMark Size : 666
Total ticks : 19258
Total time (secs): 19.258000
Iterations/Sec : 415.411777
Iterations : 8000
Compiler version : GCC10.2.0
Compiler flags : -O2 -DMULTITHREAD=4 -DUSE_PTHREAD -pthread -DPERFORMANCE_RUN=1 -lrt
Parallel PThreads : 4
Memory location : Please put data memory location here (e.g. code in flash, data on heap etc)
seedcrc : 0xe9f5
[0]crclist : 0xe714
[1]crclist : 0xe714
[2]crclist : 0xe714
[3]crclist : 0xe714
[0]crcmatrix : 0x1fd7
[1]crcmatrix : 0x1fd7
[2]crcmatrix : 0x1fd7
[3]crcmatrix : 0x1fd7
[0]crcstate : 0x8e3a
[1]crcstate : 0x8e3a
[2]crcstate : 0x8e3a
[3]crcstate : 0x8e3a
[0]crcfinal : 0x4983
[1]crcfinal : 0x4983
[2]crcfinal : 0x4983
[3]crcfinal : 0x4983
Correct operation validated. See readme.txt for run and reporting rules.
CoreMark 1.0 : 415.411777 / GCC10.2.0 -O2 -DMULTITHREAD=4 -DUSE_PTHREAD -pthread -DPERFORMANCE_RUN=1 -lrt / Heap / 4:PThreads
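In the output above, each thread reports the same CRC for every kind (crclist, crcmatrix, crcstate, crcfinal). When collecting many runs, that agreement can be checked mechanically. The following is only a sketch: the helper names and the regular expression are assumptions based on the line format shown here, and CoreMark performs its own validation ("Correct operation validated."), which this does not replace.

```python
import re
from collections import defaultdict

# Per-thread CRC lines excerpted from the 4-thread report above.
REPORT = (
    "[0]crclist : 0xe714 [1]crclist : 0xe714 [2]crclist : 0xe714 [3]crclist : 0xe714 "
    "[0]crcmatrix : 0x1fd7 [1]crcmatrix : 0x1fd7 [2]crcmatrix : 0x1fd7 [3]crcmatrix : 0x1fd7 "
    "[0]crcstate : 0x8e3a [1]crcstate : 0x8e3a [2]crcstate : 0x8e3a [3]crcstate : 0x8e3a "
    "[0]crcfinal : 0x4983 [1]crcfinal : 0x4983 [2]crcfinal : 0x4983 [3]crcfinal : 0x4983"
)

# Matches e.g. "[2]crcstate : 0x8e3a" -> ("2", "crcstate", "0x8e3a").
CRC_ENTRY = re.compile(r"\[(\d+)\](crc\w+)\s*:\s*(0x[0-9a-fA-F]+)")


def crcs_by_kind(report):
    """Group CRC values by kind, ordered by thread index."""
    found = defaultdict(dict)
    for thread, kind, value in CRC_ENTRY.findall(report):
        found[kind][int(thread)] = value
    return {kind: [vals[t] for t in sorted(vals)] for kind, vals in found.items()}


def threads_agree(report):
    """True if at least one CRC kind was found and all threads match for each kind."""
    groups = crcs_by_kind(report)
    return bool(groups) and all(len(set(values)) == 1 for values in groups.values())


if __name__ == "__main__":
    print("all threads agree:", threads_agree(REPORT))
```

Because the pattern uses `findall`, it works whether the report is stored one field per line or as a single flattened line.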
Summary
We measured the performance of the multi-core 64-bit Rocket Chip SoCs introduced in the previous article using the CoreMark benchmark.
We found that the speedup factors for the dual-core SoC on the Wukong board and the quad-core SoC on the Nexys Video are 1.95 and 3.92, respectively.