Benchmarks on RV64GC RISC-V Out-of-Order Simulator
Since the RISC-V Out-of-Order core NaxRiscv now supports RV[32|64]GC, we created an RV64GC simulator and ran the CoreMark, Dhrystone, and Whetstone benchmarks on it.
NaxRiscv
NaxRiscv is an out-of-order execution superscalar RISC-V core that supports RV[32|64]IMAFDCSU.
See the previous article for an overview of NaxRiscv, and the NaxRiscv documentation for more details.
The performance of the default RV32IMA and RV64IMA is as follows.
RV32IMA
- CoreMark: 5.00 CoreMark/MHz (-O3 and so many more random flags)
- Dhrystone: 2.94 DMIPS/MHz (-O3 -fno-common -fno-inline)
RV64IMA
- CoreMark: 4.91 CoreMark/MHz (-O3, u32 as s32 and so many more random flags)
- Dhrystone: 2.97 DMIPS/MHz (-O3 -fno-common -fno-inline)
Benchmarks on NaxRiscv RV64GC Simulator
This time we created a NaxRiscv RV64GC (RV64IMAFDC) simulator using Verilator and ran the NaxRiscv ports of the CoreMark, Dhrystone, and Whetstone benchmarks on it.
CoreMark
The following shows the console output when running CoreMark.
    $ ./sim/VNaxRiscv64gc --name coremark \
        --load-elf $NAXSOFTWARE/baremetal/coremark/build/rv64imafdc/coremark.elf \
        --pass-symbol pass
    2K performance run parameters for coremark.
    CoreMark Size : 666
    Total ticks : 2178284
    Total time (secs): 2178284.000000
    Iterations/Sec : 0.000005
    Iterations : 10
    Compiler version : GCC11.1.0
    Compiler flags : -DPERFORMANCE_RUN=1 -march=rv64imafdc -mabi=lp64d -mcmodel=medany -Wno-pointer-to-int-cast -Wno-int-to-pointer-cast -I../driver -O3 -fno-common -funroll-loops -finline-functions -falign-functions=16 -falign-jumps=4 -falign-loops=4 -finline-limit=1000 -fno-if-conversion2 -fselective-scheduling -fno-crossjumping -freorder-blocks-and-partition -DCORE_DEBUG=0 -lgcc -lc -nostartfiles -ffreestanding -Wl,-Bstatic,-T,../common/app.ld,-Map,coremark.map,--print-memory-usage
    Memory location : STACK
    seedcrc : 0xe9f5
    [0]crclist : 0xe714
    [0]crcmatrix : 0x1fd7
    [0]crcstate : 0x8e3a
    [0]crcfinal : 0xfcaf
    Correct operation validated. See README.md for run and reporting rules.
    CoreMark 1.0 : 0.000005 / GCC11.1.0 -DPERFORMANCE_RUN=1 -march=rv64imafdc -mabi=lp64d -mcmodel=medany -Wno-pointer-to-int-cast -Wno-int-to-pointer-cast -I../driver -O3 -fno-common -funroll-loops -finline-functions -falign-functions=16 -falign-jumps=4 -falign-loops=4 -finline-limit=1000 -fno-if-conversion2 -fselective-scheduling -fno-crossjumping -freorder-blocks-and-partition -DCORE_DEBUG=0 -lgcc -lc -nostartfiles -ffreestanding -Wl,-Bstatic,-T,../common/app.ld,-Map,coremark.map,--print-memory-usage / STACK
    4.59 CoreMark/MHz
    SUCCESS coremark
The RV64GC simulator scores 4.59 CoreMark/MHz. Compared with the 4.91 CoreMark/MHz of RV64IMA listed above, the score is about 6.5% lower.
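The reported score can be cross-checked against the console output. The port appears to count time in clock cycles (Total ticks and Total time show the same number), so Iterations/Sec is effectively iterations per cycle, and scaling by 10^6 gives CoreMark/MHz. The Python sketch below assumes that interpretation; the function names are illustrative and are not part of NaxRiscv or CoreMark.

```python
def coremark_per_mhz(iterations: int, total_ticks: int) -> float:
    """Iterations per million clock cycles.

    Assumes one tick equals one core clock cycle, as the console
    output (Total ticks == Total time) suggests.
    """
    return iterations / total_ticks * 1_000_000


def relative_drop(baseline: float, measured: float) -> float:
    """Fractional decrease of `measured` relative to `baseline`."""
    return (baseline - measured) / baseline


# Values copied from the CoreMark console output above.
score = coremark_per_mhz(iterations=10, total_ticks=2_178_284)
print(f"{score:.2f} CoreMark/MHz")

# RV64IMA (4.91) versus RV64GC (4.59), both in CoreMark/MHz.
print(f"{relative_drop(4.91, 4.59) * 100:.1f}% lower than RV64IMA")
```

Normalizing by cycles rather than wall-clock time keeps the score independent of how fast the Verilator simulation itself runs on the host.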
Dhrystone
The following shows the console output when running Dhrystone.
    $ ./sim/VNaxRiscv64gc --name dhrystone \
        --load-elf $NAXSOFTWARE/baremetal/dhrystone/build/rv64imafdc/dhrystone.elf \
        --pass-symbol pass
    Dhrystone Benchmark, Version C, Version 2.2
    Program compiled without 'register' attribute
    Using time(), HZ=12000000
    ...
    Microseconds for one run through Dhrystone: 16
    Dhrystones per Second: 62168
    User_Time : 965124
    Number_Of_Runs : 5000
    HZ : 12000000
    DMIPS per MHz: 2.94
    SUCCESS dhrystone
The RV64GC simulator scores 2.94 DMIPS/MHz, almost the same as the 2.97 DMIPS/MHz of RV64IMA listed above.
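DMIPS/MHz is conventionally obtained by dividing Dhrystones per second by 1757 (the VAX 11/780 reference score) and then by the clock frequency in MHz. The Python sketch below re-derives the figures from User_Time, Number_Of_Runs, and HZ in the console output, assuming User_Time is measured in ticks of the HZ clock. The function names are illustrative. The unrounded result is about 2.9486, so the printed 2.94 appears to be truncated rather than rounded; that is an inference from the numbers, not something the output states.

```python
VAX_DHRYSTONES_PER_SECOND = 1757  # VAX 11/780 reference score, defined as 1 DMIPS


def dhrystones_per_second(runs: int, user_time_ticks: int, hz: int) -> float:
    """Dhrystone runs per second, assuming user_time_ticks counts ticks at `hz`."""
    return runs * hz / user_time_ticks


def dmips_per_mhz(runs: int, user_time_ticks: int, hz: int) -> float:
    """DMIPS normalized by the clock frequency in MHz."""
    dmips = dhrystones_per_second(runs, user_time_ticks, hz) / VAX_DHRYSTONES_PER_SECOND
    return dmips / (hz / 1_000_000)


# Values copied from the Dhrystone console output above.
runs, user_time, hz = 5000, 965_124, 12_000_000

print(int(dhrystones_per_second(runs, user_time, hz)), "Dhrystones per second")
print(f"{dmips_per_mhz(runs, user_time, hz):.4f} DMIPS/MHz before truncation")
```

Note that HZ cancels out of the final ratio: the score depends only on the number of runs and the number of cycles they took.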
Whetstone
The following shows the console output when running the newly ported Whetstone.
    $ ./sim/VNaxRiscv64gc --name whetstone \
        --load-elf $NAXSOFTWARE/baremetal/whetstone/build/rv64imafdc/whetstone.elf \
        --pass-symbol pass
    Loops: 10, Iterations: 1, Duration: 1023732 cycles.
    C Converted Double Precision Whetstones: 976 KIPS/MHz
    SUCCESS whetstone
The output reports 976 KIPS/MHz, so the RV64GC simulator scores 0.976 WMIPS/MHz.
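The Whetstone figure can be reproduced in the same way. The Python sketch below assumes the port keeps the scoring of the classic C Whetstone, where each loop counts as 100 thousand Whetstone instructions, and that the duration is measured in core clock cycles. Both are assumptions about the port, and the function name is illustrative.

```python
KILO_WHETSTONES_PER_LOOP = 100  # classic C Whetstone convention (assumed for this port)


def whetstone_kips_per_mhz(loops: int, iterations: int, duration_cycles: int) -> float:
    """Thousands of Whetstone instructions per million clock cycles."""
    kilo_instructions = KILO_WHETSTONES_PER_LOOP * loops * iterations
    mega_cycles = duration_cycles / 1_000_000  # seconds at a nominal 1 MHz clock
    return kilo_instructions / mega_cycles


# Values copied from the Whetstone console output above.
kips = whetstone_kips_per_mhz(loops=10, iterations=1, duration_cycles=1_023_732)
print(f"{kips:.1f} KIPS/MHz")
print(f"{kips / 1000:.3f} WMIPS/MHz")
```

Under these assumptions the unrounded value is slightly above 976 KIPS/MHz, which is consistent with the console printing an integer-truncated 976.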
Summary
We created a NaxRiscv RV64GC simulator using Verilator and ran the CoreMark, Dhrystone, and Whetstone benchmarks on it.
The performance of RV64GC is as follows.
RV64GC
- CoreMark: 4.59 CoreMark/MHz (-O3, u32 as s32 and so many more random flags)
- Dhrystone: 2.94 DMIPS/MHz (-O3 -fno-common -fno-inline)
- Whetstone: 0.976 WMIPS/MHz (-O3 -fno-common -fno-inline)