Dead ends on emulator performance
My Z80 emulator now runs faster than real time on the CH32V006, so I’ve been looking for ways to improve its performance further. So far, most have been dead ends.
I’m using the zexdoc test suite as the benchmark. The core is a threaded
interpreter using computed gotos and is generated from a machine-readable
version of the instruction set, making it simpler to try different methods.
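For readers unfamiliar with the technique, here is a minimal sketch of computed-goto dispatch. It uses the GNU "labels as values" extension that GCC and Clang support. The `Cpu` struct and `run` function are hypothetical, only four opcodes (NOP, INC A, LD A,n, HALT) are handled, and flags are omitted, so this is not the emulator's generated code. Each handler ends by fetching the next opcode and jumping directly to that opcode's label. Every handler therefore gets its own indirect branch instead of sharing the single one in a `switch` loop.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical CPU state for this sketch only.
struct Cpu {
    uint8_t a = 0;
    uint16_t pc = 0;
    const uint8_t* mem = nullptr;
};

// Computed-goto dispatch (GNU "labels as values", GCC/Clang only).
// Runs until HALT and returns the number of instructions dispatched,
// HALT included. Flags are not modelled.
unsigned run(Cpu& cpu) {
    // One label address per opcode byte; unimplemented opcodes stop the run.
    void* table[256];
    for (void*& entry : table) entry = &&op_halt;
    table[0x00] = &&op_nop;     // NOP
    table[0x3C] = &&op_inc_a;   // INC A
    table[0x3E] = &&op_ld_a_n;  // LD A,n
    table[0x76] = &&op_halt;    // HALT

    unsigned count = 0;
    // Fetch the next opcode and jump straight to its handler; each handler
    // ends with its own copy of this indirect jump.
#define DISPATCH() do { ++count; goto *table[cpu.mem[cpu.pc++]]; } while (0)

    DISPATCH();
op_nop:
    DISPATCH();
op_inc_a:
    cpu.a = static_cast<uint8_t>(cpu.a + 1);
    DISPATCH();
op_ld_a_n:
    cpu.a = cpu.mem[cpu.pc++];  // immediate operand follows the opcode
    DISPATCH();
op_halt:
#undef DISPATCH
    return count;
}
```

Running it on the bytes `3E 05 3C 00 76` leaves A = 6 and PC = 5 and returns 4.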
I’ve tried:
- Clang vs GCC: GCC is ~2x faster
- Switching to a tail call interpreter: ~2x slower
- Lazily updating the flags: at most 40 % of flag updates go unused, which doesn't save enough to cover the overhead
- Moving A to a variable: ~5 % faster
- Moving PC to a variable: ~10 % slower
- Updating the PC-to-host-address calculation on line change: ~10 % slower
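Lazy flag evaluation works as follows. The fast path records an instruction's operands and result, and F is computed only when something reads it, such as PUSH AF or a conditional jump. This sketch assumes a hypothetical `LazyFlags` struct that only tracks 8-bit ADD. The flag formulas are the standard Z80 ones for ADD A,n, including the undocumented bits 5 and 3.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical lazy-flags state that only understands 8-bit ADD.
struct LazyFlags {
    enum class Pending : uint8_t { None, Add8 };
    Pending pending = Pending::None;
    uint8_t f = 0;       // materialised F, valid when pending == None
    uint8_t lhs = 0;     // operands of the last ADD
    uint8_t rhs = 0;
    uint16_t sum = 0;    // 9-bit sum, so the carry is not lost

    // Fast path on every ADD A,n: record inputs, no flag arithmetic.
    uint8_t add8(uint8_t a, uint8_t b) {
        pending = Pending::Add8;
        lhs = a;
        rhs = b;
        sum = static_cast<uint16_t>(a + b);
        return static_cast<uint8_t>(sum);
    }

    // Slow path, taken only when F is actually observed.
    uint8_t read() {
        if (pending == Pending::Add8) {
            const uint8_t res = static_cast<uint8_t>(sum);
            uint8_t flags = res & 0xA8;                // S, bit 5, bit 3
            if (res == 0) flags |= 0x40;               // Z
            flags |= (lhs ^ rhs ^ res) & 0x10;         // H: carry out of bit 3
            if ((lhs ^ res) & (rhs ^ res) & 0x80)      // P/V: signed overflow
                flags |= 0x04;
            if (sum > 0xFF) flags |= 0x01;             // C; N stays clear for ADD
            f = flags;
            pending = Pending::None;
        }
        return f;
    }
};
```

Every ADD now pays for the extra stores, and every read of F pays for a branch plus the deferred arithmetic. If at most 40 % of updates are never read, that trade is unlikely to come out ahead, which is consistent with the measurement in the list.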
Interestingly, compiler hints like [[likely]] can reduce performance, as the
code is very sensitive to getting exactly the right control flow.
Next, I’ll try splitting the carry flag C into a separate field. The tail call
interpreter is also worth revisiting, as the RISC-V compiler still generated a
prologue and epilogue, which seems fixable.
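The carry split could look something like this sketch. A hypothetical `SplitFlags` struct keeps C as a `bool` and the rest of F in a byte. F is only reassembled when it is read as a whole (PUSH AF) or replaced (POP AF). RLA serves as the example because it reads and writes C but preserves S, Z and P/V. The appeal of the split is that carry-in and carry-out become plain `bool` operations instead of masks on F.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag layout with the carry split out of F.
struct SplitFlags {
    uint8_t rest = 0;    // F with bit 0 (C) always zero
    bool carry = false;  // the C flag on its own

    // Rebuild F when it is read as a whole (e.g. PUSH AF).
    uint8_t get_f() const {
        return static_cast<uint8_t>(rest | (carry ? 0x01 : 0x00));
    }
    // Split an incoming F (e.g. POP AF) into the two fields.
    void set_f(uint8_t f) {
        rest = static_cast<uint8_t>(f & 0xFE);
        carry = (f & 0x01) != 0;
    }
};

// RLA: rotate A left through carry. S, Z and P/V are preserved,
// H and N are reset, bits 5 and 3 copy the result (undocumented).
uint8_t rla(SplitFlags& fl, uint8_t a) {
    const uint8_t res = static_cast<uint8_t>((a << 1) | (fl.carry ? 1 : 0));
    fl.carry = (a & 0x80) != 0;  // old bit 7 goes to C
    fl.rest = static_cast<uint8_t>((fl.rest & 0xC4) | (res & 0x28));
    return res;
}
```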
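A tail call interpreter, by contrast with computed gotos, makes each opcode its own function and dispatches by calling the next handler in tail position. This sketch assumes a hypothetical four-opcode subset (NOP, INC A, LD A,n, HALT) and a hypothetical `Cpu` struct passed by reference. It relies on the optimiser turning each call into a jump. GCC and Clang usually do this with optimisation enabled but do not guarantee it; Clang's `[[clang::musttail]]` attribute can enforce it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical CPU state for this sketch only.
struct Cpu {
    uint8_t a = 0;
    uint16_t pc = 0;
    const uint8_t* mem = nullptr;
};

// Every handler has the same signature so each call can be a tail call.
// `n` counts dispatched instructions.
using Handler = unsigned (*)(Cpu&, unsigned);

unsigned op_nop(Cpu& cpu, unsigned n);
unsigned op_inc_a(Cpu& cpu, unsigned n);
unsigned op_ld_a_n(Cpu& cpu, unsigned n);
unsigned op_halt(Cpu& cpu, unsigned n);

// One function pointer per opcode byte; unknown opcodes stop the run.
struct HandlerTable {
    Handler h[256];
    HandlerTable() {
        for (Handler& entry : h) entry = op_halt;
        h[0x00] = op_nop;
        h[0x3C] = op_inc_a;
        h[0x3E] = op_ld_a_n;
        h[0x76] = op_halt;
    }
};
static const HandlerTable kTable;

// Fetch the next opcode and call its handler in tail position.
inline unsigned dispatch(Cpu& cpu, unsigned n) {
    return kTable.h[cpu.mem[cpu.pc++]](cpu, n + 1);
}

unsigned op_nop(Cpu& cpu, unsigned n) { return dispatch(cpu, n); }
unsigned op_inc_a(Cpu& cpu, unsigned n) {
    cpu.a = static_cast<uint8_t>(cpu.a + 1);  // flags omitted
    return dispatch(cpu, n);
}
unsigned op_ld_a_n(Cpu& cpu, unsigned n) {
    cpu.a = cpu.mem[cpu.pc++];  // immediate operand follows the opcode
    return dispatch(cpu, n);
}
unsigned op_halt(Cpu&, unsigned n) { return n; }  // ends the call chain

// Entry point: returns the number of instructions dispatched, HALT included.
unsigned run(Cpu& cpu) { return dispatch(cpu, 0); }
```

The cost to watch is the code surrounding each handler body. If the compiler saves and restores callee-saved registers in every handler, that prologue and epilogue is paid on every emulated instruction. Tail call interpreters commonly pass the hot state (PC, A, a memory pointer) as function arguments so it stays in argument registers. Passing `Cpu&` here keeps the sketch short.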
I used Claude Code for some of these experiments, and it’s been decent. Most of the cases are some kind of large refactoring, and it’s about as fast as I am. The code is good about a third of the time, and most of the rest of the time it’s been good enough to try the idea and discard it when it turns out to be a dead end.