Dead ends on emulator performance

My Z80 emulator now runs on the CH32V006 and achieves better than real time, so I’ve been looking at ways of improving the performance. So far most have been dead ends.

I’m using the zexdoc test suite as the benchmark. The core is a threaded interpreter using computed gotos and is generated from a machine-readable version of the instruction set, which makes it simpler to try different methods.
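For readers who haven't seen the technique, here is a minimal sketch of computed-goto dispatch. It uses a made-up four-opcode toy machine rather than the Z80, and the names (run, dispatch, NEXT, the OP_ constants) are hypothetical and not taken from the emulator. It relies on the GCC/Clang "labels as values" extension, so it is not standard C.

```c
#include <stdint.h>

/* Hypothetical toy opcodes for illustration, not the Z80 instruction set. */
enum { OP_INC, OP_DEC, OP_DOUBLE, OP_HALT };

/* Runs a toy program and returns the final accumulator value.
   Uses the GCC/Clang "labels as values" extension (&&label, goto *ptr). */
static uint8_t run(const uint8_t *pc) {
    /* One label address per opcode, indexed by the opcode byte. */
    static const void *const dispatch[] = {
        [OP_INC] = &&op_inc,
        [OP_DEC] = &&op_dec,
        [OP_DOUBLE] = &&op_double,
        [OP_HALT] = &&op_halt,
    };
    uint8_t a = 0;

    /* Fetch the next opcode and jump straight to its handler. */
#define NEXT() goto *dispatch[*pc++]
    NEXT();

op_inc:
    a += 1;
    NEXT();
op_dec:
    a -= 1;
    NEXT();
op_double:
    a = (uint8_t)(a << 1);
    NEXT();
op_halt:
    return a;
#undef NEXT
}
```

Each handler ends with its own indirect jump instead of returning to a central switch, which is the property that makes the generated control flow so sensitive to small changes.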

I’ve tried:

  • Clang vs GCC: GCC is ~2x faster
  • Switching to a tail call interpreter: ~2x slower
  • Lazily updating the flags: at most 40 % of flag updates are unused, which is not enough to recover the overhead of deferring them
  • Moving A to a variable: ~5 % faster
  • Moving PC to a variable: ~10 % slower
  • Updating the PC-to-host-address calculation on line change: ~10 % slower
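To make the lazy flags item concrete, here is a rough sketch of the idea for an 8-bit ADD. It is an illustration under assumptions, not the emulator's code: the cpu_t layout and the add_a and flags helpers are hypothetical, and only the S, Z, H and C bits are computed (their bit positions are the standard Z80 ones).

```c
#include <stdint.h>

/* Hypothetical state layout for illustration only. */
typedef struct {
    uint8_t a;         /* accumulator */
    uint8_t lhs, rhs;  /* operands of the most recent ADD */
    uint16_t res;      /* 9-bit result of the most recent ADD */
} cpu_t;

/* Standard Z80 flag bit positions (subset). */
enum { FLAG_C = 0x01, FLAG_H = 0x10, FLAG_Z = 0x40, FLAG_S = 0x80 };

/* ADD A, v: record the inputs instead of computing F immediately. */
static void add_a(cpu_t *cpu, uint8_t v) {
    cpu->lhs = cpu->a;
    cpu->rhs = v;
    cpu->res = (uint16_t)(cpu->a + v);
    cpu->a = (uint8_t)cpu->res;
}

/* Materialise S, Z, H and C only when an instruction reads F. */
static uint8_t flags(const cpu_t *cpu) {
    uint8_t f = 0;
    if (cpu->res & 0x100) f |= FLAG_C;                         /* carry out of bit 7 */
    if ((uint8_t)cpu->res == 0) f |= FLAG_Z;                   /* result is zero */
    if (cpu->res & 0x80) f |= FLAG_S;                          /* sign bit */
    if ((cpu->lhs ^ cpu->rhs ^ cpu->res) & 0x10) f |= FLAG_H;  /* carry out of bit 3 */
    return f;
}
```

The three extra stores in add_a are paid on every ADD, while the saving only applies when the flags are never read, which fits with a 40 % unused rate not being enough to come out ahead.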

Interestingly, compiler hints like [[likely]] can reduce performance, as the code is very sensitive to getting exactly the right control flow.

Next I’ll try splitting the carry flag C into a separate field. The tail call interpreter is also worth revisiting, as the RISC-V compiler still generated a prologue and epilogue, which seems fixable.
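For comparison with computed gotos, a tail call interpreter has roughly the shape below. This is a sketch on a hypothetical three-opcode toy machine; the names (cpu_t, handler_t, table, NEXT) are invented for illustration. It uses plain return statements, so whether each call becomes a jump is up to the optimiser. Clang offers `__attribute__((musttail))` to enforce it, which is left out here to keep the sketch portable.

```c
#include <stdint.h>

/* Hypothetical toy machine for illustration, not the Z80. */
enum { OP_INC, OP_DEC, OP_HALT };

typedef struct { uint8_t a; } cpu_t;

/* Every handler shares one signature so any handler can tail call any other. */
typedef uint8_t handler_t(cpu_t *cpu, const uint8_t *pc);

static handler_t op_inc, op_dec, op_halt;

static handler_t *const table[] = {
    [OP_INC] = op_inc,
    [OP_DEC] = op_dec,
    [OP_HALT] = op_halt,
};

/* Dispatch to the next handler in tail position. */
#define NEXT() return table[*pc](cpu, pc + 1)

static uint8_t op_inc(cpu_t *cpu, const uint8_t *pc) {
    cpu->a++;
    NEXT();
}

static uint8_t op_dec(cpu_t *cpu, const uint8_t *pc) {
    cpu->a--;
    NEXT();
}

/* Ends the chain of tail calls and returns the accumulator. */
static uint8_t op_halt(cpu_t *cpu, const uint8_t *pc) {
    (void)pc;
    return cpu->a;
}
```

Because the interpreter state travels in function arguments, any register a handler has to save or restore shows up as a prologue and epilogue on every instruction.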

I used Claude Code for some of these experiments and it’s been decent. Most of the cases are some type of large refactoring, and it’s about as fast as me. The code is good about a third of the time, and most of the rest of the time it’s been good enough to try the idea and discard it when it turns out to be a dead end.

Michael Hope
Software Engineer