Huffman coding on NMEA sentences
I’m impressed by how well Huffman coding works on NMEA GPS sentences, which are very verbose, very repetitive, and ASCII-based. I hacked up a Python script that bakes a fixed dictionary from example data, plus a device-side C++ encoder that encodes using that dictionary. The encoder is 46 statements, uses ~10 bytes of RAM, and still gets almost 3:1 compression.
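To make the "bake a dictionary, then encode against it" split concrete, here is a minimal Python sketch of the character-per-symbol variant. It is not the script or the C++ encoder from this post. `build_code`, `encode` and `decode` are hypothetical names, and the example sentence is the widely quoted sample GGA sentence, used only for illustration. The sketch assumes every byte the device will ever send appears in the training data. A real fixed dictionary would need an escape code or a pseudo-count for unseen bytes.

```python
import heapq
from collections import Counter
from itertools import count


def build_code(data: bytes) -> dict[int, str]:
    """Bake a Huffman code (byte -> bit string) from sample data."""
    freq = Counter(data)
    tiebreak = count()  # unique tiebreaker so the heap never compares dicts
    heap = [(n, next(tiebreak), {sym: ""}) for sym, n in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # one distinct symbol still needs a 1-bit code word
        return {sym: "0" for sym in heap[0][2]}
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing 0 / 1.
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (n1 + n2, next(tiebreak), merged))
    return heap[0][2]


def encode(data: bytes, code: dict[int, str]) -> bytes:
    """Pack code words MSB-first; raises KeyError on bytes not in the dictionary."""
    bits = "".join(code[b] for b in data)
    bits += "0" * (-len(bits) % 8)  # zero-pad the final byte
    return int(bits, 2).to_bytes(len(bits) // 8, "big") if bits else b""


def decode(packed: bytes, code: dict[int, str], n_symbols: int) -> bytes:
    """Walk the bits; the symbol count tells us where the padding starts."""
    inverse = {c: s for s, c in code.items()}
    out, cur = bytearray(), ""
    for bit in "".join(f"{b:08b}" for b in packed):
        if len(out) == n_symbols:
            break
        cur += bit
        if cur in inverse:  # prefix-free, so the first match is the symbol
            out.append(inverse[cur])
            cur = ""
    return bytes(out)


sample = b"$GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,46.9,M,,*47\r\n"
code = build_code(sample)  # offline step: the "baked" dictionary
packed = encode(sample, code)  # what the device would transmit
print(len(sample), "->", len(packed), "bytes")
```

On a device, the dictionary would be emitted as a constant table, and the encoder only needs a small bit accumulator. That split is what lets the on-device side stay tiny.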
For comparison, on my 135,548-byte test file:
- Treating each character as a symbol gives 58,749 B (2.30x)
- Treating the talker (‘GPGGA’) and each non-numeric field as a symbol gives 46,104 B (2.94x)
- lzop gives 22,161 B (6.12x)
- gzip gives 12,167 B (11.2x)
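The gap between the first two Huffman results comes down to how the input is split into symbols. The Python sketch below shows one possible reading of the second scheme, and it rests on three assumptions. Numeric fields still fall back to one symbol per character. Commas and the checksum are emitted as their own symbols. Empty fields produce no symbol. `tokenize` and `huffman_bits` are hypothetical helpers. `huffman_bits` uses the fact that a Huffman code's total encoded length equals the sum of the weights created by every merge. That total counts payload bits only, since a baked dictionary lives on the device and is never transmitted.

```python
import heapq
from collections import Counter

NUMERIC_CHARS = set("0123456789.-")


def tokenize(sentence: str) -> list[str]:
    """Split one NMEA sentence into Huffman symbols (assumed scheme)."""
    body, star, checksum = sentence.strip().partition("*")
    fields = body.split(",")
    tokens = [fields[0]]  # "$GPGGA" as a single symbol
    for field in fields[1:]:
        tokens.append(",")
        if field and set(field) <= NUMERIC_CHARS:
            tokens.extend(field)  # numeric: one symbol per character
        elif field:
            tokens.append(field)  # non-numeric ("N", "E", "M", ...): one symbol
    if star:
        tokens.append("*")
        tokens.extend(checksum)  # hex checksum digits, one per character
    return tokens


def huffman_bits(freq: Counter) -> int:
    """Total Huffman-encoded size in bits = sum of all merged weights."""
    heap = list(freq.values())
    if not heap:
        return 0
    if len(heap) == 1:
        return heap[0]  # a lone symbol still costs 1 bit each
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total


line = "$GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,46.9,M,,*47"
per_char = (huffman_bits(Counter(line)) + 7) // 8
per_field = (huffman_bits(Counter(tokenize(line))) + 7) // 8
print(len(line), per_char, per_field)
```

Run over a whole log, this yields the kind of per-scheme byte counts listed above. Measuring on the same data the dictionary was baked from flatters the ratio, so a held-out file gives a more honest number.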