The Art of the Squeeze: Porting LZ4 Compression to Legacy 8-Bit Silicon

The Constraint of the Cartridge
In modern software, memory is often treated as an effectively unlimited resource. In retro-computing and homebrew development, every byte is a hard-won victory. That tension prompted a recent technical exploration: porting LZ4, a high-speed compression algorithm first released in 2011, to processors designed in the 1970s and early 1980s.
The work began out of necessity on a Super Nintendo (SNES) project, where cartridge space was at a premium. Implementing an LZ4 decompressor made it possible to fit significantly more data into a limited ROM. What started as a targeted fix for the SNES’s 65816 processor evolved into a broader study of how other legacy architectures (the Zilog Z80, MOS 6502, and Intel 8080) cope with the demands of LZ4.
How LZ4 Bridges the Gap
At its core, LZ4 belongs to the LZ77 family of compression. Unlike formats that add an entropy-coding stage, with its code tables and bit-level decoding, LZ4 is byte-oriented and operates on a relatively simple principle: it treats a compressed stream as a series of sequences, each made of a run of literal bytes followed by a back-reference into data that has already been decoded.
Each sequence starts with a single token byte. Its high four bits give the length of the literal run and its low four bits give the length of the back-reference that follows; when a field holds its maximum value of 15, additional length bytes extend it. After the literals comes a two-byte, little-endian offset. If a piece of data repeats something earlier in the output, the stream simply tells the CPU to “go back X bytes and copy Y bytes.” This approach is computationally inexpensive, making it an ideal candidate for CPUs with limited clock speeds and small register sets.
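The sequence layout described above can be sketched in Python. This is a minimal illustration of the standard LZ4 block layout, not code from the ports discussed here; the names `read_length` and `parse_sequence` are hypothetical, and the sketch only splits a sequence into its fields without copying anything.

```python
def read_length(data, pos, nibble):
    """Extend a 4-bit length field; a nibble of 15 means more length bytes follow."""
    length = nibble
    if nibble == 15:
        while True:
            extra = data[pos]
            pos += 1
            length += extra
            if extra != 255:      # any byte below 255 ends the extension
                break
    return length, pos


def parse_sequence(data, pos=0):
    """Split one LZ4 sequence into its fields (hypothetical helper, no copying)."""
    token = data[pos]
    pos += 1
    literal_len, pos = read_length(data, pos, token >> 4)
    literals = data[pos:pos + literal_len]
    pos += literal_len
    offset = data[pos] | (data[pos + 1] << 8)   # 16-bit little-endian distance
    pos += 2
    match_len, pos = read_length(data, pos, token & 0x0F)
    match_len += 4                # stored value is biased by the 4-byte minimum match
    return {"literals": literals, "offset": offset,
            "match_len": match_len, "next": pos}
```

For example, the six bytes `32 61 62 63 03 00` parse as three literals (`abc`), an offset of 3, and a match length of 6.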
For 8-bit processors, the beauty of LZ4 lies in its lack of complex state management. Because the algorithm relies on direct copies and simple offsets, the decoder needs no dictionary or table beyond the output it has already written, which avoids the heavy RAM overhead that usually kills decompression attempts on vintage hardware.
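To show how little state a back-reference needs, here is a small Python sketch of the copy step. The function name `copy_match` is hypothetical, and a `bytearray` stands in for the output region of RAM; the only working values are a source position and a remaining count.

```python
def copy_match(out, offset, length):
    """Append `length` bytes read from `offset` bytes behind the end of `out`."""
    if offset <= 0 or offset > len(out):
        raise ValueError("offset points outside the data decoded so far")
    src = len(out) - offset
    for _ in range(length):
        out.append(out[src])      # one byte at a time, so source and destination may overlap
        src += 1


buf = bytearray(b"ab")
copy_match(buf, 2, 6)             # go back 2 bytes, copy 6
print(buf.decode())               # abababab
```

Copying one byte at a time matters: when the length exceeds the offset, the copy reads bytes it has just written, which is how a short pattern expands into a long run.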
Architectural Divergence: Z80 vs. 6502
The implementation process reveals a fascinating divide in how these legacy chips function. The Zilog Z80, for instance, proved exceptionally well suited to LZ4. Its 16-bit register pairs (HL, DE, and BC) can hold a source pointer, a destination pointer, and a byte count, and its LDIR block-copy instruction performs exactly the kind of copy the algorithm calls for. The resulting translation is very straightforward, and it served as a blueprint for ports to the Intel 8080 and the 8086.
The MOS 6502, however, requires a completely different philosophy. While the Z80 can lean on its robust register set, the 6502 is famously register-poor, with only three 8-bit registers (A, X, and Y), forcing the programmer to rely heavily on the zero page for pointers and temporary storage. The resulting code isn’t just a translation; it’s a structural reimagining of the decompression loop to accommodate the 6502’s zero-page indirect addressing modes.
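The shape of a 6502-style copy loop can be modelled in Python. This is only an illustration of the general idiom (16-bit pointers split into low and high bytes, indexed by an 8-bit register), not the code from the port itself; the function name `copy_indirect_indexed` and the 64 KB `bytearray` standing in for memory are assumptions made for the sketch.

```python
def copy_indirect_indexed(mem, src, dst, length):
    """Copy `length` bytes forward using two split 16-bit pointers and an 8-bit index."""
    # Each pointer is held as a low byte and a high byte, as it would be in zero page.
    src_lo, src_hi = src & 0xFF, src >> 8
    dst_lo, dst_hi = dst & 0xFF, dst >> 8
    y = 0                                   # stand-in for the 8-bit Y register
    while length > 0:
        src_addr = (((src_hi << 8) | src_lo) + y) & 0xFFFF
        dst_addr = (((dst_hi << 8) | dst_lo) + y) & 0xFFFF
        mem[dst_addr] = mem[src_addr]       # models LDA (src),Y followed by STA (dst),Y
        y = (y + 1) & 0xFF
        if y == 0:                          # index wrapped: advance both pointers by one page
            src_hi = (src_hi + 1) & 0xFF
            dst_hi = (dst_hi + 1) & 0xFF
        length -= 1
```

Because the index register is only eight bits wide, any copy longer than 256 bytes has to adjust the pointer high bytes by hand, which is part of why the loop structure differs so much from a single block-copy instruction.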
Simplifying the Specification
To make these ports viable, the standard LZ4 frame format was stripped away. In the full format, each data block is preceded by its size, which tells the decoder where the block ends. These legacy implementations discard that overhead in favor of a simpler method: terminating the compressed data with a pair of zero bytes.
This modification allows the decoder to use a “zero offset” as the signal to stop decompressing. A zero offset is never valid in a standard LZ4 stream, because a back-reference must point at least one byte back, so the value is free to serve as a sentinel. While this deviates from the strict LZ4 specification, it drastically reduces the amount of state the decoder needs to track, which is a critical trade-off when working with processors that have only a few bytes of available scratchpad memory.
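A decoder built around this stopping rule might look like the following Python sketch. It assumes the two zero bytes sit where the next offset would be read, directly after the final run of literals; the article does not show the ports' exact byte layout, so treat that placement and the name `decode_zero_terminated` as assumptions.

```python
def decode_zero_terminated(data):
    """Decode a frameless LZ4 stream that ends with a zero offset (sketch)."""
    out = bytearray()
    pos = 0
    while True:
        token = data[pos]
        pos += 1

        # Literal run: high nibble, extended by extra bytes when it equals 15.
        lit_len = token >> 4
        if lit_len == 15:
            while True:
                extra = data[pos]
                pos += 1
                lit_len += extra
                if extra != 255:
                    break
        out += data[pos:pos + lit_len]
        pos += lit_len

        # Offset: 16-bit little-endian. Zero is the agreed end-of-data marker.
        offset = data[pos] | (data[pos + 1] << 8)
        pos += 2
        if offset == 0:
            return bytes(out)
        if offset > len(out):
            raise ValueError("offset points before the start of the output")

        # Match length: low nibble plus the 4-byte minimum, extended the same way.
        match_len = token & 0x0F
        if match_len == 15:
            while True:
                extra = data[pos]
                pos += 1
                match_len += extra
                if extra != 255:
                    break
        match_len += 4

        src = len(out) - offset
        for _ in range(match_len):
            out.append(out[src])            # byte-wise so overlapping matches work
            src += 1


stream = bytes([0x32]) + b"abc" + bytes([0x03, 0x00, 0x10]) + b"!" + bytes([0x00, 0x00])
print(decode_zero_terminated(stream))       # b'abcabcabc!'
```

Note that the decoder never needs to know the compressed or decompressed size in advance: the loop carries only the input position, the output buffer, and the current token.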
Ultimately, the experiment demonstrates that even a modern algorithm can find a home on 40-year-old silicon, provided the implementer is willing to trade strict specification adherence for architectural efficiency.