Extract any LZH file

Unlimited jobs. File sizes up to 2.5 GB. For free, forever.

All local

Our converter runs in your browser, so we never see your data.

Blazing fast

No uploads to a server, so conversions start instantly.

Secure by default

Unlike other converters, your files are never uploaded to us.

What is the LZH format?

LZH Archive

LZH is a lossless compressed archive format named after Lempel–Ziv–Huffman, the algorithms on which it is based. It was first released in 1987 by Haruyasu Yoshizaki as an improvement over the earlier LZ77 and LZ78 compression algorithms published by Abraham Lempel and Jacob Ziv in the late 1970s. LZH provided better compression ratios while still allowing fast decompression.

The core compression algorithms used in LZH are dictionary-based, leveraging previously seen data to more compactly encode future data. The encoder maintains a sliding window buffer of the most recently processed data. When new data is encountered, the encoder searches for the longest matching sequence in the sliding window. If a match is found, the data is encoded as a reference to the matching window position and length, rather than the literal data. This reference typically consumes less space than the original content.

LZ77, used as a basis for LZH, has an encoding loop that looks like this:

  1. Search the sliding window for the longest match with the upcoming input.
  2. Output an (offset, length) pair that refers to the match.
  3. If no match is found, output a literal byte.
  4. Move the window forward by the match length.
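To make the loop concrete, here is a rough Python sketch of an LZ77-style encoder. It is illustrative only, not the actual LZH codec: the brute-force window search, the minimum match length, and the token format are all simplifications chosen for clarity.

    # Minimal LZ77-style encoder (illustrative sketch, not the real LZH implementation).
    WINDOW_SIZE = 8192   # 13-bit window, matching standard LZH
    MIN_MATCH = 3        # assumed minimum useful match length (a simplification)

    def lz77_encode(data: bytes):
        """Emit ("literal", byte) and ("match", offset, length) tokens."""
        tokens, pos = [], 0
        while pos < len(data):
            start = max(0, pos - WINDOW_SIZE)
            best_len, best_off = 0, 0
            # 1. Search the sliding window for the longest match with the upcoming input.
            for cand in range(start, pos):
                length = 0
                while (pos + length < len(data)
                       and data[cand + length] == data[pos + length]
                       and length < 256):       # LZH caps match length at 256 bytes
                    length += 1
                if length > best_len:
                    best_len, best_off = length, pos - cand
            if best_len >= MIN_MATCH:
                # 2. Output an (offset, length) pair that refers to the match.
                tokens.append(("match", best_off, best_len))
                pos += best_len                  # 4. Move forward by the match length.
            else:
                # 3. No usable match: output a literal byte.
                tokens.append(("literal", data[pos]))
                pos += 1
        return tokens

A real encoder replaces the brute-force search with hash chains or trees, but the control flow is the same.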

In 1984, Terry Welch published LZW, a refinement of LZ78 that encoded index numbers into a growing dictionary of strings to achieve denser output than LZ77. A few years later, LZH was released, keeping the LZ77 sliding-dictionary approach but adding Huffman coding of the LZ77 symbols as an extra step to improve compression ratios.

Huffman coding assigns short bit sequences to frequently used symbols and longer sequences to rare ones. In LZH, the possible symbols are literal bytes, end-of-block markers, and match references into the sliding window dictionary. The Huffman coding model is computed uniquely for each block of data based on that block's symbol frequency distribution. More frequent match references are assigned shorter bit codes. This entropy encoding step is applied after matching against the sliding window.
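As a generic illustration of how this entropy-coding stage assigns short codes to frequent symbols, the sketch below builds a Huffman code table from byte frequencies. It is not LZH's actual symbol alphabet or table layout, just the underlying idea:

    import heapq
    from collections import Counter

    def huffman_codes(data: bytes) -> dict:
        """Map each byte value to a bit string; frequent bytes get shorter codes."""
        freq = Counter(data)
        if len(freq) == 1:                        # degenerate case: one distinct symbol
            return {next(iter(freq)): "0"}
        # Heap entries are (frequency, tiebreaker, subtree); leaves are byte values,
        # internal nodes are (left, right) pairs.
        heap = [(f, i, sym) for i, (sym, f) in enumerate(freq.items())]
        heapq.heapify(heap)
        counter = len(heap)
        while len(heap) > 1:
            f1, _, left = heapq.heappop(heap)
            f2, _, right = heapq.heappop(heap)
            heapq.heappush(heap, (f1 + f2, counter, (left, right)))
            counter += 1
        codes = {}
        def walk(node, prefix=""):
            if isinstance(node, tuple):
                walk(node[0], prefix + "0")
                walk(node[1], prefix + "1")
            else:
                codes[node] = prefix
        walk(heap[0][2])
        return codes

    # The frequent byte "a" receives a shorter code than the rare byte "c".
    print(huffman_codes(b"aaaaaabbbc"))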

Decompression parses the variable-length Huffman codes from the input stream, translating them back to literal bytes and match references. References are resolved by looking back into the window buffer at the decoded data, copying the match to the output. The window is slid forward after each symbol. Decompression is fast, as no searching for matches is required.
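The decoding side of the earlier sketch is short, which is exactly why LZ77-family formats decompress quickly. Assuming the hypothetical token format from the encoder sketch above:

    def lz77_decode(tokens) -> bytes:
        """Rebuild data from ("literal", byte) and ("match", offset, length) tokens."""
        out = bytearray()
        for token in tokens:
            if token[0] == "literal":
                out.append(token[1])
            else:
                _, offset, length = token
                # Copy byte by byte so a match may overlap the data it is producing.
                for _ in range(length):
                    out.append(out[-offset])
        return bytes(out)

    original = b"abracadabra abracadabra"
    assert lz77_decode(lz77_encode(original)) == original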

LZH files contain a series of blocks, each independently compressed with this LZ77+Huffman model. Separate blocks allow random access and error recovery. Each block begins with a header that stores the Huffman coding tables needed to decode that block's data.

The standard LZH format uses a 13-bit sliding window, or 8,192 bytes. The window is typically initialized to all zero bytes at the start of each block. No preset dictionary is used; matches refer only to previously processed data. Match references are limited to at most 256 bytes in length.

Nelson H.F. Beebe extended LZH to support arbitrarily large sliding window sizes, calling his format LZHXa. Window sizes are restricted to powers of 2, with 2^15 (32,768) and 2^16 (65,536) bytes being common. Increasing the window improves compression as more history data is searched, at the cost of slower encoding and more memory use.

LZH includes checksums to validate data integrity. Each block ends with a 16-bit CRC code. Multi-file archives store an additional CRC for each complete file. Most implementations use CRC-16 with the polynomial x^16 + x^15 + x^2 + 1, but some use CRC-16-CCITT.
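For reference, the sketch below computes a CRC-16 with that polynomial (x^16 + x^15 + x^2 + 1), processed bit-reflected with the constant 0xA001 and a zero initial value; whether a given LZH implementation uses exactly these conventions should be checked against that implementation.

    def crc16(data: bytes, crc: int = 0x0000) -> int:
        """Bitwise CRC-16 for polynomial x^16 + x^15 + x^2 + 1 (reflected form 0xA001)."""
        for byte in data:
            crc ^= byte
            for _ in range(8):
                crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
        return crc

    print(hex(crc16(b"123456789")))  # 0xbb3d, the standard check value for this variant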

Compressed LZH archives conventionally use the .lzh file extension. The format saw its widest adoption in Japan, with popularity peaking in the early 1990s as it competed with ZIP and ARJ. An informal standard known as LHA evolved around the popular LArc and LHarc archivers, adding support for comments, timestamps, passwords, and multi-file archives.

LZH's key advantages were higher compression ratios than ZIP and fast decompression. By the mid-1990s, however, ZIP had displaced most other formats: its quick adoption as a de facto standard on Windows, plus cross-platform library and tool support, led it to dominate lossless archiving.

Today, LZH is rarely used outside of Japan and East Asia. Newer compression formats such as bzip2 and LZMA offer significantly better compression ratios. Legacy applications may still encounter .lzh files, but ZIP, 7z, or xz are recommended for new archives. Open-source tools such as lhasa exist to extract old .lzh archives.

In summary, LZH innovatively combined Lempel-Ziv dictionary coding with Huffman bit reduction to achieve state-of-the-art compression when introduced. It saw brief adoption, especially in Japan, before being overtaken by the ZIP standard. But it played an important role in the history of data compression and the development of modern archive formats. LZH showcased techniques like sliding window dictionaries and symbol entropy coding that remain fundamental to how we compress data efficiently.

File compression reduces redundancy so the same information takes fewer bits. The upper bound on how far you can go is governed by information theory: for lossless compression, the limit is the entropy of the source (see Shannon’s source coding theorem and his original 1948 paper “A Mathematical Theory of Communication”). For lossy compression, the trade-off between rate and quality is captured by rate–distortion theory.
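To see that limit concretely, the order-0 entropy of a byte stream can be computed directly; no lossless coder that treats bytes as independent symbols can average fewer bits than this:

    import math
    from collections import Counter

    def entropy_bits_per_byte(data: bytes) -> float:
        """Shannon entropy H = -sum(p * log2(p)) over byte frequencies."""
        counts = Counter(data)
        total = len(data)
        return -sum((c / total) * math.log2(c / total) for c in counts.values())

    sample = b"aaaaaaaabbbbccd"
    h = entropy_bits_per_byte(sample)
    print(f"{h:.3f} bits/byte -> at best ~{h / 8:.1%} of the original size")

Real compressors beat this order-0 bound on structured data because their models exploit context, not just individual byte frequencies.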

Two pillars: modeling and coding

Most compressors have two stages. First, a model predicts or exposes structure in the data. Second, a coder turns those predictions into near-optimal bit patterns. A classic modeling family is Lempel–Ziv: LZ77 (1977) and LZ78 (1978) detect repeated substrings and emit references instead of raw bytes. On the coding side, Huffman coding (see Huffman's original 1952 paper) assigns shorter codes to more likely symbols. Arithmetic coding and range coding are finer-grained alternatives that squeeze closer to the entropy limit, while modern Asymmetric Numeral Systems (ANS) achieves similar compression with fast table-driven implementations.

What common formats actually do

DEFLATE (used by gzip, zlib, and ZIP) combines LZ77 with Huffman coding. Its specs are public: DEFLATE RFC 1951, zlib wrapper RFC 1950, and gzip file format RFC 1952. Gzip is framed for streaming and explicitly does not attempt to provide random access. PNG images standardize DEFLATE as their only compression method (with a max 32 KiB window), per the PNG spec “Compression method 0… deflate/inflate… at most 32768 bytes” and W3C/ISO PNG 2nd Edition.
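Python's standard library exposes both layers, which makes the framing easy to see (raw DEFLATE inside the zlib wrapper, and the same codec inside the gzip file format):

    import gzip
    import zlib

    payload = b"the quick brown fox jumps over the lazy dog " * 100

    # DEFLATE stream in the zlib container (RFC 1950 wrapping RFC 1951).
    deflated = zlib.compress(payload, level=9)
    assert zlib.decompress(deflated) == payload

    # The same DEFLATE codec inside the gzip file format (RFC 1952).
    gz = gzip.compress(payload)
    assert gzip.decompress(gz) == payload

    print(len(payload), len(deflated), len(gz))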

Zstandard (zstd): a newer general-purpose compressor designed for high ratios with very fast decompression. The format is documented in RFC 8878 (also HTML mirror) and the reference spec on GitHub. Like gzip, the basic frame doesn’t aim for random access. One of zstd’s superpowers is dictionaries: compact models trained on samples from your corpus that dramatically improve compression on many tiny or similar files (see python-zstandard dictionary docs and Nigel Tao’s worked example). Implementations accept both “unstructured” and “structured” dictionaries (discussion).
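As a hedged sketch of dictionary training with the third-party python-zstandard package (pip install zstandard): the sample records, field names, and the 16 KiB dictionary size below are made up for illustration, and training can fail if the sample set is too small or too uniform.

    import zstandard

    # Many small, similar records: the case where a trained dictionary helps most.
    samples = [f'{{"user_id": {i}, "status": "active", "plan": "basic"}}'.encode()
               for i in range(1000)]

    dictionary = zstandard.train_dictionary(16 * 1024, samples)

    plain = zstandard.ZstdCompressor()
    with_dict = zstandard.ZstdCompressor(dict_data=dictionary)

    record = samples[0]
    print(len(record), len(plain.compress(record)), len(with_dict.compress(record)))

    # Decompression needs the same dictionary the data was compressed with.
    restored = zstandard.ZstdDecompressor(dict_data=dictionary).decompress(
        with_dict.compress(record))
    assert restored == record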

Brotli: optimized for web content (e.g., WOFF2 fonts, HTTP). It mixes a static dictionary with a DEFLATE-like LZ+entropy core. The spec is RFC 7932, which also notes a sliding window of 2^WBITS − 16 bytes with WBITS in [10, 24] (1 KiB − 16 B up to 16 MiB − 16 B) and that it does not attempt random access. Brotli often beats gzip on web text while decoding quickly.

ZIP container: ZIP is a file archive that can store entries with various compression methods (deflate, store, zstd, etc.). The de facto standard is PKWARE’s APPNOTE (see APPNOTE portal, a hosted copy, and LC overviews ZIP File Format (PKWARE) / ZIP 6.3.3).
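The container/codec split is visible in Python's zipfile module, where each entry records its own compression method:

    import zipfile

    # One archive, different compression methods per entry.
    with zipfile.ZipFile("example.zip", "w") as zf:
        zf.writestr("stored.txt", "tiny payload", compress_type=zipfile.ZIP_STORED)
        zf.writestr("deflated.txt", "text " * 1000, compress_type=zipfile.ZIP_DEFLATED)
        zf.writestr("lzma.txt", "text " * 1000, compress_type=zipfile.ZIP_LZMA)

    with zipfile.ZipFile("example.zip") as zf:
        for info in zf.infolist():
            print(info.filename, info.compress_type, info.file_size, info.compress_size)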

Speed vs. ratio: where formats land

LZ4 targets raw speed with modest ratios. See its project page (“extremely fast compression”) and frame format. It’s ideal for in-memory caches, telemetry, or hot paths where decompression must be near RAM speed.

XZ / LZMA push for density (great ratios) with relatively slow compression. XZ is a container; the heavy lifting is typically LZMA/LZMA2 (LZ77-like modeling + range coding). See .xz file format, the LZMA spec (Pavlov), and Linux kernel notes on XZ Embedded. XZ usually out-compresses gzip and often competes with high-ratio modern codecs, but with slower encode times.
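Python's lzma module wraps liblzma and produces .xz frames by default, which makes the density-versus-time trade-off easy to try on your own data:

    import lzma

    payload = b"log line: request handled in 12ms\n" * 5000

    # XZ container with LZMA2 filters (the default format).
    xz_data = lzma.compress(payload, preset=9)
    assert lzma.decompress(xz_data) == payload
    print(len(payload), len(xz_data))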

bzip2 applies the Burrows–Wheeler Transform (BWT), move-to-front, RLE, and Huffman coding. It’s typically smaller than gzip but slower; see the official manual and man pages (Linux).
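A toy forward transform shows why BWT helps: sorting all rotations groups similar contexts, so the output develops long runs that move-to-front and RLE can then exploit. This is only a sketch (real implementations use suffix arrays, not a full rotation sort); the standard-library bz2 module does the real work.

    import bz2

    def bwt(data: bytes) -> bytes:
        """Naive Burrows-Wheeler Transform: sort all rotations, keep the last column."""
        data = data + b"\x00"                    # sentinel marks the original rotation
        rotations = sorted(data[i:] + data[:i] for i in range(len(data)))
        return bytes(rot[-1] for rot in rotations)

    sample = b"banana banana banana"
    print(bwt(sample))                           # equal bytes cluster into runs
    print(len(sample * 200), len(bz2.compress(sample * 200)))   # stdlib bzip2 usage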

Windows, blocks, and random access

“Window size” matters. DEFLATE references can only look back 32 KiB (RFC 1951 and PNG’s 32 KiB cap noted here). Brotli’s window ranges from about 1 KiB to 16 MiB (RFC 7932). Zstd tunes window and search depth by level (RFC 8878). Basic gzip/zstd/brotli streams are designed for sequential decoding; the base formats don’t promise random access, though containers (e.g., tar indexes, chunked framing, or format-specific indexes) can layer it on.

Lossless vs. lossy

The formats above are lossless: you can reconstruct exact bytes. Media codecs are often lossy: they discard imperceptible detail to hit lower bitrates. In images, classic JPEG (DCT, quantization, entropy coding) is standardized in ITU-T T.81 / ISO/IEC 10918-1. In audio, MP3 (MPEG-1 Layer III) and AAC (MPEG-2/4) rely on perceptual models and MDCT transforms (see ISO/IEC 11172-3, ISO/IEC 13818-7, and an MDCT overview here). Lossy and lossless can coexist (e.g., PNG for UI assets; Web codecs for images/video/audio).

Practical tips

  • Pick for the job. Web text and fonts: brotli. General files and backups: zstd (great decompression speed and levels to trade time for ratio). Ultra-fast pipes and telemetry: lz4. Maximum density for long-term archives where encode time is OK: xz/LZMA.
  • Small files? Train and ship dictionaries with zstd (docs) / (example). They can shrink dozens of tiny, similar objects dramatically.
  • Interoperability. When exchanging multiple files, prefer a container (ZIP, tar) plus a compressor. ZIP’s APPNOTE defines method IDs and features; see PKWARE APPNOTE and LC overviews here.
  • Measure on your data. Ratios and speeds vary by corpus. Many repos publish benchmarks (e.g., LZ4’s README cites Silesia corpus here), but always validate locally.

Key references (deep dives)

Theory: Shannon 1948 · Rate–distortion · Coding: Huffman 1952 · Arithmetic coding · Range coding · ANS. Formats: DEFLATE · zlib · gzip · Zstandard · Brotli · LZ4 frame · XZ format. BWT stack: Burrows–Wheeler (1994) · bzip2 manual. Media: JPEG T.81 · MP3 ISO/IEC 11172-3 · AAC ISO/IEC 13818-7 · MDCT.

Bottom line: choose a compressor that matches your data and constraints, measure on real inputs, and don’t forget the gains from dictionaries and smart framing. With the right pairing, you can get smaller files, faster transfers, and snappier apps — without sacrificing correctness or portability.

Frequently Asked Questions

What is file compression?

File compression is a process that reduces the size of a file or files, typically to save storage space or speed up transmission over a network.

How does file compression work?

File compression works by identifying and removing redundancy in the data. It uses algorithms to encode the original data in a smaller space.

What are the different types of file compression?

The two primary types of file compression are lossless and lossy compression. Lossless compression allows the original file to be perfectly restored, while lossy compression enables more significant size reduction at the cost of some loss in data quality.

What is an example of a file compression tool?

A popular example of a file compression tool is WinZip, which supports multiple compression formats including ZIP and RAR.

Does file compression affect the quality of files?

With lossless compression, the quality remains unchanged. However, with lossy compression, there can be a noticeable decrease in quality since it eliminates less-important data to reduce file size more significantly.

Is file compression safe?

Yes, file compression is safe in terms of data integrity, especially with lossless compression. However, like any files, compressed files can be targeted by malware or viruses, so it's always important to have reputable security software in place.

What types of files can be compressed?

Almost all types of files can be compressed, including text files, images, audio, video, and software files. However, the level of compression achievable can significantly vary between file types.

What is meant by a ZIP file?

A ZIP file is a type of file format that uses lossless compression to reduce the size of one or more files. Multiple files in a ZIP file are effectively bundled together into a single file, which also makes sharing easier.

Can I compress an already compressed file?

Technically, yes, although the additional size reduction is usually minimal or even counterproductive. Because an already compressed file has little redundancy left, compressing it again can slightly increase its size due to the headers and metadata the new compression format adds.

How can I decompress a file?

To decompress a file, you typically need a decompression or unzipping tool, like WinZip or 7-Zip. These tools can extract the original files from the compressed format.