LZH is a lossless data compression archive format named after Lempel–Ziv–Huffman, the algorithms on which it is based. It was first released in 1987 by Haruyasu Yoshizaki as an improvement over schemes based on the earlier LZ77 and LZ78 compression algorithms, developed by Abraham Lempel and Jacob Ziv in the late 1970s. LZH provided better compression ratios while still allowing fast decompression.
The core compression algorithms used in LZH are dictionary-based, leveraging previously seen data to more compactly encode future data. The encoder maintains a sliding window buffer of the most recently processed data. When new data is encountered, the encoder searches for the longest matching sequence in the sliding window. If a match is found, the data is encoded as a reference to the matching window position and length, rather than the literal data. This reference typically consumes less space than the original content.
LZ77, used as a basis for LZH, has an encoding loop that looks like this (a code sketch follows the list):

1. Search the sliding window for the longest match with the upcoming input.
2. If a match is found, output an (offset, length) pair that refers to it.
3. If no match is found, output a literal byte.
4. Move the window forward by the match length (or by one byte after a literal).
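To make the loop concrete, here is a minimal brute-force sketch in Python. The function name, the parameter defaults, and the three-byte minimum match are illustrative choices for this sketch, not part of any LZH specification:

```python
def lz77_compress(data: bytes, window_size: int = 8192, max_len: int = 256):
    """Minimal LZ77 encoder: emits (offset, length) references or literal bytes."""
    out = []
    i = 0
    while i < len(data):
        start = max(0, i - window_size)
        best_off, best_len = 0, 0
        # Step 1: search the sliding window for the longest match with the input.
        for j in range(start, i):
            length = 0
            while (i + length < len(data) and length < max_len
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_off, best_len = i - j, length
        if best_len >= 3:                      # references shorter than 3 bytes don't pay off
            out.append(("match", best_off, best_len))
            i += best_len                      # step 4: advance by the match length
        else:
            out.append(("literal", data[i]))   # step 3: no useful match, emit a literal
            i += 1
    return out
```

A production encoder would index the window (for example with hash chains) instead of scanning it linearly, but the emitted token stream is the same idea.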
In 1984, Terry Welch published LZW, an improved algorithm that adapted work from LZ78. It used index numbers into a growing dictionary of strings rather than LZ77's explicit (offset, length) pairs. In 1987, LZH was released, incorporating the LZ77 sliding-dictionary approach but adding Huffman coding of the LZ77 symbols as an extra step to improve compression ratios.
Huffman coding assigns short bit sequences to frequently used symbols and longer sequences to rare ones. In LZH, the possible symbols are literal bytes, end-of-block markers, and match references into the sliding window dictionary. The Huffman coding model is computed uniquely for each block of data based on that block's symbol frequency distribution. More frequent match references are assigned shorter bit codes. This entropy encoding step is applied after matching against the sliding window.
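As an illustration of the principle (not LZH's actual table format), the following Python sketch builds Huffman codes from a symbol-to-frequency map; the symbols and frequencies are made up:

```python
import heapq

def huffman_codes(freqs: dict) -> dict:
    """Assign Huffman codes: frequent symbols get short bit strings, rare ones long."""
    # Heap entries: (frequency, unique tiebreaker, {symbol: code-so-far}).
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    if len(heap) == 1:                         # degenerate case: one symbol, one bit
        return {sym: "0" for sym in heap[0][2]}
    count = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        # Merge the two rarest subtrees, prefixing their codes with 0 and 1.
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        count += 1
        heapq.heappush(heap, (f1 + f2, count, merged))
    return heap[0][2]

# The most frequent symbol receives the shortest code:
print(huffman_codes({"e": 40, "t": 20, "a": 15, "q": 2}))
```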
Decompression parses the variable-length Huffman codes from the input stream, translating them back to literal bytes and match references. References are resolved by looking back into the window buffer at the decoded data, copying the match to the output. The window is slid forward after each symbol. Decompression is fast, as no searching for matches is required.
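A matching sketch of the decode side, consuming the token stream produced by the encoder sketch above (Huffman decoding is assumed to have already turned the bitstream back into tokens):

```python
def lz77_decompress(tokens) -> bytes:
    """Resolve literals and (offset, length) references against decoded output."""
    out = bytearray()
    for tok in tokens:
        if tok[0] == "literal":
            out.append(tok[1])
        else:
            _, offset, length = tok
            # Copy byte by byte so overlapping matches (offset < length) work,
            # since the source region grows as we write.
            for _ in range(length):
                out.append(out[-offset])
    return bytes(out)

# Round trip against the encoder sketch:
tokens = lz77_compress(b"abcabcabcabc")
assert lz77_decompress(tokens) == b"abcabcabcabc"
```

Note that no match searching happens here, which is why decompression is so much faster than compression.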
LZH files contain a series of blocks, each independently compressed with this LZ77+Huffman model. Separate blocks allow random access and error recovery. Each block begins with a header that stores the Huffman coding tables needed to decode that block's data.
The standard LZH format uses a 13-bit window offset, giving a sliding window of 8,192 (2^13) bytes. The window is typically initialized to all zero bytes at the start of each block. No preset dictionary is used; match references point only at previously processed data. Matches are limited to at most 256 bytes in length.
Nelson H.F. Beebe extended LZH to support arbitrarily large sliding window sizes, calling his format LZHXa. Window sizes are restricted to powers of 2, with 2^15 (32,768) and 2^16 (65,536) bytes being common. Increasing the window improves compression as more history data is searched, at the cost of slower encoding and more memory use.
LZH includes checksums to validate data integrity. Each block ends with a 16-bit CRC code. Multi-file archives store an additional CRC for each complete file. Most implementations use CRC-16 with the polynomial x^16 + x^15 + x^2 + 1, but some use CRC-16-CCITT.
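For reference, here is a bit-at-a-time Python sketch of the CRC-16 variant named above (commonly called CRC-16/ARC), using the reflected form 0xA001 of the polynomial x^16 + x^15 + x^2 + 1 and a zero initial value:

```python
def crc16_arc(data: bytes) -> int:
    """CRC-16 with polynomial x^16 + x^15 + x^2 + 1 (reflected: 0xA001), init 0."""
    crc = 0x0000
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001   # shift out a 1 bit: reduce by the polynomial
            else:
                crc >>= 1                    # shift out a 0 bit: no reduction needed
    return crc

print(hex(crc16_arc(b"123456789")))  # 0xbb3d, the standard CRC-16/ARC check value
```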
Compressed LZH archives conventionally use the .lzh file extension. The format saw its widest adoption in Japan, with popularity peaking in the early 1990s as it competed with ZIP and ARJ. An informal standard evolved through the popular LArc, LHarc, and LHA archivers, adding support for comments, timestamps, passwords, and multi-file archives.
LZH's key advantages were higher compression ratios than contemporary ZIP implementations, plus fast decompression. However, by the mid-1990s ZIP gained popularity and displaced most other formats. ZIP's quick adoption as a de facto standard on Windows, plus cross-platform library and tool support, led to its dominance in general-purpose lossless archiving.
Today, LZH is rarely used outside Japan and East Asia. Newer compression formats like bzip2 and LZMA offer significantly better compression ratios. Some legacy applications may still encounter .lzh files, but modern formats such as ZIP, 7z, or xz are recommended for new archives. Open-source tools like lhasa exist to extract old .lzh archives.
In summary, LZH innovatively combined Lempel-Ziv dictionary coding with Huffman bit reduction to achieve state-of-the-art compression when introduced. It saw brief adoption, especially in Japan, before being overtaken by the ZIP standard. But it played an important role in the history of data compression and the development of modern archive formats. LZH showcased techniques like sliding window dictionaries and symbol entropy coding that remain fundamental to how we compress data efficiently.
File compression is a process that reduces the size of data files for efficient storage or transmission. It uses various algorithms to condense data by identifying and eliminating redundancy, which can often substantially decrease the size of the data without losing the original information.
There are two main types of file compression: lossless and lossy. Lossless compression allows the original data to be perfectly reconstructed from the compressed data, which is ideal for files where every bit of data is important, like text or database files. Common examples include ZIP and RAR file formats. On the other hand, lossy compression eliminates less important data to reduce file size more significantly, often used in audio, video, and image files. JPEGs and MP3s are examples where some data loss does not substantially degrade the perceptual quality of the content.
File compression is beneficial in a multitude of ways. It conserves storage space on devices and servers, lowering costs and improving efficiency. It also speeds up file transfer times over networks, including the internet, which is especially valuable for large files. Moreover, multiple files can be grouped together into one compressed archive, making them easier to organize and transport.
However, file compression does have some drawbacks. The compression and decompression process requires computational resources, which could slow down system performance, particularly for larger files. Also, in the case of lossy compression, some original data is lost during compression, and the resultant quality may not be acceptable for all uses, especially professional applications that demand high quality.
File compression is a critical tool in today's digital world. It enhances efficiency, saves storage space and decreases download and upload times. Nonetheless, it comes with its own set of drawbacks in terms of system performance and risk of quality degradation. Therefore, it is essential to be mindful of these factors to choose the right compression technique for specific data needs.
File compression is a process that reduces the size of a file or files, typically to save storage space or speed up transmission over a network.
File compression works by identifying and removing redundancy in the data. It uses algorithms to encode the original data in a smaller space.
The two primary types of file compression are lossless and lossy compression. Lossless compression allows the original file to be perfectly restored, while lossy compression enables more significant size reduction at the cost of some loss in data quality.
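A quick way to see lossless compression in action is Python's zlib module (a DEFLATE implementation, itself an LZ77-plus-Huffman scheme); the sample data here is made up to be highly redundant:

```python
import zlib

original = b"AAAA BBBB AAAA BBBB AAAA BBBB " * 100   # highly redundant input
packed = zlib.compress(original, 9)                   # level 9 = best compression

print(len(original), "->", len(packed), "bytes")      # redundancy shrinks dramatically
assert zlib.decompress(packed) == original            # lossless: perfect reconstruction
```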
A popular example of a file compression tool is WinZip, which supports multiple compression formats including ZIP and RAR.
With lossless compression, the quality remains unchanged. However, with lossy compression, there can be a noticeable decrease in quality since it eliminates less-important data to reduce file size more significantly.
File compression is safe in terms of data integrity, especially with lossless compression. However, like any files, compressed files can be targeted by malware or viruses, so it's always important to have reputable security software in place.
Almost all types of files can be compressed, including text files, images, audio, video, and software files. However, the level of compression achievable can significantly vary between file types.
A ZIP file is a type of file format that uses lossless compression to reduce the size of one or more files. Multiple files in a ZIP file are effectively bundled together into a single file, which also makes sharing easier.
An already compressed file can technically be compressed again, although the additional size reduction is usually minimal or even counterproductive. Compressing an already compressed file can sometimes increase its size because of the metadata added by the compression algorithm.
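This is easy to demonstrate with Python's zlib; the sample data below is arbitrary:

```python
import zlib

data = bytes(range(256)) * 64          # 16 KB with an obvious repeating pattern
once = zlib.compress(data, 9)          # first pass removes the redundancy
twice = zlib.compress(once, 9)         # second pass has little left to remove;
                                       # header overhead can even add a few bytes
print(len(data), len(once), len(twice))
```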
To decompress a file, you typically need a decompression or unzipping tool, like WinZip or 7-Zip. These tools can extract the original files from the compressed format.