diff options
Diffstat (limited to 'doc')
-rw-r--r-- | doc/lzma-file-format.txt | 166 |
1 files changed, 166 insertions, 0 deletions
diff --git a/doc/lzma-file-format.txt b/doc/lzma-file-format.txt new file mode 100644 index 00000000..21fcb19f --- /dev/null +++ b/doc/lzma-file-format.txt @@ -0,0 +1,166 @@ + +The .lzma File Format +===================== + + 0. Preface + 0.1. Notices and Acknowledgements + 0.2. Changes + 1. File Format + 1.1. Header + 1.1.1. Properties + 1.1.2. Dictionary Size + 1.1.3. Uncompressed Size + 1.2. LZMA Compressed Data + 2. References + + +0. Preface + + This document describes the .lzma file format, which is + sometimes also called LZMA_Alone format. It is a legacy file + format, which is being or has been replaced by the .xz format. + The MIME type of the .lzma format is `application/x-lzma'. + + The most commonly used software to handle .lzma files are + LZMA SDK, LZMA Utils, 7-Zip, and XZ Utils. This document + describes some of the differences between these implementations + and gives hints what subset of the .lzma format is the most + portable. + + +0.1. Notices and Acknowledgements + + This file format was designed by Igor Pavlov for use in + LZMA SDK. This document was written by Lasse Collin + <lasse.collin@tukaani.org> using the documentation found + from the LZMA SDK. + + This document has been put into the public domain. + + +0.2. Changes + + Last modified: 2009-05-01 11:15+0300 + + +1. File Format + + +-+-+-+-+-+-+-+-+-+-+-+-+-+==========================+ + | Header | LZMA Compressed Data | + +-+-+-+-+-+-+-+-+-+-+-+-+-+==========================+ + + The .lzma format file consist of 13-byte Header followed by + the LZMA Compressed Data. + + Unlike the .gz, .bz2, and .xz formats, it is not possible to + concatenate multiple .lzma files as is and expect the + decompression tool to decode the resulting file as if it were + a single .lzma file. + + For example, the command line tools from LZMA Utils and + LZMA SDK silently ignore all the data after the first .lzma + stream. In contrast, the command line tool from XZ Utils + considers the .lzma file to be corrupt if there is data after + the first .lzma stream. + + +1.1. Header + + +------------+----+----+----+----+--+--+--+--+--+--+--+--+ + | Properties | Dictionary Size | Uncompressed Size | + +------------+----+----+----+----+--+--+--+--+--+--+--+--+ + + +1.1.1. Properties + + The Properties field contains three properties. An abbreviation + is given in parentheses, followed by the value range of the + property. The field consists of + + 1) the number of literal context bits (lc, [0, 8]); + 2) the number of literal position bits (lp, [0, 4]); and + 3) the number of position bits (pb, [0, 4]). + + The properties are encoded using the following formula: + + Properties = (pb * 5 + lp) * 9 + lc + + The following C code illustrates a straightforward way to + decode the Properties field: + + uint8_t lc, lp, pb; + uint8_t prop = get_lzma_properties(); + if (prop > (4 * 5 + 4) * 9 + 8) + return LZMA_PROPERTIES_ERROR; + + pb = prop / (9 * 5); + prop -= pb * 9 * 5; + lp = prop / 9; + lc = prop - lp * 9; + + XZ Utils has an additional requirement: lc + lp <= 4. Files + which don't follow this requirement cannot be decompressed + with XZ Utils. Usually this isn't a problem since the most + common lc/lp/pb values are 3/0/2. It is the only lc/lp/pb + combination that the files created by LZMA Utils can have, + but LZMA Utils can decompress files with any lc/lp/pb. + + +1.1.2. Dictionary Size + + Dictionary Size is stored as an unsigned 32-bit little endian + integer. Any 32-bit value is possible, but for maximum + portability, only sizes of 2^n and 2^n + 2^(n-1) should be + used. + + LZMA Utils creates only files with dictionary size 2^n, + 16 <= n <= 25. LZMA Utils can decompress files with any + dictionary size. + + XZ Utils creates and decompresses .lzma files only with + dictionary sizes 2^n and 2^n + 2^(n-1). If some other + dictionary size is specified when compressing, the value + stored in the Dictionary Size field is a rounded up, but the + specified value is still used in the actual compression code. + + +1.1.3. Uncompressed Size + + Uncompressed Size is stored as unsigned 64-bit little endian + integer. A special value of 0xFFFF_FFFF_FFFF_FFFF indicates + that Uncompressed Size is unknown. End of Payload Marker (*) + is used if and only if Uncompressed Size is unknown. + + XZ Utils rejects files whose Uncompressed Size field specifies + a known size that is 256 GiB or more. This is to reject false + positives when trying to guess if the input file is in the + .lzma format. When Uncompressed Size is unknown, there is no + limit for the uncompressed size of the file. + + (*) Some tools use the term End of Stream (EOS) marker + instead of End of Payload Marker. + + +1.2. LZMA Compressed Data + + Detailed description of the format of this field is out of + scope of this document. + + +2. References + + LZMA SDK - The original LZMA implementation + http://7-zip.org/sdk.html + + 7-Zip + http://7-zip.org/ + + LZMA Utils - LZMA adapted to POSIX-like systems + http://tukaani.org/lzma/ + + XZ Utils - The next generation of LZMA Utils + http://tukaani.org/xz/ + + The .xz file format - The successor of the the .lzma format + http://tukaani.org/xz/xz-file-format.txt + |