author | Lasse Collin <lasse.collin@tukaani.org> | 2007-12-09 00:42:33 +0200 |
---|---|---|
committer | Lasse Collin <lasse.collin@tukaani.org> | 2007-12-09 00:42:33 +0200 |
commit | 5d018dc03549c1ee4958364712fb0c94e1bf2741 (patch) | |
tree | 1b211911fb33fddb3f04b77f99e81df23623ffc4 /doc/faq.txt | |
download | xz-5d018dc03549c1ee4958364712fb0c94e1bf2741.tar.xz |
Imported to git.
Diffstat (limited to 'doc/faq.txt')
-rw-r--r-- | doc/faq.txt | 247 |
1 files changed, 247 insertions, 0 deletions
diff --git a/doc/faq.txt b/doc/faq.txt
new file mode 100644
index 00000000..d01cf91b
--- /dev/null
+++ b/doc/faq.txt
@@ -0,0 +1,247 @@
+
+LZMA Utils FAQ
+--------------
+
+    Copyright (C) 2007 Lasse Collin
+
+    Copying and distribution of this file, with or without modification,
+    are permitted in any medium without royalty provided the copyright
+    notice and this notice are preserved.
+
+
+Q: What are LZMA, LZMA Utils, lzma, .lzma, liblzma, LZMA SDK, LZMA_Alone,
+   7-Zip and p7zip?
+
+A: LZMA stands for Lempel-Ziv-Markov chain-Algorithm. LZMA is the name
+   of the compression algorithm designed by Igor Pavlov. He is the author
+   of 7-Zip, which is a great LGPL'd compression tool for Microsoft
+   Windows operating systems. In addition to 7-Zip itself, also LZMA SDK
+   is available on the website of 7-Zip. LZMA SDK contains LZMA
+   implementations in C++, Java and C#. The C++ version is the original
+   implementation which is used also in 7-Zip itself.
+
+   Excluding the unrar plugin, 7-Zip is free software (free as in
+   freedom). Thanks to this, it was possible to port it to POSIX
+   platforms. The port was done and is maintained by myspace (TODO:
+   myspace's real name?). p7zip is a port of 7-Zip's command line version;
+   p7zip doesn't include the 7-Zip's GUI.
+
+   In POSIX world, users are used to gzip and bzip2 command line tools.
+   Developers know APIs of zlib and libbzip2. LZMA Utils try to ease
+   adoption of LZMA on free operating systems by providing a compression
+   library and a set of command line tools. The library is called liblzma.
+   It provides a zlib-like API making it easy to adapt LZMA compression in
+   existing applications. The main command line tool is known as lzma,
+   whose command line syntax is very similar to that of gzip and bzip2.
+
+   The original command line tool from LZMA SDK (lzma.exe) was found in
+   a directory called LZMA_Alone in the LZMA SDK. It used a simple header
+   format in .lzma files. This format was also used by LZMA Utils up to
+   and including 4.32.x. In LZMA Utils documentation, LZMA_Alone refers
+   to both the file format and the command line tool from LZMA SDK.
+
+   Because of various limitations of the LZMA_Alone file format, a new
+   file format was developed. Extending some existing format such as .gz
+   used by gzip was considered, but these formats were found to be too
+   limited. The filename suffix for the new .lzma format is `.lzma'. The
+   same suffix is also used for files in the LZMA_Alone format. To make
+   the transition to the new format as transparent as possible, LZMA Utils
+   support both the new and old formats transparently.
+
+   7-Zip and LZMA SDK: <http://7-zip.org/>
+   p7zip: <http://p7zip.sourceforge.net/>
+   LZMA Utils: <http://tukaani.org/lzma/>
+
+
+Q: What LZMA implementations are available?
+
+A: LZMA SDK contains implementations in C++, Java and C#. The C++ version
+   is the original implementation which is part of 7-Zip. LZMA SDK
+   contains also a small LZMA decoder in C.
+
+   A port of LZMA SDK to Pascal was made by Alan Birtles
+   <http://www.birtles.org.uk/programming/>. It should work with
+   multiple Pascal programming language implementations.
+
+   LZMA Utils includes liblzma, which is directly based on LZMA SDK.
+   liblzma is written in C (C99, not C89). In contrast to C++ callback
+   API used by LZMA SDK, liblzma uses zlib-like stateful C API. I do not
+   want to comment whether both/former/latter/neither API(s) are good or
+   bad. The only reason to implement a zlib-like API was that many
+   developers are already familiar with zlib, and very many applications
+   already use zlib. Having a similar API makes it easier to include LZMA
+   support in existing applications.
+
+   See also <http://en.wikipedia.org/wiki/LZMA#External_links>.
+
+
+Q: Which file formats are supported by LZMA Utils?
+
+A: Even when the raw LZMA stream is always the same, it can be wrapped
+   in different container formats. The preferred format is the new .lzma
+   format. It has magic bytes (the first six bytes: 0xFF 'L' 'Z' 'M'
+   'A' 0x00). The format supports chaining up to seven filters,
+   splitting data to multiple blocks for easier multi-threading and rough
+   random-access reading. The file integrity is verified using CRC32,
+   CRC64, or SHA256, and by verifying the uncompressed size of the file.
+
+   LZMA SDK includes a tool called LZMA_Alone. It uses a
+   primitive header which includes only the mandatory stream information
+   required by the LZMA decoder. This format can be both read and
+   written by liblzma and the command line tool (use --format=alone to
+   create such files).
+
+   .7z is the native archive format used by 7-Zip. This format is not
+   supported by liblzma, and probably will never be supported. You
+   should use e.g. p7zip to extract .7z files.
+
+   It is possible to implement custom file formats by using raw filter
+   mode in liblzma. In this mode the application needs to store the filter
+   properties and provide them to liblzma before starting to uncompress
+   the data.
+
+
+Q: How can I identify files containing LZMA compressed data?
+
+A: The preferred filename suffix for .lzma files is `.lzma'. `.tar.lzma'
+   may be abbreviated to `.tlz'. The same suffixes are used for files in
+   LZMA_Alone format. In practice this should be no problem since tools
+   included in LZMA Utils support both formats transparently.
+
+   Checking the magic bytes is an easy way to detect files in the new .lzma
+   format (the first six bytes: 0xFF 'L' 'Z' 'M' 'A' 0x00). The "file"
+   command version FIXME contains magic strings for this format.
+
+   The old LZMA_Alone format has no magic bytes. Its header cannot contain
+   arbitrary bytes, thus it is possible to make a guess. Unfortunately the
+   guessing is usually too hard to be reliable, so don't try it unless you
+   are desperate.
+
+
+Q: Does the lzma command line tool support sparse files?
+
+A: Sparse files can (of course) be compressed like normal files, but
+   uncompression will not restore sparseness of the file. Use an archiver
+   tool to take care of sparseness before compressing the data with lzma.
+
+   The reason for this is that archiver tools handle files, while
+   compression tools handle streams or buffers. Being a sparse file is
+   a property of the file on the disk, not a property of the stream or
+   buffer.
+
+
+Q: Can I recover parts of a broken LZMA file (e.g. corrupted CD-R)?
+
+A: With LZMA_Alone and single-block .lzma files, you can uncompress the
+   file until you hit the first broken byte. The data after the broken
+   position is lost. LZMA relies on the uncompression history, and if
+   bytes are missing in the middle of the file, it is impossible to
+   reliably continue after the broken section.
+
+   With multi-block .lzma files it may be possible to locate the next
+   block in the file and continue decoding there. A limited recovery
+   tool for this kind of situation is planned.
+
+
+Q: Is LZMA patented?
+
+A: No, the authors are not aware of any patents that could affect LZMA.
+   However, due to the nature of software patents, the authors cannot
+   guarantee that LZMA isn't affected by any third party patent.
+
+
+Q: Where can I find documentation about how LZMA works as an algorithm?
+
+A: Read the source code, Luke. There is no documentation about LZMA
+   internals. It is possible that Igor Pavlov is the only person on
+   the Earth that completely knows and understands the algorithm.
+
+   You could begin by downloading LZMA SDK, and start reading from
+   the LZMA decoder to get some idea about the bitstream format.
+   Before you begin, you should know the basics of LZ77 and
+   range coding algorithms. LZMA is based on LZ77, but LZMA is
+   *a lot* more complex. Range coding is used to compress the
+   final bitstream like Huffman coding is used in Deflate.
+
+
+Q: What are filters?
+
+A: In context of .lzma files, a filter means an implementation of a
+   compression algorithm. The primary filter is LZMA, which is why
+   the names of the tools contain the letters LZMA.
+
+   liblzma and the new .lzma format support also other filters than LZMA.
+   There are different types of filters, which are suitable for different
+   types of data. Thus, to select the optimal filter and settings, the
+   type of the input data being compressed needs to be known.
+
+   Some filters are most useful when combined with another filter like
+   LZMA. These filters increase redundancy in the data, without changing
+   the size of the data, by taking advantage of properties specific to
+   the data being compressed.
+
+   So far, all the filters are always reversible. That is, no matter what
+   data you pass to a filter encoder, it can be always defiltered back to
+   the original form. Because of this, it is safe to compress for example
+   a software package that contains other file types than executables
+   using a filter specific to the architecture of the package being
+   compressed.
+
+   The old LZMA_Alone format supports only the LZMA filter.
+
+
+Q: I cannot find BCJ and BCJ2 filters. Don't they exist in liblzma?
+
+A: BCJ filter is called "x86" in liblzma. BCJ2 is not included,
+   because it requires using more than one encoded output stream.
+
+
+Q: Can I use LZMA in proprietary, non-free applications?
+
+A: liblzma is under the GNU LGPL version 2.1 or (at your option) any
+   later version. To summarise (*NOTE* This summary is not legally
+   binding, that is, it doesn't give you any extra permissions compared
+   to the LGPL. Read the GNU LGPL carefully for the exact license
+   conditions.):
+     * All the changes made into the library itself must be published
+       under the same license.
+     * End users must be able to replace the used liblzma. Easiest way
+       to assure this is to link dynamically against liblzma so users
+       can replace the shared library file if they want.
+     * You must make it clear to your users that your application uses
+       liblzma, and that liblzma is free software under the GNU LGPL.
+       A copy of GNU LGPL must be included.
+
+   LZMA SDK contains a special exception which allows linking *unmodified*
+   code statically with a non-free application. This exception does *not*
+   apply to liblzma.
+
+   As an alternative, you can support the development of LZMA and 7-Zip
+   by buying a proprietary license from Igor Pavlov. See homepage of
+   LZMA SDK <http://7-zip.org/sdk.html> for more information. Note that
+   having a proprietary license from Igor Pavlov doesn't allow you to use
+   liblzma in a way that conflicts with the GNU LGPL, because liblzma
+   contains code that is not copyrighted by Igor Pavlov. Please contact
+   both Lasse Collin and Igor Pavlov if the license conditions of liblzma
+   are not suitable for you.
+
+
+Q: I would like to help. What can I do?
+
+A: See the TODO file. Please contact Lasse Collin before starting to do
+   anything, because it is possible that someone else is already working
+   on the same thing.
+
+
+Q: How can I contact the authors?
+
+A: Lasse Collin is the maintainer of LZMA Utils. You can contact him
+   via IRC (Larhzu on #tukaani at Freenode or IRCnet). Email
+   should work too: <lasse.collin@tukaani.org>.
+
+   Igor Pavlov is the father of LZMA. He is the author of 7-Zip
+   and LZMA SDK. <http://7-zip.org/>
+
+   NOTE: Please don't bother Igor Pavlov with questions specific
+         to LZMA Utils.
+
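
The magic-byte check described in the FAQ ("the first six bytes: 0xFF 'L' 'Z' 'M' 'A' 0x00") could be sketched roughly as follows. This is only an illustration and not part of LZMA Utils: the names NEW_LZMA_MAGIC, has_new_lzma_magic and file_has_new_lzma_magic are hypothetical, and the byte sequence is copied from the FAQ text in this commit, so it may not match later revisions of the format. As the FAQ notes, a check like this says nothing about files in the old LZMA_Alone format, which has no magic bytes.

```python
# Magic bytes exactly as quoted in the FAQ text of this commit (assumption:
# they apply only to the "new .lzma format" described there).
NEW_LZMA_MAGIC = bytes([0xFF]) + b"LZMA" + bytes([0x00])


def has_new_lzma_magic(data: bytes) -> bool:
    """Return True if data begins with the six magic bytes quoted in the FAQ."""
    return data[:len(NEW_LZMA_MAGIC)] == NEW_LZMA_MAGIC


def file_has_new_lzma_magic(path: str) -> bool:
    """Read only the first six bytes of a file and compare them to the magic."""
    with open(path, "rb") as f:
        return has_new_lzma_magic(f.read(len(NEW_LZMA_MAGIC)))
```

A caller would typically treat a False result as "not in the new format, possibly LZMA_Alone or something else entirely" rather than as proof that the file contains no LZMA data.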