Diffstat (limited to 'doc/liblzma-intro.txt')
-rw-r--r-- doc/liblzma-intro.txt 188
1 files changed, 188 insertions, 0 deletions
diff --git a/doc/liblzma-intro.txt b/doc/liblzma-intro.txt
new file mode 100644
index 00000000..9cbd63a9
--- /dev/null
+++ b/doc/liblzma-intro.txt
@@ -0,0 +1,188 @@
+
+Introduction to liblzma
+-----------------------
+
+Writing applications to work with liblzma
+
+ The liblzma API is split into several subheaders to improve readability
+ and maintenance. The subheaders must not be #included directly; simply
+ use `#include <lzma.h>' instead.
+
+ Those who have used zlib should find liblzma's API easy to use.
+ To developers who haven't used zlib before, I recommend learning
+ zlib first, because zlib has excellent documentation.
+
+ While the API is similar to that of zlib, there are some major
+ differences, which are summarized below.
+
+ For basic stream encoding, zlib has three functions (deflateInit(),
+ deflate(), and deflateEnd()). Similarly, there are three functions
+ for stream decoding (inflateInit(), inflate(), and inflateEnd()).
+ liblzma has only a single coding function and a single ending function.
+ Thus, to encode one may use, for example, lzma_stream_encoder_single(),
+ lzma_code(), and lzma_end(). Similarly, for decoding one may
+ use lzma_auto_decoder(), lzma_code(), and lzma_end().
+
+ zlib has deflateReset() and inflateReset() to reset the stream
+ structure without reallocating all the memory. In liblzma, all
+ coder initialization functions are like zlib's reset functions:
+ the first-time initializations are done with the same functions
+ as the reinitializations (resetting).
+
+ To make all this work, liblzma needs to know when an lzma_stream
+ doesn't already point to an allocated and initialized coder.
+ This is achieved by initializing the lzma_stream structure with
+ LZMA_STREAM_INIT (static initialization) or LZMA_STREAM_INIT_VAR
+ (for example, when a new lzma_stream has been allocated with malloc()).
+ This initialization should be done exactly once per lzma_stream
+ structure to avoid leaking memory. Calling lzma_end() will leave
+ the lzma_stream in a state comparable to the state achieved with
+ LZMA_STREAM_INIT and LZMA_STREAM_INIT_VAR.
+
+ An example probably clarifies a lot. With zlib, compression goes
+ roughly like this:
+
+ z_stream strm;
+ deflateInit(&strm, level);
+ deflate(&strm, Z_NO_FLUSH);
+ deflate(&strm, Z_NO_FLUSH);
+ ...
+ deflate(&strm, Z_FINISH);
+ deflateEnd(&strm) or deflateReset(&strm)
+
+ With liblzma, it's slightly different:
+
+ lzma_stream strm = LZMA_STREAM_INIT;
+ lzma_stream_encoder_single(&strm, &options);
+ lzma_code(&strm, LZMA_RUN);
+ lzma_code(&strm, LZMA_RUN);
+ ...
+ lzma_code(&strm, LZMA_FINISH);
+ lzma_end(&strm) or reinitialize for new coding work
+
+ Reinitialization in the last step can use any function that can
+ initialize lzma_stream; it doesn't need to be the same function
+ that was used for the previous initialization. If it is the same
+ function, liblzma will usually be able to re-use most of the
+ existing memory allocations (depending on how much the initialization
+ options change). If you reinitialize with a different function,
+ liblzma will automatically free the memory of the previous coder.
+
+
+File formats
+
+ liblzma supports multiple container formats for the compressed data.
+ Different initialization functions initialize the lzma_stream to
+ process different container formats. See the public header files
+ for details.
+
+ The following functions are the most commonly used:
+
+ - lzma_stream_encoder_single(): Encodes a Single-Block Stream; this
+ is the recommended format for most purposes.
+
+ - lzma_alone_encoder(): Useful if you need to encode into the
+ legacy LZMA_Alone format.
+
+ - lzma_auto_decoder(): Decoder that automatically detects the
+ file format; recommended when you decode compressed files on
+ disk, because this way compatibility with the legacy LZMA_Alone
+ format is transparent.
+
+ - lzma_stream_decoder(): Decoder for Single- and Multi-Block
+ Streams; this is good if you want to accept only .lzma Streams.
+
+
+Filters
+
+ liblzma supports multiple filters (algorithm implementations). The new
+ .lzma format supports a filter chain of up to seven filters. In the
+ filter chain, the output of one filter is the input of the next filter
+ in the chain. The legacy LZMA_Alone format supports only one filter,
+ and that must always be LZMA.
+
+ General-purpose compression:
+
+ LZMA The main algorithm of liblzma (surprise!)
+
+ Branch/Call/Jump filters for executables:
+
+ x86 This filter is known as BCJ in 7-Zip
+ IA64 IA-64 (Itanium)
+ PowerPC Big endian PowerPC
+ ARM
+ ARM-Thumb
+ SPARC
+
+ Other filters:
+
+ Copy Dummy filter that simply copies all the data
+ from input to output.
+
+ Subblock Multi-purpose filter that can
+ - embed End of Payload Marker if the previous
+ filter in the chain doesn't support it; and
+ - apply Subfilters, which filter only part
+ of the same compressed Block in the Stream.
+
+ Branch/Call/Jump filters never change the size of the data. They
+ should usually be used as a pre-filter for some compression filter
+ like LZMA.
+
+
+Integrity checks
+
+ The .lzma Stream format uses CRC32 as the integrity check for
+ different file format headers. It is possible to omit CRC32 from
+ the Block Headers, but not from the Stream Header. This is the reason
+ why the CRC32 code cannot be disabled when building liblzma (in addition,
+ the LZMA encoder uses CRC32 for hashing, so that's another reason).
+
+ The integrity check of the actual data is calculated from the
+ uncompressed data. This check can be CRC32, CRC64, or SHA256.
+ It can also be omitted completely, although that usually is not
+ a good thing to do. There are free IDs left, so support for new
+ check algorithms can be added later.
+
+
+API and ABI stability
+
+ The API and ABI of liblzma aren't stable yet, although no huge
+ changes should happen. One potential place for change is the
+ lzma_options_subblock structure.
+
+ In the 4.42.0alpha phase, the shared library version number won't
+ be updated even if ABI breaks. I don't want to track the ABI changes
+ yet. Just rebuild everything when you upgrade liblzma until we get
+ to the beta stage.
+
+
+Size of the library
+
+ While liblzma isn't huge, it is quite far from the smallest possible
+ LZMA implementation: a full liblzma binary (with support for all
+ filters and other features) is well over 100 KiB, but the plain raw
+ LZMA decoder is only 5-10 KiB.
+
+ To decrease the size of the library, you can omit parts of the library
+ by passing certain options to the `configure' script. Disabling
+ everything but the decoders of the required filters will usually give
+ you a small enough library, but if you need a decoder embedded in,
+ for example, an operating system kernel, the code from liblzma probably
+ isn't suitable as is.
+
+ If you need a minimal implementation supporting .lzma Streams, you
+ may need to do a partial rewrite. liblzma uses a stateful API like zlib,
+ which increases the size of the library. Using a callback API or an even
+ simpler buffer-to-buffer API would allow a smaller implementation.
+
+ LZMA SDK contains an LZMA decoder written in ANSI C that is smaller
+ than liblzma's, so you may want to take a look at that code. However,
+ it doesn't (at least not yet) support the new .lzma Stream format.
+
+
+Documentation
+
+ There's no documentation yet other than the public headers and this
+ text. Real docs will be written some day, I hope.
+