author Lasse Collin <lasse.collin@tukaani.org> 2009-05-01 11:28:52 +0300
committer Lasse Collin <lasse.collin@tukaani.org> 2009-05-01 11:28:52 +0300
commit be06858d5cf8ba46557395035d821dc332f3f830 (patch)
tree 603491cf2b789dd19afd7f3cc6185873f1a36cb8 /doc/liblzma-intro.txt
parent Added documentation about the legacy .lzma file format. (diff)
Remove docs that are too outdated to be updated
(rewrite will be better).
Diffstat (limited to 'doc/liblzma-intro.txt')
-rw-r--r-- doc/liblzma-intro.txt 194
1 files changed, 0 insertions, 194 deletions
diff --git a/doc/liblzma-intro.txt b/doc/liblzma-intro.txt
deleted file mode 100644
index 52c4d920..00000000
--- a/doc/liblzma-intro.txt
+++ /dev/null
@@ -1,194 +0,0 @@
-
-Introduction to liblzma
------------------------
-
-Writing applications to work with liblzma
-
- The liblzma API is split into several subheaders to improve readability
- and maintenance. The subheaders must not be #included directly. lzma.h
- requires that certain integer types and macros are available when
- the header is #included. On systems that have inttypes.h that conforms
- to C99, the following will work:
-
- #include <sys/types.h>
- #include <inttypes.h>
- #include <lzma.h>
-
- Those who have used zlib should find liblzma's API easy to use.
- To developers who haven't used zlib before, I recommend learning
- zlib first, because zlib has excellent documentation.
-
- While the API is similar to that of zlib, there are some major
- differences, which are summarized below.
-
- For basic stream encoding, zlib has three functions (deflateInit(),
- deflate(), and deflateEnd()). Similarly, there are three functions
- for stream decoding (inflateInit(), inflate(), and inflateEnd()).
- liblzma has only one coding function and one ending function. Thus,
- to encode, one may use, for example, lzma_stream_encoder_single(),
- lzma_code(), and lzma_end(). Similarly, for decoding, one may
- use lzma_auto_decoder(), lzma_code(), and lzma_end().
-
- zlib has deflateReset() and inflateReset() to reset the stream
- structure without reallocating all the memory. In liblzma, all
- coder initialization functions are like zlib's reset functions:
- the first-time initializations are done with the same functions
- as the reinitializations (resetting).
-
- To make all this work, liblzma needs to know when lzma_stream
- doesn't already point to an allocated and initialized coder.
- This is achieved by initializing the lzma_stream structure with
- LZMA_STREAM_INIT (static initialization) or LZMA_STREAM_INIT_VAR
- (for example, when a new lzma_stream has been allocated with malloc()).
- This initialization should be done exactly once per lzma_stream
- structure to avoid leaking memory. Calling lzma_end() will leave
- lzma_stream in a state comparable to the state achieved with
- LZMA_STREAM_INIT and LZMA_STREAM_INIT_VAR.
-
- An example probably clarifies a lot. With zlib, compression goes
- roughly like this:
-
- z_stream strm;
- deflateInit(&strm, level);
- deflate(&strm, Z_NO_FLUSH);
- deflate(&strm, Z_NO_FLUSH);
- ...
- deflate(&strm, Z_FINISH);
- deflateEnd(&strm) or deflateReset(&strm)
-
- With liblzma, it's slightly different:
-
- lzma_stream strm = LZMA_STREAM_INIT;
- lzma_stream_encoder_single(&strm, &options);
- lzma_code(&strm, LZMA_RUN);
- lzma_code(&strm, LZMA_RUN);
- ...
- lzma_code(&strm, LZMA_FINISH);
- lzma_end(&strm) or reinitialize for new coding work
-
- Reinitialization in the last step can be any function that can
- initialize lzma_stream; it doesn't need to be the same function
- that was used for the previous initialization. If it is the same
- function, liblzma will usually be able to re-use most of the
- existing memory allocations (depends on how much the initialization
- options change). If you reinitialize with a different function,
- liblzma will automatically free the memory of the previous coder.
-
-
-File formats
-
- liblzma supports multiple container formats for the compressed data.
- Different initialization functions initialize the lzma_stream to
- process different container formats. See the public header files
- for details.
-
- The following functions are the most commonly used:
-
- - lzma_stream_encoder_single(): Encodes a Single-Block Stream; this is
- the recommended format for most purposes.
-
- - lzma_alone_encoder(): Useful if you need to encode into the
- legacy LZMA_Alone format.
-
- - lzma_auto_decoder(): Decoder that automatically detects the
- file format; recommended when you decode compressed files on
- disk, because this way compatibility with the legacy LZMA_Alone
- format is transparent.
-
- - lzma_stream_decoder(): Decoder for Single- and Multi-Block
- Streams; this is good if you want to accept only .lzma Streams.
-
-
-Filters
-
- liblzma supports multiple filters (algorithm implementations). The new
- .lzma format supports a filter chain of up to seven filters. In the
- filter chain, the output of one filter is the input of the next filter in
- the chain. The legacy LZMA_Alone format supports only one filter, and
- that must always be LZMA.
-
- General-purpose compression:
-
- LZMA The main algorithm of liblzma (surprise!)
-
- Branch/Call/Jump filters for executables:
-
- x86 This filter is known as BCJ in 7-Zip
- IA64 IA-64 (Itanium)
- PowerPC Big endian PowerPC
- ARM
- ARM-Thumb
- SPARC
-
- Other filters:
-
- Copy Dummy filter that simply copies all the data
- from input to output.
-
- Subblock Multi-purpose filter that can
- - embed End of Payload Marker if the previous
- filter in the chain doesn't support it; and
- - apply Subfilters, which filter only part
- of the same compressed Block in the Stream.
-
- Branch/Call/Jump filters never change the size of the data. They
- should usually be used as a pre-filter for some compression filter
- like LZMA.
-
-
-Integrity checks
-
- The .lzma Stream format uses CRC32 as the integrity check for
- different file format headers. It is possible to omit CRC32 from
- the Block Headers, but not from Stream Header. This is the reason
- why CRC32 code cannot be disabled when building liblzma (in addition,
- the LZMA encoder uses CRC32 for hashing, so that's another reason).
-
- The integrity check of the actual data is calculated from the
- uncompressed data. This check can be CRC32, CRC64, or SHA256.
- It can also be omitted completely, although that usually is not
- a good thing to do. There are free IDs left, so support for new
- check algorithms can be added later.
-
-
-API and ABI stability
-
- The API and ABI of liblzma aren't stable yet, although no huge
- changes should happen. One potential place for change is the
- lzma_options_subblock structure.
-
- In the 4.42.0alpha phase, the shared library version number won't
- be updated even if ABI breaks. I don't want to track the ABI changes
- yet. Just rebuild everything when you upgrade liblzma until we get
- to the beta stage.
-
-
-Size of the library
-
- While liblzma isn't huge, it is quite far from the smallest possible
- LZMA implementation: full liblzma binary (with support for all
- filters and other features) is way over 100 KiB, but the plain raw
- LZMA decoder is only 5-10 KiB.
-
- To decrease the size of the library, you can omit parts of the library
- by passing certain options to the `configure' script. Disabling
- everything but the decoders of the required filters will usually give
- you a small enough library, but if you need a decoder for example
- embedded in the operating system kernel, the code from liblzma probably
- isn't suitable as is.
-
- If you need a minimal implementation supporting .lzma Streams, you
- may need to do a partial rewrite. liblzma uses a stateful API like zlib.
- That increases the size of the library. Using a callback API or an even
- simpler buffer-to-buffer API would allow a smaller implementation.
-
- The LZMA SDK contains an LZMA decoder, written in ANSI C, that is smaller
- than liblzma's, so you may want to take a look at that code. However,
- it doesn't (at least not yet) support the new .lzma Stream format.
-
-
-Documentation
-
- There is no documentation yet other than the public headers and this
- text. Real docs will be written some day, I hope.
-