Introduction to liblzma
-----------------------

Writing applications to work with liblzma

    The liblzma API is split into several subheaders to improve readability
    and maintenance. The subheaders must not be #included directly. lzma.h
    requires that certain integer types and macros are available when
    the header is #included. On systems that have an inttypes.h that
    conforms to C99, the following will work:

        #include <sys/types.h>
        #include <inttypes.h>
        #include <lzma.h>

    Those who have used zlib should find liblzma's API easy to use.
    To developers who haven't used zlib before, I recommend learning
    zlib first, because zlib has excellent documentation.

    While the API is similar to that of zlib, there are some major
    differences, which are summarized below.

    For basic stream encoding, zlib has three functions (deflateInit(),
    deflate(), and deflateEnd()). Similarly, there are three functions
    for stream decoding (inflateInit(), inflate(), and inflateEnd()).
    liblzma has only a single coding function and a single ending
    function. Thus, to encode one may use, for example,
    lzma_stream_encoder_single(), lzma_code(), and lzma_end(). Similarly,
    for decoding one may use lzma_auto_decoder(), lzma_code(), and
    lzma_end().

    zlib has deflateReset() and inflateReset() to reset the stream
    structure without reallocating all the memory. In liblzma, all
    coder initialization functions are like zlib's reset functions:
    the first-time initializations are done with the same functions
    as the reinitializations (resetting).

    To make all this work, liblzma needs to know when an lzma_stream
    doesn't already point to an allocated and initialized coder.
    This is achieved by initializing the lzma_stream structure with
    LZMA_STREAM_INIT (static initialization) or LZMA_STREAM_INIT_VAR
    (for example, when a new lzma_stream has been allocated with malloc()).
    This initialization should be done exactly once per lzma_stream
    structure to avoid leaking memory. Calling lzma_end() will leave
    the lzma_stream in a state comparable to the state achieved with
    LZMA_STREAM_INIT and LZMA_STREAM_INIT_VAR.

    An example probably clarifies a lot. With zlib, compression goes
    roughly like this:

        z_stream strm;
        deflateInit(&strm, level);
        deflate(&strm, Z_NO_FLUSH);
        deflate(&strm, Z_NO_FLUSH);
        ...
        deflate(&strm, Z_FINISH);
        deflateEnd(&strm) or deflateReset(&strm)

    With liblzma, it's slightly different:

        lzma_stream strm = LZMA_STREAM_INIT;
        lzma_stream_encoder_single(&strm, &options);
        lzma_code(&strm, LZMA_RUN);
        lzma_code(&strm, LZMA_RUN);
        ...
        lzma_code(&strm, LZMA_FINISH);
        lzma_end(&strm) or reinitialize for new coding work

    Reinitialization in the last step can be done with any function that
    can initialize an lzma_stream; it doesn't need to be the same function
    that was used for the previous initialization. If it is the same
    function, liblzma will usually be able to reuse most of the
    existing memory allocations (depending on how much the initialization
    options change). If you reinitialize with a different function,
    liblzma will automatically free the memory of the previous coder.
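
    The reinitialization rules above can be sketched with a toy type.
    Everything in this sketch (toy_stream, TOY_STREAM_INIT, toy_coder_init(),
    toy_end()) is hypothetical and is not part of liblzma; it only
    illustrates why the structure must start from a known state and how
    one function can serve as both first-time initializer and reset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for lzma_stream; not the real structure. */
typedef struct {
    void *internal;       /* NULL means "no coder allocated yet" */
    size_t internal_size; /* size of the current allocation */
} toy_stream;

/* Plays the role of LZMA_STREAM_INIT: a known "nothing allocated" state. */
#define TOY_STREAM_INIT { NULL, 0 }

/* First-time initialization and reinitialization use the same function.
 * Returns 0 on success, -1 if memory allocation fails. */
int toy_coder_init(toy_stream *strm, size_t needed)
{
    assert(strm != NULL && needed > 0);

    /* Reinitialization: reuse the existing allocation if it is big enough. */
    if (strm->internal != NULL && strm->internal_size >= needed)
        return 0;

    /* First use, or the old allocation is too small: replace it. */
    free(strm->internal);
    strm->internal = NULL;
    strm->internal_size = 0;

    strm->internal = malloc(needed);
    if (strm->internal == NULL)
        return -1;

    strm->internal_size = needed;
    return 0;
}

/* Plays the role of lzma_end(): free everything and return to the
 * state that TOY_STREAM_INIT gives. */
void toy_end(toy_stream *strm)
{
    assert(strm != NULL);
    free(strm->internal);
    strm->internal = NULL;
    strm->internal_size = 0;
}
```

    If a toy_stream held uninitialized stack contents instead of
    TOY_STREAM_INIT, the first toy_coder_init() call would try to reuse
    or free an invalid pointer. Initializing an already initialized
    structure a second time would instead lose the pointer and leak
    the allocation.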


File formats

    liblzma supports multiple container formats for the compressed data.
    Different initialization functions initialize the lzma_stream to
    process different container formats. See the public header files
    for details.

    The following functions are the most commonly used:

      - lzma_stream_encoder_single(): Encodes a Single-Block Stream; this
        is the recommended format for most purposes.

      - lzma_alone_encoder(): Useful if you need to encode into the
        legacy LZMA_Alone format.

      - lzma_auto_decoder(): Decoder that automatically detects the
        file format; recommended when you decode compressed files on
        disk, because this way compatibility with the legacy LZMA_Alone
        format is transparent.

      - lzma_stream_decoder(): Decoder for Single- and Multi-Block
        Streams; this is good if you want to accept only .lzma Streams.


Filters

    liblzma supports multiple filters (algorithm implementations). The new
    .lzma format supports a filter chain of up to seven filters. In the
    filter chain, the output of one filter is the input of the next filter
    in the chain. The legacy LZMA_Alone format supports only one filter,
    and that must always be LZMA.
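
    The chaining idea can be sketched as follows. This is a toy model,
    not liblzma code: the toy_filter type, both toy filters, and
    toy_run_chain() are hypothetical, and the filters work in place on
    a whole buffer, whereas real filters are streaming and may change
    the size of the data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical filter type: transforms a buffer in place. */
typedef void (*toy_filter)(uint8_t *buf, size_t size);

/* The .lzma format allows up to seven filters in a chain. */
#define TOY_FILTERS_MAX 7

/* Toy filter: replace each byte with its difference from the previous one. */
void toy_delta_encode(uint8_t *buf, size_t size)
{
    uint8_t prev = 0;
    for (size_t i = 0; i < size; ++i) {
        uint8_t cur = buf[i];
        buf[i] = (uint8_t)(cur - prev);
        prev = cur;
    }
}

/* Toy filter: flip every bit. */
void toy_invert(uint8_t *buf, size_t size)
{
    for (size_t i = 0; i < size; ++i)
        buf[i] = (uint8_t)~buf[i];
}

/* Run the chain in order; each filter sees the previous filter's output.
 * Returns 0 on success, -1 if the chain is longer than allowed. */
int toy_run_chain(const toy_filter *chain, size_t count,
        uint8_t *buf, size_t size)
{
    assert(chain != NULL || count == 0);

    if (count > TOY_FILTERS_MAX)
        return -1;

    for (size_t i = 0; i < count; ++i)
        chain[i](buf, size);

    return 0;
}
```

    A decoder for such a chain would apply the inverse filters in the
    reverse order.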

        General-purpose compression:

            LZMA        The main algorithm of liblzma (surprise!)

        Branch/Call/Jump filters for executables:

            x86         This filter is known as BCJ in 7-Zip
            IA64        IA-64 (Itanium)
            PowerPC     Big endian PowerPC
            ARM
            ARM-Thumb
            SPARC

        Other filters:

            Copy        Dummy filter that simply copies all the data
                        from input to output.

            Subblock    Multi-purpose filter that can
                          - embed End of Payload Marker if the previous
                            filter in the chain doesn't support it; and
                          - apply Subfilters, which filter only part
                            of the same compressed Block in the Stream.

    Branch/Call/Jump filters never change the size of the data. They
    should usually be used as a pre-filter for some compression filter
    like LZMA.
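
    To illustrate why a Branch/Call/Jump filter keeps the size unchanged
    and still helps the compressor, here is a heavily simplified sketch of
    the idea for x86. It is not the filter that liblzma implements:
    toy_x86_call_filter() is a hypothetical name, and the real x86 filter
    is more careful about which byte sequences it treats as CALL
    instructions. The sketch converts the relative operand of every 0xE8
    (CALL rel32) opcode to an absolute address, so that repeated calls to
    the same function become identical byte sequences.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified illustration only; see the assumptions stated above.
 * is_encoder != 0 converts relative -> absolute, 0 converts back. */
void toy_x86_call_filter(uint8_t *buf, size_t size, int is_encoder)
{
    assert(buf != NULL || size == 0);

    size_t i = 0;
    while (size >= 5 && i <= size - 5) {
        if (buf[i] != 0xE8) { /* 0xE8 = CALL rel32 */
            ++i;
            continue;
        }

        /* Read the little-endian 32-bit operand after the opcode. */
        uint32_t value = (uint32_t)buf[i + 1]
                | ((uint32_t)buf[i + 2] << 8)
                | ((uint32_t)buf[i + 3] << 16)
                | ((uint32_t)buf[i + 4] << 24);

        /* Relative targets are measured from the end of the instruction. */
        uint32_t next = (uint32_t)(i + 5);
        value = is_encoder ? value + next : value - next;

        /* Write the converted operand back; the size never changes. */
        buf[i + 1] = (uint8_t)value;
        buf[i + 2] = (uint8_t)(value >> 8);
        buf[i + 3] = (uint8_t)(value >> 16);
        buf[i + 4] = (uint8_t)(value >> 24);

        /* Skip the operand so it is never read as an opcode. */
        i += 5;
    }
}
```

    Because the opcode bytes are left untouched and the operand bytes are
    always skipped, the decoder visits exactly the same positions as the
    encoder and restores the original data.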


Integrity checks

    The .lzma Stream format uses CRC32 as the integrity check for
    different file format headers. It is possible to omit CRC32 from
    the Block Headers, but not from the Stream Header. This is the reason
    why the CRC32 code cannot be disabled when building liblzma (in
    addition, the LZMA encoder uses CRC32 for hashing, so that's another
    reason).

    The integrity check of the actual data is calculated from the
    uncompressed data. This check can be CRC32, CRC64, or SHA256.
    It can also be omitted completely, although that usually is not
    a good thing to do. There are free IDs left, so support for new
    check algorithms can be added later.
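
    As a rough illustration of a check calculated over uncompressed data,
    the following sketch computes a CRC32 bit by bit. It assumes the
    common reflected polynomial 0xEDB88320 (the one zlib uses); this text
    does not state which polynomial the .lzma format specifies, so treat
    that as an assumption. toy_crc32() is a hypothetical name, not a
    liblzma function, and a production implementation would use lookup
    tables for speed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32 with the reflected polynomial 0xEDB88320 (assumed).
 * Pass 0 as crc for the first chunk, and the previous return value
 * for each following chunk of the same data. */
uint32_t toy_crc32(const uint8_t *buf, size_t size, uint32_t crc)
{
    assert(buf != NULL || size == 0);

    crc = ~crc;
    for (size_t i = 0; i < size; ++i) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; ++bit) {
            /* Shift out one bit; XOR in the polynomial if it was set. */
            uint32_t mask = 0u - (crc & 1u);
            crc = (crc >> 1) ^ (0xEDB88320u & mask);
        }
    }
    return ~crc;
}
```

    With this polynomial, the nine ASCII bytes "123456789" give
    0xCBF43926, and feeding the same bytes in two chunks gives the same
    result, which is how a check can be kept up to date while data
    passes through a stateful coder.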


API and ABI stability

    The API and ABI of liblzma aren't stable yet, although no huge
    changes should happen. One potential place for change is the
    lzma_options_subblock structure.

    In the 4.42.0alpha phase, the shared library version number won't
    be updated even if ABI breaks. I don't want to track the ABI changes
    yet. Just rebuild everything when you upgrade liblzma until we get
    to the beta stage.


Size of the library

    While liblzma isn't huge, it is quite far from the smallest possible
    LZMA implementation: the full liblzma binary (with support for all
    filters and other features) is well over 100 KiB, but the plain raw
    LZMA decoder is only 5-10 KiB.

    To decrease the size of the library, you can omit parts of the library
    by passing certain options to the `configure' script. Disabling
    everything but the decoders of the required filters will usually give
    you a small enough library, but if you need a decoder embedded in,
    for example, an operating system kernel, the code from liblzma
    probably isn't suitable as is.

    If you need a minimal implementation supporting .lzma Streams, you
    may need to do a partial rewrite. liblzma uses a stateful API like
    zlib, which increases the size of the library. Using a callback API
    or an even simpler buffer-to-buffer API would allow a smaller
    implementation.

    The LZMA SDK contains an LZMA decoder written in ANSI C that is
    smaller than the one in liblzma, so you may want to take a look at
    that code. However, it doesn't (at least not yet) support the new
    .lzma Stream format.


Documentation

    There is no documentation yet other than the public headers and
    this text. Real docs will be written some day, I hope.