Remove docs that are too outdated to be updated

(rewrite will be better).
author: Lasse Collin <lasse.collin@tukaani.org> 2009-05-01 11:28:52 +0300
committer: Lasse Collin <lasse.collin@tukaani.org> 2009-05-01 11:28:52 +0300
commit: be06858d5cf8ba46557395035d821dc332f3f830 (patch)
tree: 603491cf2b789dd19afd7f3cc6185873f1a36cb8 /doc/liblzma-advanced.txt
parent: Added documentation about the legacy .lzma file format. (diff)
download: xz-be06858d5cf8ba46557395035d821dc332f3f830.tar.xz
1 files changed, 0 insertions, 324 deletions
diff --git a/doc/liblzma-advanced.txt b/doc/liblzma-advanced.txt
deleted file mode 100644
index 6e1c9834..00000000
--- a/doc/liblzma-advanced.txt
+++ /dev/null
@@ -1,324 +0,0 @@
-
-Advanced features of liblzma
-----------------------------
-
-0. Introduction
-
-    Most developers need only the basic features of liblzma. These
-    features allow single-threaded encoding and decoding of .lzma files
-    in streamed mode.
-
-    In some cases developers want more. The .lzma file format is
-    designed to allow multi-threaded encoding and decoding and limited
-    random-access reading. These features are possible in non-streamed
-    mode and, to a limited extent, also in streamed mode.
-
-    To take advantage of these features, the application needs a custom
-    .lzma file format handler. liblzma provides a set of tools to ease
-    this task, but it's still quite a bit of work to get a good custom
-    .lzma handler done.
-
-
-1. Where to begin
-
-    Start by reading the .lzma file format specification. Understanding
-    the basics of the .lzma file structure is required to implement a
-    custom .lzma file handler and to understand the rest of this document.
-
-
-2. The basic components
-
-2.1. Stream Header and tail
-
-    Stream Header begins the .lzma Stream and Stream tail ends it. Stream
-    Header is defined in the file format specification, but Stream tail
-    isn't (thus I write "tail" with a lower-case letter). Stream tail is
-    simply the Stream Flags and the Footer Magic Bytes fields together.
-    It was done this way in liblzma, because the Block coders take care
-    of the rest of the stuff in the Stream Footer.
-
-    For now, the size of Stream Header is fixed to 11 bytes. The header
-    <lzma/stream_flags.h> defines LZMA_STREAM_HEADER_SIZE, which you
-    should use instead of a hardcoded number. Similarly, Stream tail
-    is fixed to 3 bytes, and there is a constant LZMA_STREAM_TAIL_SIZE.
-
-    It is possible that a future version of the .lzma format will have
-    variable-sized Stream Header and tail. As of writing, this seems so
-    unlikely, though, that it was considered simplest to just use a
-    constant instead of providing functions to get and store the sizes
-    of the Stream Header and tail.
-
-
-2.x. Stream tail
-
-    For now, the size of Stream tail is fixed to 3 bytes. The header
-    <lzma/stream_flags.h> defines LZMA_STREAM_TAIL_SIZE, which you
-    should use instead of a hardcoded number.
-
-
-3. Keeping track of size information
-
-    The lzma_info_* functions found in <lzma/info.h> should ease the
-    task of keeping track of sizes of the Blocks and also the Stream
-    as a whole. Using these functions is strongly recommended, because
-    there are surprisingly many situations where an error can occur,
-    and these functions check for possible errors every time some new
-    information becomes available.
-
-    If you find lzma_info_* functions lacking something that you would
-    find useful, please contact the author.
-
-
-3.1. Start offset of the Stream
-
-    If you are storing the .lzma Stream inside another file format, or
-    for some other reason are placing the .lzma Stream somewhere other
-    than at the beginning of the file, you should set the starting
-    offset of the Stream using lzma_info_start_offset_set().
-
-    The start offset of the Stream is used for two distinct purposes.
-    First, knowing the start offset of the Stream allows
-    lzma_info_alignment_get() to correctly calculate the alignment of
-    every Block. This information is given to the Block encoder, which
-    will calculate the size of Header Padding so that Compressed Data
-    is aligned at an optimal offset.
-
-    Another use for the start offset of the Stream is in random-access
-    reading. If you set the start offset of the Stream, lzma_info_locate()
-    will be able to calculate the offset relative to the beginning of the
-    file containing the Stream (instead of offset relative to the
-    beginning of the Stream).
-
-
-3.2. Size of Stream Header
-
-    While the size of Stream Header is constant (11 bytes) in the current
-    version of the .lzma file format, this may change in the future.
-
-
-3.3. Size of Header Metadata Block
-
-    This information is needed when doing random-access reading, and
-    to verify the value of this field stored in Footer Metadata Block.
-
-
-3.4. Total Size of the Data Blocks
-
-
-3.5. Uncompressed Size of Data Blocks
-
-
-3.6. Index
-
-
-
-
-x. Alignment
-
-    There are a few slightly different types of alignment issues when
-    working with .lzma files.
-
-    The .lzma format doesn't strictly require any kind of alignment.
-    However, if the encoder carefully optimizes the alignment in all
-    situations, it can improve compression ratio, speed of the encoder
-    and decoder, and slightly help if the files get damaged and need
-    recovery.
-
-    Alignment has the most significant effect on compression ratio FIXME
-
-
-x.1. Compression ratio
-
-    Some filters take advantage of the alignment of the input data.
-    To get the best compression ratio, make sure that you feed these
-    filters correctly aligned data.
-
-    Some filters (e.g. LZMA) don't necessarily mind too much if the
-    input doesn't match the preferred alignment. With these filters
-    the penalty in compression ratio depends on the specific type of
-    data being compressed.
-
-    Other filters (e.g. PowerPC executable filter) won't work at all
-    with data that is improperly aligned. While the data can still
-    be de-filtered back to its original form, the benefit of the
-    filtering (better compression ratio) is completely lost, because
-    these filters expect certain patterns at properly aligned offsets.
-    The compression ratio may even be worse with incorrectly aligned input
-    than without the filter.
-
-
-x.1.1. Inter-filter alignment
-
-    When there are multiple filters chained, checking the alignment can
-    be useful not only with the input of the first filter and output of
-    the last filter, but also between the filters.
-
-    Inter-filter alignment is especially important with the Subblock filter.
-
-
-x.1.2. Further compression with external tools
-
-    This is a relatively rare situation in practice, but still worth
-    understanding.
-
-    Let's say that there are several SPARC executables, which are each
-    filtered to separate .lzma files using only the SPARC filter. If
-    Uncompressed Size is written to the Block Header, the size of Block
-    Header may vary between the .lzma files. If no Padding is used in
-    the Block Header to correct the alignment, the starting offset of
-    the Compressed Data field will be differently aligned in different
-    .lzma files.
-
-    All these .lzma files are archived into a single .tar archive. Due
-    to the nature of the .tar format, every file is aligned inside the
-    archive to an offset that is a multiple of 512 bytes.
-
-    The .tar archive is compressed into a new .lzma file using the LZMA
-    filter with options that prefer an input alignment of four bytes. Now
-    if the independent .lzma files don't have the same alignment of
-    the Compressed Data fields, the LZMA filter will be unable to take
-    advantage of the input alignment between the files in the .tar
-    archive, which reduces compression ratio.
-
-    Thus, even if you have only a single Block per file, it can be good for
-    compression ratio to align the Compressed Data to an optimal offset.
-
-
-x.2. Speed
-
-    Most modern computers are faster when multi-byte data is located
-    at aligned offsets in RAM. Proper alignment of the Compressed Data
-    fields can slightly increase the speed of some filters.
-
-
-x.3. Recovery
-
-    Aligning every Block Header to start at an offset with big enough
-    alignment may ease or at least speed up recovery of broken files.
-
-
-y. Typical usage cases
-
-y.x. Parsing the Stream backwards
-
-    You may need to parse the Stream backwards if you need to get
-    information such as the sizes of the Stream, Index, or Extra.
-    The basic procedure to do this follows.
-
-    Locate the end of the Stream. If the Stream is stored as is in a
-    standalone .lzma file, simply seek to the end of the file and start
-    reading backwards using an appropriate buffer size. The file format
-    specification allows an arbitrary amount of Footer Padding (zero or
-    more NUL bytes), which you skip before trying to decode the Stream tail.
-
-    Once you have located the end of the Stream (a non-NUL byte), make
-    sure you have at least the last LZMA_STREAM_TAIL_SIZE bytes of the
-    Stream in a buffer. If there aren't enough bytes left in the file,
-    the file is too small to contain a valid Stream. Decode the Stream
-    tail using lzma_stream_tail_decoder(). Store the offset of the first
-    byte of the Stream tail; you will need it later.
-
-    You may now want to do some internal verification, e.g. check whether
-    the Check type is supported by the liblzma build you are using.
-
-    Decode the Backward Size field with lzma_vli_reverse_decode(). The
-    field is at most LZMA_VLI_BYTES_MAX bytes long. Check that
-    Backward Size is not zero. Store the offset of the first byte of
-    the Backward Size; you will need it later.
-
-    Now you know the Total Size of the last Block of the Stream. It's the
-    value of Backward Size plus the size of the Backward Size field. Note
-    that you cannot use lzma_vli_size() to calculate the size since there
-    might be padding; you need to use the real observed size of the
-    Backward Size field.
-
-    At this point, the operation continues differently for Single-Block
-    and Multi-Block Streams.
-
-
-y.x.1. Single-Block Stream
-
-    There might be an Uncompressed Size field present in the Stream Footer.
-    You cannot know it for sure unless you have already parsed the Block
-    Header earlier. For security reasons, you probably want to try to
-    decode the Uncompressed Size field, but you must not indicate any
-    error if decoding fails. Later you can give the decoded Uncompressed
-    Size to the Block decoder if Uncompressed Size isn't otherwise known;
-    this prevents it from producing too much output in case of a (possibly
-    intentionally) corrupt file.
-
-    Calculate the start offset of the Stream:
-
-        backward_offset - backward_size - LZMA_STREAM_HEADER_SIZE
-
-    backward_offset is the offset of the first byte of the Backward Size
-    field. Remember to check for integer overflows, which can occur with
-    invalid input files.
-
-    Seek to the beginning of the Stream. Decode the Stream Header using
-    lzma_stream_header_decoder(). Verify that the decoded Stream Flags
-    match the values found in the Stream tail. You can use the
-    lzma_stream_flags_is_equal() macro for this.
-
-    Decode the Block Header. Verify that it isn't a Metadata Block, since
-    Single-Block Streams cannot have Metadata. If Uncompressed Size is
-    present in the Block Header, the value you tried to decode from the
-    Stream Footer must be ignored, since Uncompressed Size wasn't actually
-    present there. If Block Header doesn't have Uncompressed Size, and
-    decoding the Uncompressed Size field from the Stream Footer failed,
-    the file is corrupt.
-
-    If you were only looking for the Uncompressed Size of the Stream,
-    you now have that information, and you can stop processing the Stream.
-
-    To decode the Block, the same instructions apply as described in
-    FIXME. However, because you have some extra known information decoded
-    from the Stream Footer, you should give this information to the Block
-    decoder so that it can verify it while decoding:
-      - If Uncompressed Size is not present in the Block Header, set
-        lzma_options_block.uncompressed_size to the value you decoded
-        from the Stream Footer.
-      - Always set lzma_options_block.total_size to backward_size +
-        size_of_backward_size (you calculated this sum earlier already).
-
-
-y.x.2. Multi-Block Stream
-
-    Calculate the start offset of the Footer Metadata Block:
-
-        backward_offset - backward_size
-
-    backward_offset is the offset of the first byte of the Backward Size
-    field. Remember to check for integer overflows, which can occur with
-    broken input files.
-
-    Decode the Block Header. Verify that it is a Metadata Block. Set
-    lzma_options_block.total_size to backward_size + size_of_backward_size
-    (you calculated this sum earlier already). Then decode the Footer
-    Metadata Block.
-
-    Store the decoded Footer Metadata in the lzma_info structure using
-    lzma_info_set_metadata(). Also set the offset of the Backward Size
-    field using lzma_info_size_set(). Then you can get the start offset
-    of the Stream using lzma_info_size_get(). Note that any of these steps
-    may fail so don't omit error checking.
-
-    Seek to the beginning of the Stream. Decode the Stream Header using
-    lzma_stream_header_decoder(). Verify that the decoded Stream Flags
-    match the values found in the Stream tail. You can use the
-    lzma_stream_flags_is_equal() macro for this.
-
-    If you were only looking for the Uncompressed Size of the Stream,
-    it's possible that you already have it now. If Uncompressed Size (or
-    whatever information you were looking for) isn't available yet,
-    continue by decoding also the Header Metadata Block. (If some
-    information is missing, the Header Metadata Block has to be present.)
-
-    Decoding the Data Blocks goes the same way as described in FIXME.
-
-
-y.x.3. Variations
-
-    If you know the offset of the beginning of the Stream, you may want
-    to parse the Stream Header before parsing the Stream tail.
-
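Note on the removed "Parsing the Stream backwards" text: the two purely
arithmetic steps it describes (skipping Footer Padding, and computing the
start offset of a Single-Block Stream with overflow checks) can be sketched
in plain C without any liblzma calls. This is only an illustration of the
procedure in the deleted document, not code from the xz tree. The helper
names skip_footer_padding() and single_block_stream_start() are hypothetical,
and STREAM_HEADER_SIZE is an assumed stand-in for LZMA_STREAM_HEADER_SIZE,
using the 11-byte value quoted in the removed text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 11 bytes, as stated in the removed document. The real
 * constant was LZMA_STREAM_HEADER_SIZE from <lzma/stream_flags.h>. */
#define STREAM_HEADER_SIZE 11

/*
 * Hypothetical helper: given the last `size` bytes of a file in `buf`,
 * return how many bytes remain once trailing NUL bytes (Footer Padding)
 * are skipped. A return value of 0 means the buffer held only padding.
 */
static size_t
skip_footer_padding(const uint8_t *buf, size_t size)
{
	assert(buf != NULL || size == 0);

	while (size > 0 && buf[size - 1] == 0x00)
		--size;

	return size;
}

/*
 * Hypothetical helper for the Single-Block case:
 *
 *     stream_start = backward_offset - backward_size - STREAM_HEADER_SIZE
 *
 * backward_offset is the offset of the first byte of the Backward Size
 * field. Returns false instead of letting the unsigned subtraction wrap
 * around, which can happen with invalid input files.
 */
static bool
single_block_stream_start(uint64_t backward_offset, uint64_t backward_size,
		uint64_t *stream_start)
{
	assert(stream_start != NULL);

	/* The text requires Backward Size to be non-zero. */
	if (backward_size == 0)
		return false;

	/* The Block cannot begin before offset zero. */
	if (backward_size > backward_offset)
		return false;

	const uint64_t block_start = backward_offset - backward_size;

	/* There must be room for the Stream Header before the Block. */
	if (block_start < STREAM_HEADER_SIZE)
		return false;

	*stream_start = block_start - STREAM_HEADER_SIZE;
	return true;
}
```

For the Multi-Block case the same checks apply without the final header
subtraction, since the text gives the start of the Footer Metadata Block as
backward_offset - backward_size. A caller would treat a false return as a
corrupt file rather than seeking to the computed offset.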