liblzma: Add EROFS LZMA encoder and decoder.

Right now this is just a planned extra-compact format for use in the EROFS file system in Linux. At this point it's possible that the format will either change or be abandoned and removed completely. The special thing about the encoder is that it uses the output-size-limited encoding added in the previous commit. EROFS uses fixed-size blocks (e.g. 4 KiB) to hold compressed data, so the compressors must be able to create valid streams that fill the given block size.
author: Lasse Collin <lasse.collin@tukaani.org> 2021-01-14 20:07:01 +0200
committer: Lasse Collin <lasse.collin@tukaani.org> 2021-01-14 20:10:59 +0200
commit: 601ec0311e769fc704daaaa7dac0ca840aff080e (patch)
tree: ec13c2c53062e7fa6ec8210380c0efc97c8cd3f7 /src/liblzma/api
parent: liblzma: Add rough support for output-size-limited encoding in LZMA1. (diff)
download: xz-601ec0311e769fc704daaaa7dac0ca840aff080e.tar.xz
1 files changed, 76 insertions, 0 deletions
diff --git a/src/liblzma/api/lzma/container.h b/src/liblzma/api/lzma/container.h
index 9fbf4df0..581f3507 100644
--- a/src/liblzma/api/lzma/container.h
+++ b/src/liblzma/api/lzma/container.h
@@ -444,6 +444,55 @@ extern LZMA_API(lzma_ret) lzma_stream_buffer_encode(
 		lzma_nothrow lzma_attr_warn_unused_result;
 
 
+/**
+ * \brief       EROFS LZMA encoder
+ *
+ * The EROFS LZMA format is a raw LZMA stream whose first byte (always 0x00)
+ * has been replaced with the bitwise negation of the LZMA properties
+ * (lc/lp/pb). Thus the first byte of an EROFS LZMA stream is never
+ * 0x00. There is no end of payload marker and thus the uncompressed size
+ * must be stored separately. For the best error detection the dictionary
+ * size should be stored separately as well but alternatively one may use
+ * the uncompressed size as the dictionary size when decoding.
+ *
+ * With the EROFS LZMA encoder, lzma_code() behaves slightly unusually.
+ * The action argument must be LZMA_FINISH and the return value cannot be
+ * LZMA_OK. Thus the encoding is always done with a single lzma_code() after
+ * the initialization. The benefit of the combination of initialization
+ * function and lzma_code() is that memory allocations can be re-used for
+ * better performance.
+ *
+ * lzma_code() will try to encode as much input as is possible to fit into
+ * the given output buffer. If not all input can be encoded, the stream will
+ * be finished without encoding all the input. The caller must check both
+ * input and output buffer usage after lzma_code() (total_in and total_out
+ * in lzma_stream can be convenient). Often lzma_code() can fill the output
+ * buffer completely if there is a lot of input, but sometimes a few bytes
+ * may remain unused because the next LZMA symbol would require more space.
+ *
+ * lzma_stream.avail_out must be at least 6. Otherwise LZMA_PROG_ERROR
+ * will be returned.
+ *
+ * The LZMA dictionary size should be reasonably small to speed up encoder
+ * re-initialization. A good value is bigger than the resulting
+ * uncompressed size of most of the output chunks. For example, if output
+ * size is 4 KiB, dictionary size of 32 KiB or 64 KiB is good. If the
+ * data compresses extremely well, even 128 KiB may be useful.
+ *
+ * \return      - LZMA_STREAM_END: All good. Check the amounts of input used
+ *                and output produced. Store the amount of input used
+ *                (uncompressed size) as it needs to be known to decompress
+ *                the data.
+ *              - LZMA_OPTIONS_ERROR
+ *              - LZMA_MEM_ERROR
+ *              - LZMA_PROG_ERROR: In addition to the generic reasons for this
+ *                error code, this may also be returned if there isn't enough
+ *                output space (6 bytes) to create a valid EROFS LZMA stream.
+ */
+extern LZMA_API(lzma_ret) lzma_erofs_encoder(
+		lzma_stream *strm, const lzma_options_lzma *options);
+
+
 /************
  * Decoding *
  ************/
@@ -630,3 +679,30 @@ extern LZMA_API(lzma_ret) lzma_stream_buffer_decode(
 		const uint8_t *in, size_t *in_pos, size_t in_size,
 		uint8_t *out, size_t *out_pos, size_t out_size)
 		lzma_nothrow lzma_attr_warn_unused_result;
+
+
+/**
+ * \brief       EROFS LZMA decoder
+ *
+ * See lzma_erofs_encoder() for more information.
+ *
+ * The lzma_code() usage with this decoder is completely normal.
+ * The special behavior of lzma_code() applies to lzma_erofs_encoder() only.
+ *
+ * \param       strm        Pointer to properly prepared lzma_stream
+ * \param       uncomp_size Uncompressed size of the EROFS LZMA stream.
+ *                          The caller must somehow know this. Note that
+ *                          while the EROFS LZMA decoder in XZ Embedded also
+ *                          needs the compressed size, the implementation in
+ *                          liblzma doesn't need to know the compressed size.
+ * \param       dict_size   LZMA dictionary size that was used when
+ *                          compressing the data. It is OK to use a bigger
+ *                          value too but liblzma will then allocate more
+ *                          memory than would actually be required and error
+ *                          detection will be slightly worse. (Note that with
+ *                          the implementation in XZ Embedded it doesn't
+ *                          affect the memory usage if one specifies a bigger
+ *                          dictionary than actually required.)
+ */
+extern LZMA_API(lzma_ret) lzma_erofs_decoder(
+		lzma_stream *strm, uint64_t uncomp_size, uint32_t dict_size);
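
Note on the first byte described in the encoder comment above: a minimal sketch in C of how the lc/lp/pb properties could be packed and bitwise-negated, and how a reader of the stream could undo it. This is not code from this commit. The function names are hypothetical and not part of liblzma. The packing formula (pb * 5 + lp) * 9 + lc is the usual LZMA1 properties byte, and the limits checked here are the general LZMA1 ones; a real encoder may impose stricter limits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Largest valid LZMA1 properties byte: (4 * 5 + 4) * 9 + 8 = 224. */
#define PROPS_MAX 224

/* Hypothetical helper: pack lc/lp/pb into an LZMA1 properties byte. */
static bool props_encode(uint32_t lc, uint32_t lp, uint32_t pb, uint8_t *props)
{
	if (lc > 8 || lp > 4 || pb > 4)
		return false;

	*props = (uint8_t)((pb * 5 + lp) * 9 + lc);
	return true;
}

/*
 * Hypothetical helper: the byte stored in place of the initial 0x00.
 * Since props <= 224, the result is at least 0x1F and thus never 0x00.
 */
static uint8_t erofs_first_byte(uint8_t props)
{
	return (uint8_t)~props;
}

/* Hypothetical helper: recover lc/lp/pb from the stored first byte. */
static bool erofs_first_byte_decode(uint8_t first,
		uint32_t *lc, uint32_t *lp, uint32_t *pb)
{
	uint8_t props = (uint8_t)~first;

	if (props > PROPS_MAX)
		return false;

	*pb = props / (9 * 5);
	props %= 9 * 5;
	*lp = props / 9;
	*lc = props % 9;
	return true;
}
```

For example, lc=3, lp=0, pb=2 packs to 0x5D, which would be stored as 0xA2. A first byte of 0x00 negates to 0xFF, which is above 224 and is rejected by the sketch.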