author Lasse Collin <lasse.collin@tukaani.org> 2007-12-09 00:42:33 +0200
committer Lasse Collin <lasse.collin@tukaani.org> 2007-12-09 00:42:33 +0200
commit 5d018dc03549c1ee4958364712fb0c94e1bf2741
tree 1b211911fb33fddb3f04b77f99e81df23623ffc4 /doc/liblzma-hacking.txt
Imported to git.
Diffstat (limited to 'doc/liblzma-hacking.txt')
-rw-r--r-- doc/liblzma-hacking.txt 112
1 files changed, 112 insertions, 0 deletions
diff --git a/doc/liblzma-hacking.txt b/doc/liblzma-hacking.txt
new file mode 100644
index 00000000..64390bcb
--- /dev/null
+++ b/doc/liblzma-hacking.txt
@@ -0,0 +1,112 @@
+
+Hacking liblzma
+---------------
+
+0. Preface
+
+ This document gives some overall information about the internals of
+ liblzma, which should make it easier to start reading and modifying
+ the code.
+
+
+1. Programming language
+
+ liblzma was written in C99. If you use GCC, this means that you need
+ at least GCC 3.x.x. GCC 2 isn't and won't be supported.
+
+ Some GCC-specific extensions are used *conditionally*. They aren't
+ required to build a full-featured library. Don't make the code rely
+ on any non-standard compiler extensions or even C99 features that
+ aren't portable between almost-C99 compatible compilers (for example
+ non-static inlines).
+
+ The public API headers are in C89. This is to avoid frustrating those
+ who maintain programs that are strictly C89 or C++.
+
+ An assumption about sizeof(size_t) is made. If this assumption is
+ wrong, some porting is probably needed:
+
+ sizeof(uint32_t) <= sizeof(size_t) <= sizeof(uint64_t)
+
+
+2. Internal vs. external API
+
+
+
+ Input                             Output
+   v           Application            ^
+   |        liblzma public API        |
+   |           Stream coder           |
+   |           Block coder            |
+   |           Filter coder           |
+   |               ...                |
+   v           Filter coder           ^
+
+
+ Application
+   `-- liblzma public API
+         `-- Stream coder
+               |-- Stream info handler
+               |-- Stream Header coder
+               |-- Block Header coder
+               |     `-- Filter Flags coder
+               |-- Metadata coder
+               |     `-- Block coder
+               |           `-- Filter 0
+               |                 `-- Filter 1
+               |                       ...
+               |-- Data Block coder
+               |     `-- Filter 0
+               |           `-- Filter 1
+               |                 ...
+               `-- Stream tail coder
+
+
+
+x. Designing new filters
+
+ All filters must be designed so that the decoder cannot consume an
+ arbitrary amount of input without producing any decoded output. Failing
+ to follow this rule makes liblzma vulnerable to DoS attacks if
+ untrusted files are decoded (usually they are untrusted).
+
+ An example should clarify the reason behind this requirement: There
+ are two filters in the chain. The decoder of the first filter produces
+ a huge amount of output (many gigabytes or more) from a few bytes of
+ input, which gets passed to the decoder of the second filter. If the
+ data passed to the second filter is interpreted as something that
+ produces no output (e.g. padding), the filter chain as a whole
+ produces no output and consumes no input for a long period of time.
+
+ The above problem was present in the first versions of the Subblock
+ filter. A tiny .lzma file could have taken several years to decode
+ without producing any output at all. The problem was fixed by adding
+ limits on the number of consecutive Padding bytes, and by requiring
+ that some decoded output be produced between Set Subfilter and
+ Unset Subfilter.
+
+
+x. Implementing new filters
+
+ If the filter supports embedding an End of Payload Marker, make sure
+ that when your filter detects the marker, it verifies that
+   - the use of the End of Payload Marker is actually allowed (i.e.
+     End of Input isn't used); and
+   - no more input is coming from the next filter in the
+     chain.
+
+ The second requirement is slightly tricky. It's possible that the next
+ filter hasn't returned LZMA_STREAM_END yet. It may even need a few
+ more bytes of input before it will do so. You need to give it as much
+ input as it needs, and verify that it doesn't produce any output.
+
+ Don't call the next filter in the chain after it has returned
+ LZMA_STREAM_END (except in the encoder if action == LZMA_SYNC_FLUSH).
+ Doing so results in undefined behavior.
+
+ Be pedantic. If the input data isn't exactly valid, reject it.
+
+ At the moment, liblzma isn't modular. You will need to edit several
+ files in src/liblzma/common to include support for a new filter. grep
+ for LZMA_FILTER_LZMA to locate the files needing changes.
+
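
Illustration (not part of the patch above): the rule from "Designing new
filters", that a decoder must not consume an arbitrary amount of input
without producing output, can be sketched with a toy decoder. Everything
here is hypothetical and is not liblzma code: the toy format (0x00 is a
padding byte, any other byte is copied as is), the names toy_decoder,
toy_decode and toy_ret, and the cap TOY_PADDING_MAX, which is not the
limit used by the real Subblock filter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cap on consecutive padding bytes (illustration only). */
#define TOY_PADDING_MAX 4

typedef enum { TOY_OK, TOY_DATA_ERROR } toy_ret;

typedef struct {
	size_t padding_run; /* consecutive padding bytes seen so far */
} toy_decoder;

/*
 * Toy format (an assumption for this sketch): 0x00 is a padding byte
 * that produces no output; every other byte is copied to the output.
 * Input that contains more than TOY_PADDING_MAX padding bytes in a row
 * is rejected, so the decoder cannot be kept busy indefinitely without
 * producing output.
 */
static toy_ret
toy_decode(toy_decoder *d,
		const uint8_t *in, size_t *in_pos, size_t in_size,
		uint8_t *out, size_t *out_pos, size_t out_size)
{
	assert(*in_pos <= in_size && *out_pos <= out_size);

	while (*in_pos < in_size) {
		const uint8_t b = in[*in_pos];

		if (b == 0x00) {
			/* Padding: count it and refuse overlong runs. */
			if (++d->padding_run > TOY_PADDING_MAX)
				return TOY_DATA_ERROR;

			++*in_pos;
			continue;
		}

		/* Output buffer full: stop and let the caller retry. */
		if (*out_pos == out_size)
			break;

		out[(*out_pos)++] = b;
		++*in_pos;
		d->padding_run = 0; /* output was produced; reset the run */
	}

	return TOY_OK;
}
```

The counter lives in the decoder state rather than in a local variable
so that the limit also holds when the padding is split across several
calls.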