1 files changed, 112 insertions, 0 deletions
diff --git a/doc/liblzma-hacking.txt b/doc/liblzma-hacking.txt
new file mode 100644
index 00000000..64390bcb
--- /dev/null
+++ b/doc/liblzma-hacking.txt
@@ -0,0 +1,112 @@
+
+Hacking liblzma
+---------------
+
+0. Preface
+
+    This document gives an overview of the internals of liblzma, which
+    should make it easier to start reading and modifying the code.
+
+
+1. Programming language
+
+    liblzma is written in C99. If you use GCC, you need at least
+    GCC 3.x.x. GCC 2 isn't and won't be supported.
+
+    Some GCC-specific extensions are used *conditionally*; they aren't
+    required to build a full-featured library. Don't make the code rely
+    on any non-standard compiler extensions, or even on C99 features
+    that aren't portable between almost-C99-compatible compilers (for
+    example, non-static inline functions).
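+
+    A minimal sketch of the conditional pattern follows. The likely()
+    macro and the count_nonzero() function are hypothetical, made up
+    for illustration: the GCC-specific __builtin_expect() is used only
+    when __GNUC__ indicates that it is available, and elsewhere the
+    macro falls back to the plain expression, so the code stays valid
+    C99 either way. The function is static inline, because non-static
+    inlines aren't portable.
+
+    ```c
+    #include <stddef.h>
+    #include <stdint.h>
+
+    /* Hypothetical helper: use the GCC builtin when it is available,
+     * otherwise expand to the plain condition. */
+    #if defined(__GNUC__) && __GNUC__ >= 3
+    #    define likely(expr) __builtin_expect(!!(expr), 1)
+    #else
+    #    define likely(expr) (expr)
+    #endif
+
+    /* Hypothetical example: counts the non-zero bytes in buf. It is
+     * static inline because non-static inline functions aren't
+     * portable between almost-C99-compatible compilers. */
+    static inline size_t
+    count_nonzero(const uint8_t *buf, size_t size)
+    {
+        size_t count = 0;
+
+        for (size_t i = 0; i < size; ++i)
+            if (likely(buf[i] != 0))
+                ++count;
+
+        return count;
+    }
+    ```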
+
+    The public API headers are in C89. This avoids frustrating those
+    who maintain programs that are written strictly in C89 or C++.
+
+    liblzma makes the following assumption about sizeof(size_t). If it
+    doesn't hold, some porting is probably needed:
+
+        sizeof(uint32_t) <= sizeof(size_t) <= sizeof(uint64_t)
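+
+    One way to make a port fail loudly at compile time instead of
+    misbehaving at run time is a check like the sketch below. The
+    COMPILE_TIME_ASSERT macro is hypothetical. C99 has no
+    _Static_assert (it was added in C11), so the sketch assumes the
+    common negative-array-size trick instead.
+
+    ```c
+    #include <stddef.h>
+    #include <stdint.h>
+
+    /* Hypothetical C99 compile-time assertion: if cond is false, the
+     * array size is -1 and compilation fails. */
+    #define COMPILE_TIME_ASSERT(cond, name) \
+        typedef char compile_time_assert_##name[(cond) ? 1 : -1]
+
+    /* sizeof(uint32_t) <= sizeof(size_t) <= sizeof(uint64_t) */
+    COMPILE_TIME_ASSERT(sizeof(uint32_t) <= sizeof(size_t), size_min);
+    COMPILE_TIME_ASSERT(sizeof(size_t) <= sizeof(uint64_t), size_max);
+    ```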
+
+
+2. Internal vs. external API
+
+    Data passes through the following layers of coders:
+
+        Input                         Output
+          v     Application             ^
+          |     liblzma public API      |
+          |     Stream coder            |
+          |     Block coder             |
+          |     Filter coder            |
+          |     ...                     |
+          v     Filter coder            ^
+
+
+        Application
+          `-- liblzma public API
+                `-- Stream coder
+                      |-- Stream info handler
+                      |-- Stream Header coder
+                      |-- Block Header coder
+                      |     `-- Filter Flags coder
+                      |-- Metadata coder
+                      |     `-- Block coder
+                      |           `-- Filter 0
+                      |                 `-- Filter 1
+                      |                     ...
+                      |-- Data Block coder
+                      |     `-- Filter 0
+                      |           `-- Filter 1
+                      |               ...
+                      `-- Stream tail coder
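+
+    The layering can be illustrated with a deliberately simplified
+    sketch. The struct coder type and the two example filters below are
+    hypothetical and are not liblzma's internal API: each coder
+    transforms a buffer in place and then hands it to the next coder in
+    the chain, much like each layer in the first diagram passes data to
+    the layer below it. Real coders stream data between separate input
+    and output buffers, which this sketch leaves out.
+
+    ```c
+    #include <stddef.h>
+    #include <stdint.h>
+
+    /* Hypothetical coder interface (not liblzma's). */
+    struct coder {
+        /* Transforms buf[0..size) in place. */
+        void (*code)(const struct coder *self, uint8_t *buf, size_t size);
+        const struct coder *next; /* Next coder, or NULL at the end. */
+        uint8_t param;            /* Filter-specific option. */
+    };
+
+    /* Example filter: adds param to every byte (modulo 256). */
+    static void
+    add_filter(const struct coder *self, uint8_t *buf, size_t size)
+    {
+        for (size_t i = 0; i < size; ++i)
+            buf[i] = (uint8_t)(buf[i] + self->param);
+    }
+
+    /* Example filter: XORs every byte with param. */
+    static void
+    xor_filter(const struct coder *self, uint8_t *buf, size_t size)
+    {
+        for (size_t i = 0; i < size; ++i)
+            buf[i] ^= self->param;
+    }
+
+    /* Passes buf through every coder in the chain, first to last. */
+    static void
+    run_chain(const struct coder *first, uint8_t *buf, size_t size)
+    {
+        for (const struct coder *c = first; c != NULL; c = c->next)
+            c->code(c, buf, size);
+    }
+    ```
+
+    For example, a chain whose first coder is add_filter (param 1) and
+    whose next coder is xor_filter (param 0x0F) turns the byte 0x10
+    into 0x1E.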
+
+
+
+3. Designing new filters
+
+    All filters must be designed so that the decoder cannot consume an
+    arbitrary amount of input without producing any decoded output.
+    Failing to follow this rule makes liblzma vulnerable to DoS attacks
+    when untrusted files are decoded (and they usually are untrusted).
+
+    An example should clarify the reason behind this requirement.
+    Suppose there are two filters in the chain. The decoder of the first
+    filter produces a huge amount of output (many gigabytes or more)
+    from a few bytes of input, and this output is passed to the decoder
+    of the second filter. If the second filter interprets that data as
+    something that produces no output (e.g. padding), the filter chain
+    as a whole produces no output and consumes no input for a long time.
+
+    This problem was present in the first versions of the Subblock
+    filter: a tiny .lzma file could take several years to decode without
+    producing any output at all. It was fixed by limiting the number of
+    consecutive Padding bytes and by requiring that some decoded output
+    be produced between Set Subfilter and Unset Subfilter.
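+
+    The rule can be illustrated with a toy decoder that, like the fixed
+    Subblock filter, limits the number of consecutive padding bytes. The
+    format, PAD_BYTE, the limit, and all names below are hypothetical;
+    the point is that the decoder reports an error instead of silently
+    consuming an unbounded run of input that produces no output.
+
+    ```c
+    #include <stddef.h>
+    #include <stdint.h>
+
+    /* Hypothetical toy format: PAD_BYTE is padding and produces no
+     * output; every other byte is copied to the output as is. */
+    #define PAD_BYTE 0x00
+    #define MAX_CONSECUTIVE_PADDING 3
+
+    enum toy_ret { TOY_OK, TOY_DATA_ERROR };
+
+    struct toy_decoder {
+        size_t padding_run; /* Consecutive padding bytes seen so far. */
+    };
+
+    /* Decodes in[0..in_size) into out, which must have room for
+     * in_size bytes, and stores the output size in *out_pos. */
+    static enum toy_ret
+    toy_decode(struct toy_decoder *dec, const uint8_t *in,
+            size_t in_size, uint8_t *out, size_t *out_pos)
+    {
+        *out_pos = 0;
+
+        for (size_t i = 0; i < in_size; ++i) {
+            if (in[i] == PAD_BYTE) {
+                /* Reject input that would keep the decoder busy
+                 * without producing any output. */
+                if (++dec->padding_run > MAX_CONSECUTIVE_PADDING)
+                    return TOY_DATA_ERROR;
+
+                continue;
+            }
+
+            dec->padding_run = 0;
+            out[(*out_pos)++] = in[i];
+        }
+
+        return TOY_OK;
+    }
+    ```
+
+    Because padding_run is kept in the decoder state, the limit also
+    holds when a long run of padding is split across several calls.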
+
+
+4. Implementing new filters
+
+    If the filter supports embedding an End of Payload Marker, make sure
+    that when your filter detects the End of Payload Marker,
+      - using an End of Payload Marker is actually allowed (i.e. End of
+        Input isn't used); and
+      - your filter also checks that no more input is coming from the
+        next filter in the chain.
+
+    The second requirement is slightly tricky. The next filter may not
+    have returned LZMA_STREAM_END yet, and it may even need a few more
+    bytes of input before it does. You need to give it as much input as
+    it needs and verify that it doesn't produce any output.
+
+    Don't call the next filter in the chain after it has returned
+    LZMA_STREAM_END (except in the encoder when action ==
+    LZMA_SYNC_FLUSH). Doing so results in undefined behavior.
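+
+    The decoder-side checks can be sketched as follows. The ret codes
+    only mimic the roles of LZMA_OK, LZMA_STREAM_END and
+    LZMA_DATA_ERROR, and struct next_filter, handle_eopm() and the mock
+    are hypothetical, not liblzma's internal API. For simplicity the
+    sketch drains the next filter in a loop; a real coder would instead
+    return to its caller and resume when more input is available.
+
+    ```c
+    #include <stdbool.h>
+    #include <stddef.h>
+    #include <stdint.h>
+
+    /* Hypothetical return codes mimicking LZMA_OK, LZMA_STREAM_END
+     * and LZMA_DATA_ERROR. */
+    enum ret { RET_OK, RET_STREAM_END, RET_DATA_ERROR };
+
+    /* Hypothetical interface of the next filter in the chain: read()
+     * writes at most out_size bytes to out and stores the count in
+     * *written. */
+    struct next_filter {
+        enum ret (*read)(struct next_filter *self, uint8_t *out,
+                size_t out_size, size_t *written);
+        bool finished; /* True once read() returned RET_STREAM_END. */
+    };
+
+    /* Called right after our filter has decoded an End of Payload
+     * Marker. eopm_allowed is false when End of Input is used. */
+    static enum ret
+    handle_eopm(struct next_filter *next, bool eopm_allowed)
+    {
+        if (!eopm_allowed)
+            return RET_DATA_ERROR;
+
+        /* Keep reading until the next filter reports the end of its
+         * stream, rejecting the data if even one more byte comes out.
+         * The finished flag ensures read() is never called again
+         * after it has returned RET_STREAM_END. */
+        while (!next->finished) {
+            uint8_t byte;
+            size_t written = 0;
+            const enum ret r = next->read(next, &byte, 1, &written);
+
+            if (r == RET_STREAM_END)
+                next->finished = true;
+            else if (r != RET_OK)
+                return r;
+
+            if (written != 0)
+                return RET_DATA_ERROR;
+        }
+
+        return RET_STREAM_END;
+    }
+
+    /* Hypothetical mock of the next filter: returns RET_STREAM_END
+     * once mock_calls_left calls have been made, and emits
+     * mock_bytes_per_call bytes on every call. */
+    static int mock_calls_left = 3;
+    static size_t mock_bytes_per_call = 0;
+
+    static enum ret
+    mock_read(struct next_filter *self, uint8_t *out, size_t out_size,
+            size_t *written)
+    {
+        (void)self;
+        *written = mock_bytes_per_call < out_size
+                ? mock_bytes_per_call : out_size;
+        for (size_t i = 0; i < *written; ++i)
+            out[i] = 0;
+
+        return --mock_calls_left > 0 ? RET_OK : RET_STREAM_END;
+    }
+    ```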
+
+    Be pedantic. If the input data isn't exactly valid, reject it.
+
+    At the moment, liblzma isn't modular: to add support for a new
+    filter, you need to edit several files in src/liblzma/common. Grep
+    for LZMA_FILTER_LZMA to locate the files that need changes.
+