aboutsummaryrefslogtreecommitdiff
path: root/doc/liblzma-security.txt
diff options
context:
space:
mode:
Diffstat (limited to 'doc/liblzma-security.txt')
-rw-r--r--doc/liblzma-security.txt219
1 files changed, 219 insertions, 0 deletions
diff --git a/doc/liblzma-security.txt b/doc/liblzma-security.txt
new file mode 100644
index 00000000..487637ed
--- /dev/null
+++ b/doc/liblzma-security.txt
@@ -0,0 +1,219 @@
+
+Using liblzma securely
+----------------------
+
+0. Introduction
+
+ This document discusses how to use liblzma securely. There are issues
+ that don't apply to zlib or libbzip2, so reading this document is
+ strongly recommended even for those who are very familiar with zlib
+ or libbzip2.
+
+ While making liblzma itself as secure as possible is essential, it's
+ out of scope of this document.
+
+
+1. Memory usage
+
+ The memory usage of liblzma varies a lot.
+
+
+1.1. Problem sources
+
+1.1.1. Block coder
+
+ The memory requirements of Block encoder depend on the used filters
+ and their settings. The memory requirements of the Block decoder
+ depend on the which filters and with which filter settings the Block
+ was encoded. Usually the memory requirements of a decoder are equal
+ or less than the requirements of the encoder with the same settings.
+
+ While the typical memory requirements to decode a Block is from a few
+ hundred kilobytes to tens of megabytes, a maliciously constructed
+ files can require a lot more RAM to decode. With the current filters,
+ the maximum amount is about 7 GiB. If you use multi-threaded decoding,
+ every Block can require this amount of RAM, thus a four-threaded
+ decoder could suddenly try to allocate 28 GiB of RAM.
+
+ If you don't limit the maximum memory usage in any way, and there are
+ no resource limits set on the operating system side, one malicious
+ input file can run the system out of memory, or at least make it swap
+ badly for a long time. This is exceptionally bad on servers e.g.
+ email server doing virus scanning on incoming messages.
+
+
+1.1.2. Metadata decoder
+
+ Multi-Block .lzma files contain at least one Metadata Block.
+ Externally the Metadata Blocks are similar to Data Blocks, so all
+ the issues mentioned about memory usage of Data Blocks applies to
+ Metadata Blocks too.
+
+ The uncompressed content of Metadata Blocks contain information about
+ the Stream as a whole, and optionally some Extra Records. The
+ information about the Stream is kept in liblzma's internal data
+ structures in RAM. Extra Records can contain arbitrary data. They are
+ not interpreted by liblzma, but liblzma will provide them to the
+ application in uninterpreted form if the application wishes so.
+
+ Usually the Uncompressed Size of a Metadata Block is small. Even on
+ extreme cases, it shouldn't be much bigger than a few megabytes. Once
+ the Metadata has been parsed into native data structures in liblzma,
+ it usually takes a little more memory than in the encoded form. For
+ all normal files, this is no problem, since the resulting memory usage
+ won't be too much.
+
+ The problem is that a maliciously constructed Metadata Block can
+ contain huge amount of "information", which liblzma will try to store
+ in its internal data structures. This may cause liblzma to allocate
+ all the available RAM unless some kind of resource usage limits are
+ applied.
+
+ Note that the Extra Records in Metadata are always parsed but, but
+ memory is allocated for them only if the application has requested
+ liblzma to provide the Extra Records to the application.
+
+
+1.2. Solutions
+
+ If you need to decode files from untrusted sources (most people do),
+ you must limit the memory usage to avoid denial of service (DoS)
+ conditions caused by malicious input files.
+
+ The first step is to find out how much memory you are allowed consume
+ at maximum. This may be a hardcoded constant or derived from the
+ available RAM; whatever is appropriate in the application.
+
+ The simplest solution is to use setrlimit() if the kernel supports
+ RLIMIT_AS, which limits the memory usage of the whole process.
+ For more portable and fine-grained limitting, you can use
+ memory limitter functions found from <lzma/memlimit.h>.
+
+
+1.2.1. Encoder
+
+ lzma_memory_usage() will give you a rough estimate about the memory
+ usage of the given filter chain. To dramatically simplify the internal
+ implementation, this function doesn't take into account all the small
+ helper data structures needed in various places; only the structures
+ with significant memory usage are taken into account. Still, the
+ accuracy of this function should be well within a mebibyte.
+
+ The Subblock filter is a special case. If a Subfilter has been
+ specified, it isn't taken into account when lzma_memory_usage()
+ calculates the memory usage. You need to calculate the memory usage
+ of the Subfilter separately.
+
+ Keeping track of Blocks in a Multi-Block Stream takes a few dozen
+ bytes of RAM per Block (size of the lzma_index structure plus overhead
+ of malloc()). It isn't a good idea to put tens of thousands of Blocks
+ into a Stream unless you have a very good reason to do so (compressed
+ dictionary could be an example of such situation).
+
+ Also keep the number and sizes of Extra Records sane. If you produce
+ the list of Extra Records automatically from some untrusted source,
+ you should not only validate the content of these Records, but also
+ their memory usage.
+
+
+1.2.2. Decoder
+
+ A single-threaded decoder should simply use a memory limitter and
+ indicate an error if it runs out of memory.
+
+ Memory-limitting with multi-threaded decoding is tricky. The simple
+ solution is to divide the maximum allowed memory usage with the
+ maximum allowed threads, and give each Block decoder their own
+ independent lzma_memory_limitter. The drawback is that if one Block
+ needs notably more RAM than any other Block, the decoder will run out
+ of memory when in reality there would be plenty of free RAM.
+
+ An attractive alternative would be using shared lzma_memory_limitter.
+ Depending on the application and the expected type of input, this may
+ either be the best solution or a source of hard-to-repeat problems.
+ Consider the following requirements:
+ - You use at maximum of n threads.
+ - x(i) is the decoder memory requirements of the Block number i
+ in an expected input Stream.
+ - The memory limitter is set to higher value than the sum of n
+ highest values x(i).
+
+ (If you are better at explaining the above conditions, please
+ contribute your improved version.)
+
+ If the above conditions aren't met, it is possible that the decoding
+ will fail unpredictably. That is, on the same machine using the same
+ settings, the decoding may sometimes succeed and sometimes fail. This
+ is because sometimes threads may run so that the Blocks with highest
+ memory usage are tried to be decoded at the same time.
+
+ Most .lzma files have all the Blocks encoded with identical settings,
+ or at least the memory usage won't vary dramatically. That's why most
+ multi-threaded decoders probably want to use the simple "separate
+ lzma_memory_limitter for each thread" solution, possibly fallbacking
+ to single-threaded mode in case the per-thread memory limits aren't
+ enough in multi-threaded mode.
+
+FIXME: Memory usage of Stream info.
+
+[
+
+]
+
+
+2. Huge uncompressed output
+
+2.1. Data Blocks
+
+ Decoding a tiny .lzma file can produce huge amount of uncompressed
+ output. There is an example file of 45 bytes, which decodes to 64 PiB
+ (that's 2^56 bytes). Uncompressing such a file to disk is likely to
+ fill even a bigger disk array. If the data is written to a pipe, it
+ may not fill the disk, but would still take very long time to finish.
+
+ To avoid denial of service conditions caused by huge amount of
+ uncompressed output, applications using liblzma should use some method
+ to limit the amount of output produced. The exact method depends on
+ the application.
+
+ All valid .lzma Streams make it possible to find out the uncompressed
+ size of the Stream without actually uncompressing the data. This
+ information is available in at least one of the Metadata Blocks.
+ Once the uncompressed size is parsed, the decoder can verify that
+ it doesn't exceed certain limits (e.g. available disk space).
+
+ When the uncompressed size is known, the decoder can actively keep
+ track of the amount of output produced so far, and that it doesn't
+ exceed the known uncompressed size. If it does exceed, the file is
+ known to be corrupt and an error should be indicated without
+ continuing to decode the rest of the file.
+
+ Unfortunately, finding the uncompressed size beforehand is often
+ possible only in non-streamed mode, because the needed information
+ could be in the Footer Metdata Block, which (obviously) is at the
+ end of the Stream. In purely streamed mode decoding, one may need to
+ use some rough arbitrary limits to prevent the problems described in
+ the beginning of this section.
+
+
+2.2. Metadata
+
+ Metadata is stored in Metadata Blocks, which are very similar to
+ Data Blocks. Thus, the uncompressed size can be huge just like with
+ Data Blocks. The difference is, that the contents of Metadata Blocks
+ aren't given to the application as is, but parsed by liblzma. Still,
+ reading through a huge Metadata can take very long time, effectively
+ creating a denial of service like piping decoded a Data Block to
+ another process would do.
+
+ At first it would seem that using a memory limitter would prevent
+ this issue as a side effect. But it does so only if the application
+ requests liblzma to allocate the Extra Records and provide them to
+ the application. If Extra Records aren't requested, they aren't
+ allocated either. Still, the Extra Records are being read through
+ to validate that the Metadata is in proper format.
+
+ The solution is to limit the Uncompressed Size of a Metadata Block
+ to some relatively large value. This will make liblzma to give an
+ error when the given limit is reached.
+