-rw-r--r-- | doc/liblzma-security.txt | 219 |
1 files changed, 0 insertions, 219 deletions
diff --git a/doc/liblzma-security.txt b/doc/liblzma-security.txt
deleted file mode 100644
index 55bc57bc..00000000
--- a/doc/liblzma-security.txt
+++ /dev/null
@@ -1,219 +0,0 @@

Using liblzma securely
----------------------

0. Introduction

    This document discusses how to use liblzma securely. There are issues
    that don't apply to zlib or libbzip2, so reading this document is
    strongly recommended even for those who are very familiar with zlib
    or libbzip2.

    While making liblzma itself as secure as possible is essential, it's
    out of the scope of this document.


1. Memory usage

    The memory usage of liblzma varies a lot.


1.1. Problem sources

1.1.1. Block coder

    The memory requirements of the Block encoder depend on the filters
    used and their settings. The memory requirements of the Block decoder
    depend on which filters, with which settings, were used to encode the
    Block. Usually the memory requirements of a decoder are equal to or
    less than the requirements of the encoder with the same settings.

    While decoding a typical Block requires from a few hundred kilobytes
    to tens of megabytes of RAM, a maliciously constructed file can
    require a lot more. With the current filters, the maximum amount is
    about 7 GiB. If you use multi-threaded decoding, every Block can
    require this amount of RAM, so a four-threaded decoder could suddenly
    try to allocate 4 * 7 GiB = 28 GiB of RAM.

    If you don't limit the maximum memory usage in any way, and there are
    no resource limits set on the operating system side, one malicious
    input file can run the system out of memory, or at least make it swap
    badly for a long time. This is especially bad on servers, e.g. an
    email server doing virus scanning on incoming messages.


1.1.2. Metadata decoder

    Multi-Block .lzma files contain at least one Metadata Block.
    Externally, Metadata Blocks are similar to Data Blocks, so all the
    memory usage issues described for Data Blocks apply to Metadata
    Blocks too.

    The uncompressed content of a Metadata Block contains information
    about the Stream as a whole, and optionally some Extra Records. The
    information about the Stream is kept in liblzma's internal data
    structures in RAM. Extra Records can contain arbitrary data. They are
    not interpreted by liblzma, but liblzma will provide them to the
    application in uninterpreted form if the application requests them.

    Usually the Uncompressed Size of a Metadata Block is small. Even in
    extreme cases, it shouldn't be much bigger than a few megabytes. Once
    the Metadata has been parsed into native data structures in liblzma,
    it usually takes a little more memory than in the encoded form. For
    all normal files this is not a problem, since the resulting memory
    usage stays modest.

    The problem is that a maliciously constructed Metadata Block can
    contain a huge amount of "information", which liblzma will try to
    store in its internal data structures. This may cause liblzma to
    allocate all the available RAM unless some kind of resource usage
    limits are applied.

    Note that the Extra Records in Metadata are always parsed, but memory
    is allocated for them only if the application has requested liblzma
    to provide the Extra Records to the application.


1.2. Solutions

    If you need to decode files from untrusted sources (most people do),
    you must limit the memory usage to avoid denial of service (DoS)
    conditions caused by malicious input files.

    The first step is to find out the maximum amount of memory you are
    allowed to consume. This may be a hardcoded constant or derived from
    the available RAM, whichever is appropriate for the application.

    The simplest solution is to use setrlimit() if the kernel supports
    RLIMIT_AS, which limits the address space, and thus the memory usage,
    of the whole process.
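
    Here is a minimal sketch of the setrlimit() approach. It assumes a
    POSIX system whose kernel supports RLIMIT_AS (Linux does). The
    hypothetical helper limit_address_space() below sets the soft
    address-space limit of the calling process. The limit applies to
    every allocation in the process, not only to those made by liblzma.

```c
#include <assert.h>
#include <sys/resource.h>

/* Hypothetical helper (not part of liblzma): cap the address space of
 * the whole process at max_bytes by setting the RLIMIT_AS soft limit.
 * Returns 0 on success and -1 on failure, with errno set by
 * getrlimit() or setrlimit(). Once the cap is in place, allocations
 * that would exceed it fail (malloc() returns NULL) instead of
 * exhausting system memory. */
static int limit_address_space(rlim_t max_bytes)
{
    struct rlimit rl;

    /* A zero limit would make every further allocation fail. */
    assert(max_bytes > 0);

    if (getrlimit(RLIMIT_AS, &rl) != 0)
        return -1;

    /* The soft limit may not exceed the hard limit, so clamp the
     * request instead of letting setrlimit() fail with EINVAL. */
    if (rl.rlim_max != RLIM_INFINITY && max_bytes > rl.rlim_max)
        max_bytes = rl.rlim_max;

    rl.rlim_cur = max_bytes;
    return setrlimit(RLIMIT_AS, &rl);
}
```

    For example, calling limit_address_space((rlim_t)2 << 30) early in
    main() would cap the process at 2 GiB of address space; the figure
    is arbitrary. RLIMIT_AS counts virtual address space, including
    mappings that are never touched, so the cap should leave generous
    headroom for the rest of the application.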

    For more portable and fine-grained limiting, you can use the memory
    limiter functions found in <lzma/memlimit.h>.


1.2.1. Encoder

    lzma_memory_usage() gives a rough estimate of the memory usage of
    the given filter chain. To keep the internal implementation much
    simpler, this function ignores the small helper data structures
    needed in various places; only the structures with significant memory
    usage are taken into account. Still, the estimate should be accurate
    to well within a mebibyte.

    The Subblock filter is a special case. If a Subfilter has been
    specified, it isn't taken into account when lzma_memory_usage()
    calculates the memory usage. You need to calculate the memory usage
    of the Subfilter separately.

    Keeping track of Blocks in a Multi-Block Stream takes a few dozen
    bytes of RAM per Block (the size of the lzma_index structure plus the
    overhead of malloc()). It isn't a good idea to put tens of thousands
    of Blocks into a Stream unless you have a very good reason to do so
    (a compressed dictionary could be an example of such a situation).

    Also keep the number and sizes of Extra Records sane. If you produce
    the list of Extra Records automatically from some untrusted source,
    you should validate not only the content of these Records but also
    their memory usage.


1.2.2. Decoder

    A single-threaded decoder should simply use a memory limiter and
    indicate an error if it runs out of memory.

    Memory limiting with multi-threaded decoding is tricky. The simple
    solution is to divide the maximum allowed memory usage by the maximum
    number of threads, and give each Block decoder its own independent
    lzma_memory_limiter. The drawback is that if one Block needs notably
    more RAM than any other Block, the decoder will run out of memory
    even though in reality there would be plenty of free RAM.

    An attractive alternative would be using a shared lzma_memory_limiter.
    Depending on the application and the expected type of input, this may
    be either the best solution or a source of hard-to-reproduce problems.
    A shared limiter works reliably when the following conditions hold:
      - You use a maximum of n threads.
      - x(i) is the decoder memory requirement of Block number i in an
        expected input Stream.
      - The memory limiter is set to a higher value than the sum of the
        n highest values of x(i).

    (If you are better at explaining the above conditions, please
    contribute your improved version.)

    If the above conditions aren't met, the decoding may fail
    unpredictably. That is, on the same machine using the same settings,
    the decoding may sometimes succeed and sometimes fail. This is because
    the threads may occasionally be scheduled so that the Blocks with the
    highest memory usage are being decoded at the same time.

    Most .lzma files have all the Blocks encoded with identical settings,
    or at least their memory usage doesn't vary dramatically. That's why
    most multi-threaded decoders probably want to use the simple "separate
    lzma_memory_limiter for each thread" solution, possibly falling back
    to single-threaded mode in case the per-thread memory limits aren't
    enough in multi-threaded mode.

FIXME: Memory usage of Stream info.


2. Huge uncompressed output

2.1. Data Blocks

    Decoding a tiny .lzma file can produce a huge amount of uncompressed
    output. There is an example file of 45 bytes, which decodes to 64 PiB
    (that's 2^56 bytes). Uncompressing such a file to disk is likely to
    fill even a large disk array. If the data is written to a pipe, it
    may not fill the disk, but it would still take a very long time to
    finish.

    To avoid denial of service conditions caused by a huge amount of
    uncompressed output, applications using liblzma should use some
    method to limit the amount of output produced. The exact method
    depends on the application.
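
    One simple method is to count the bytes produced and stop as soon as
    an application-chosen cap is reached. The following is a minimal
    sketch of that method. The struct output_guard type and its
    functions are hypothetical helpers, not part of liblzma. The caller
    is assumed to call output_guard_accept() before writing each decoded
    chunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not part of liblzma): tracks how much
 * uncompressed output has been produced so far. */
struct output_guard {
    uint64_t produced; /* bytes accepted so far */
    uint64_t limit;    /* maximum number of bytes the application allows */
};

static void output_guard_init(struct output_guard *g, uint64_t limit)
{
    g->produced = 0;
    g->limit = limit;
}

/* Call before writing each decoded chunk of n bytes. Returns true if
 * the chunk still fits under the cap. Returns false, leaving the
 * counter unchanged, if writing it would exceed the cap; the caller
 * should then stop decoding and report an error. */
static bool output_guard_accept(struct output_guard *g, uint64_t n)
{
    assert(g->produced <= g->limit);

    /* Compare against the remaining room rather than computing
     * produced + n, which could overflow. */
    if (n > g->limit - g->produced)
        return false;

    g->produced += n;
    return true;
}
```

    The cap could be derived from, say, the free space on the target
    file system. With such a cap, the 45-byte example file above would
    be rejected as soon as the cap is reached, instead of running until
    all 64 PiB have been written.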

    All valid .lzma Streams make it possible to find out the uncompressed
    size of the Stream without actually uncompressing the data. This
    information is available in at least one of the Metadata Blocks.
    Once the uncompressed size has been parsed, the decoder can verify
    that it doesn't exceed certain limits (e.g. available disk space).

    When the uncompressed size is known, the decoder can also keep track
    of the amount of output produced so far and check that it doesn't
    exceed the known uncompressed size. If it does, the file is known to
    be corrupt, and an error should be indicated without decoding the
    rest of the file.

    Unfortunately, finding out the uncompressed size beforehand is often
    possible only in non-streamed mode, because the needed information
    may be in the Footer Metadata Block, which (obviously) is at the end
    of the Stream. In purely streamed decoding, one may need to use rough,
    arbitrary limits to prevent the problems described at the beginning
    of this section.


2.2. Metadata

    Metadata is stored in Metadata Blocks, which are very similar to Data
    Blocks. Thus, the uncompressed size can be huge just like with Data
    Blocks. The difference is that the contents of Metadata Blocks aren't
    given to the application as is, but are parsed by liblzma. Still,
    reading through a huge Metadata Block can take a very long time,
    effectively creating the same kind of denial of service as piping a
    decoded Data Block to another process would.

    At first it would seem that using a memory limiter would prevent
    this issue as a side effect. But it does so only if the application
    requests liblzma to allocate the Extra Records and provide them to
    the application. If Extra Records aren't requested, they aren't
    allocated either, yet liblzma still reads through them to validate
    that the Metadata is in the proper format.

    The solution is to limit the Uncompressed Size of a Metadata Block
    to some relatively large value. This makes liblzma return an error
    when the given limit is reached.
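
    The following sketch shows why a size limit helps where a memory
    limiter does not. It walks a hypothetical record layout: one length
    byte followed by that many payload bytes. This is not the real
    Metadata encoding. The walker skips payloads without allocating
    anything and charges every consumed byte against size_limit. The
    function walk_records() and its result codes are illustrative names
    only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum walk_result { WALK_OK, WALK_TRUNCATED, WALK_LIMIT_REACHED };

/* Walks records in buf, each encoded (hypothetically) as one length
 * byte followed by that many payload bytes. Payloads are skipped, not
 * copied, so memory usage stays constant; the work done still grows
 * with the input size, which is why every consumed byte is charged
 * against size_limit. On return, *records holds the number of
 * complete records accepted. */
static enum walk_result walk_records(const uint8_t *buf, size_t size,
                                     size_t size_limit, size_t *records)
{
    size_t pos = 0; /* invariant: pos <= size_limit */

    assert(buf != NULL || size == 0);
    *records = 0;

    while (pos < size) {
        size_t len = buf[pos];
        size_t rec_size = 1 + len;

        if (rec_size > size - pos)
            return WALK_TRUNCATED;     /* record runs past the input */
        if (rec_size > size_limit - pos)
            return WALK_LIMIT_REACHED; /* would exceed the size limit */

        pos += rec_size;
        ++*records;
    }
    return WALK_OK;
}
```

    Memory usage stays constant regardless of the input, so a memory
    limiter would never trigger here. The size_limit check is what
    bounds the work done on a maliciously large Metadata Block.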