Diffstat (limited to 'doc/liblzma-security.txt'):
 -rw-r--r--  doc/liblzma-security.txt | 219
 1 file changed, 219 insertions, 0 deletions
diff --git a/doc/liblzma-security.txt b/doc/liblzma-security.txt
new file mode 100644
index 00000000..487637ed
--- /dev/null
+++ b/doc/liblzma-security.txt
@@ -0,0 +1,219 @@

Using liblzma securely
----------------------

0. Introduction

    This document discusses how to use liblzma securely. There are
    issues that don't apply to zlib or libbzip2, so reading this
    document is strongly recommended even for those who are very
    familiar with zlib or libbzip2.

    While making liblzma itself as secure as possible is essential,
    it is out of the scope of this document.


1. Memory usage

    The memory usage of liblzma varies a lot.


1.1. Problem sources

1.1.1. Block coder

    The memory requirements of the Block encoder depend on the filters
    used and their settings. The memory requirements of the Block
    decoder depend on which filters, and with which settings, the
    Block was encoded. Usually the memory requirements of the decoder
    are equal to or less than those of the encoder with the same
    settings.

    While the typical memory needed to decode a Block ranges from a
    few hundred kilobytes to tens of megabytes, a maliciously
    constructed file can require a lot more RAM to decode. With the
    current filters, the maximum amount is about 7 GiB. If you use
    multi-threaded decoding, every Block can require this amount of
    RAM, so a four-threaded decoder could suddenly try to allocate
    28 GiB of RAM.

    If you don't limit the maximum memory usage in any way, and there
    are no resource limits set on the operating system side, one
    malicious input file can run the system out of memory, or at least
    make it swap badly for a long time. This is exceptionally bad on
    servers, for example an email server doing virus scanning on
    incoming messages.


1.1.2. Metadata decoder

    Multi-Block .lzma files contain at least one Metadata Block.
    Externally the Metadata Blocks are similar to Data Blocks, so all
    the issues mentioned above about the memory usage of Data Blocks
    apply to Metadata Blocks too.

    The uncompressed content of a Metadata Block contains information
    about the Stream as a whole, and optionally some Extra Records.
    The information about the Stream is kept in liblzma's internal
    data structures in RAM. Extra Records can contain arbitrary data.
    They are not interpreted by liblzma, but liblzma will provide them
    to the application in uninterpreted form if the application wishes
    so.

    Usually the Uncompressed Size of a Metadata Block is small. Even
    in extreme cases, it shouldn't be much bigger than a few
    megabytes. Once the Metadata has been parsed into native data
    structures in liblzma, it usually takes a little more memory than
    in the encoded form. For all normal files this is not a problem,
    since the resulting memory usage won't be too big.

    The problem is that a maliciously constructed Metadata Block can
    contain a huge amount of "information", which liblzma will try to
    store in its internal data structures. This may cause liblzma to
    allocate all the available RAM unless some kind of resource usage
    limits are applied.

    Note that the Extra Records in Metadata are always parsed, but
    memory is allocated for them only if the application has requested
    liblzma to provide the Extra Records to it.


1.2. Solutions

    If you need to decode files from untrusted sources (most people
    do), you must limit the memory usage to avoid denial of service
    (DoS) conditions caused by malicious input files.

    The first step is to find out how much memory you are allowed to
    consume at maximum. This may be a hardcoded constant or derived
    from the available RAM; whatever is appropriate for the
    application.

    The simplest solution is to use setrlimit() if the kernel supports
    RLIMIT_AS, which limits the memory usage of the whole process.
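A minimal sketch of the setrlimit() approach. This is plain POSIX, not liblzma API; the helper name and the cap value are made up for illustration:

```c
#include <sys/resource.h>

/* Cap the address space of the whole process. Returns 0 on success.
 * The soft limit is clamped to the hard limit, since a process can
 * never raise its soft limit above the hard limit. */
static int cap_address_space(rlim_t max_bytes)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_AS, &rl) != 0)
        return -1;

    if (rl.rlim_max != RLIM_INFINITY && max_bytes > rl.rlim_max)
        max_bytes = rl.rlim_max;

    rl.rlim_cur = max_bytes;
    return setrlimit(RLIMIT_AS, &rl);
}
```

After such a call, an allocation that would push the process above the cap simply fails (malloc() returns NULL), so the decoder sees an ordinary out-of-memory error instead of driving the whole system into swap.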
    For more portable and fine-grained limiting, you can use the
    memory limiter functions found in <lzma/memlimit.h>.


1.2.1. Encoder

    lzma_memory_usage() will give you a rough estimate of the memory
    usage of the given filter chain. To dramatically simplify the
    internal implementation, this function doesn't take into account
    all the small helper data structures needed in various places;
    only the structures with significant memory usage are taken into
    account. Still, the accuracy of this function should be well
    within a mebibyte.

    The Subblock filter is a special case. If a Subfilter has been
    specified, it isn't taken into account when lzma_memory_usage()
    calculates the memory usage. You need to calculate the memory
    usage of the Subfilter separately.

    Keeping track of the Blocks in a Multi-Block Stream takes a few
    dozen bytes of RAM per Block (the size of the lzma_index structure
    plus malloc() overhead). It isn't a good idea to put tens of
    thousands of Blocks into a Stream unless you have a very good
    reason to do so (a compressed dictionary could be an example of
    such a situation).

    Also keep the number and sizes of Extra Records sane. If you
    produce the list of Extra Records automatically from some
    untrusted source, you should not only validate the content of
    these Records, but also their memory usage.


1.2.2. Decoder

    A single-threaded decoder should simply use a memory limiter and
    indicate an error if it runs out of memory.

    Memory limiting with multi-threaded decoding is tricky. The simple
    solution is to divide the maximum allowed memory usage by the
    maximum allowed number of threads, and give each Block decoder its
    own independent lzma_memory_limitter. The drawback is that if one
    Block needs notably more RAM than any other Block, the decoder
    will run out of memory when in reality there would be plenty of
    free RAM.

    An attractive alternative would be using a shared
    lzma_memory_limitter.
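One way to state the safety condition for a shared limit in code: the limit must cover the n most expensive Blocks that could be in flight at the same time. A hypothetical helper for illustration (the name and the sort-based approach are mine, not liblzma's; a real implementation would also guard the sum against overflow):

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator: sort uint64_t values in descending order. */
static int cmp_desc(const void *a, const void *b)
{
    const uint64_t x = *(const uint64_t *)a;
    const uint64_t y = *(const uint64_t *)b;
    return (x < y) - (x > y);
}

/* A shared limit is safe for n_threads-threaded decoding only if it
 * is at least the sum of the n_threads largest per-Block memory
 * requirements x[0..count-1]. Returns 1 if safe, 0 if not. */
static int shared_limit_is_safe(const uint64_t *x, size_t count,
                                size_t n_threads, uint64_t limit)
{
    uint64_t *tmp;
    uint64_t worst = 0;
    size_t i;

    tmp = malloc(count * sizeof *tmp);
    if (tmp == NULL)
        return 0; /* be conservative on allocation failure */

    memcpy(tmp, x, count * sizeof *tmp);
    qsort(tmp, count, sizeof *tmp, cmp_desc);

    /* Sum the n_threads largest requirements. */
    for (i = 0; i < count && i < n_threads; ++i)
        worst += tmp[i];

    free(tmp);
    return worst <= limit;
}
```

With two threads and per-Block requirements {10, 50, 20, 40, 30}, the worst concurrent pair is 50 + 40 = 90, so any shared limit below 90 can fail depending on scheduling.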
    Depending on the application and the expected type of input, this
    may either be the best solution or a source of hard-to-repeat
    problems. Consider the following requirements:
      - You use at maximum n threads.
      - x(i) is the decoder memory requirement of Block number i in
        an expected input Stream.
      - The memory limiter is set to a value higher than the sum of
        the n highest values of x(i).

    (If you are better at explaining the above conditions, please
    contribute your improved version.)

    If the above conditions aren't met, it is possible that decoding
    will fail unpredictably. That is, on the same machine, using the
    same settings, decoding may sometimes succeed and sometimes fail.
    This is because sometimes the threads may be scheduled so that the
    Blocks with the highest memory usage are decoded at the same time.

    Most .lzma files have all their Blocks encoded with identical
    settings, or at least the memory usage doesn't vary dramatically.
    That's why most multi-threaded decoders probably want to use the
    simple "separate lzma_memory_limitter for each thread" solution,
    possibly falling back to single-threaded mode if the per-thread
    memory limits aren't enough in multi-threaded mode.

FIXME: Memory usage of Stream info.

[

]


2. Huge uncompressed output

2.1. Data Blocks

    Decoding a tiny .lzma file can produce a huge amount of
    uncompressed output. There is an example file of 45 bytes which
    decodes to 64 PiB (that's 2^56 bytes). Uncompressing such a file
    to disk is likely to fill even a big disk array. If the data is
    written to a pipe, it may not fill the disk, but it would still
    take a very long time to finish.

    To avoid denial of service conditions caused by huge amounts of
    uncompressed output, applications using liblzma should use some
    method to limit the amount of output produced. The exact method
    depends on the application.
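One such method is a simple byte budget that the application checks before writing each chunk of decoded output, aborting as soon as the budget is exhausted. A hypothetical sketch, not liblzma API:

```c
#include <stdint.h>

/* Byte budget for decoded output. The application calls
 * budget_consume() before writing each decoded chunk and stops
 * decoding as soon as it returns nonzero. */
typedef struct {
    uint64_t produced; /* bytes written so far; must stay <= cap */
    uint64_t cap;      /* maximum bytes we are willing to produce */
} out_budget;

/* Returns 0 if nbytes more output fits in the budget, -1 if writing
 * it would exceed the cap (treat the input as malicious/corrupt).
 * The comparison is written so it cannot overflow. */
static int budget_consume(out_budget *b, uint64_t nbytes)
{
    if (nbytes > b->cap - b->produced)
        return -1;
    b->produced += nbytes;
    return 0;
}
```

For example, with a 100-byte cap, consuming 60 and then 40 bytes succeeds, but any further output is rejected.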
    All valid .lzma Streams make it possible to find out the
    uncompressed size of the Stream without actually uncompressing
    the data. This information is available in at least one of the
    Metadata Blocks. Once the uncompressed size has been parsed, the
    decoder can verify that it doesn't exceed certain limits (e.g.
    the available disk space).

    When the uncompressed size is known, the decoder can actively
    keep track of the amount of output produced so far, and check
    that it doesn't exceed the known uncompressed size. If it does,
    the file is known to be corrupt and an error should be indicated
    without continuing to decode the rest of the file.

    Unfortunately, finding out the uncompressed size beforehand is
    often possible only in non-streamed mode, because the needed
    information could be in the Footer Metadata Block, which
    (obviously) is at the end of the Stream. In purely streamed
    decoding, one may need to use some rough arbitrary limits to
    prevent the problems described in the beginning of this section.


2.2. Metadata

    Metadata is stored in Metadata Blocks, which are very similar to
    Data Blocks. Thus, the uncompressed size can be huge just like
    with Data Blocks. The difference is that the contents of Metadata
    Blocks aren't given to the application as-is, but are parsed by
    liblzma. Still, reading through a huge Metadata Block can take a
    very long time, effectively creating a denial of service just
    like piping a decoded Data Block to another process would.

    At first it would seem that using a memory limiter would prevent
    this issue as a side effect. But it does so only if the
    application requests liblzma to allocate the Extra Records and
    provide them to the application. If the Extra Records aren't
    requested, they aren't allocated either. Still, the Extra Records
    are read through to validate that the Metadata is in the proper
    format.
    The solution is to limit the Uncompressed Size of a Metadata
    Block to some relatively large value. This will make liblzma give
    an error when the given limit is reached.
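Such a check can be as simple as comparing the declared Uncompressed Size against a fixed cap before decoding the Metadata Block. The 1 MiB figure here is an arbitrary example, not a value recommended by liblzma:

```c
#include <stdint.h>

/* Arbitrary example cap; pick whatever is sane for your application.
 * Per the text above, even extreme legitimate Metadata should only
 * be a few megabytes. */
#define METADATA_UNCOMPRESSED_CAP ((uint64_t)1 << 20) /* 1 MiB */

/* Returns 1 if the Metadata Block's Uncompressed Size is acceptable,
 * 0 if the file should be rejected as suspicious. */
static int metadata_size_acceptable(uint64_t uncompressed_size)
{
    return uncompressed_size <= METADATA_UNCOMPRESSED_CAP;
}
```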