path: root/src/liblzma/lz
Age | Commit message | Author | Files | Lines
2024-02-14 | liblzma: LZ decoder: Add unlikely(). | Lasse Collin | 1 | -1/+1
2024-02-14 | liblzma: LZ decoder: Remove a useless unlikely(). | Lasse Collin | 1 | -1/+1
2024-02-14 | liblzma: Optimize LZ decoder slightly. | Lasse Collin | 2 | -58/+86
Now extra buffer space is reserved so that repeating bytes for any single match will never need to copy from two places (both the beginning and the end of the buffer). This simplifies dict_repeat() and helps a little with speed. This seems to reduce .lzma decompression time about 2 %, so with .xz and CRC it could be slightly less. The small things add up still.
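The benefit of a contiguous source can be illustrated with a minimal sketch. This is not liblzma's dict_repeat(): `toy_repeat`, its parameters, and the linear (non-circular) buffer are assumptions made for illustration. When the source of a match never wraps, the repeat reduces to one overlap-aware loop or one memcpy().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not the liblzma function): append `len` bytes at
 * buf[*pos], copying from `back` bytes behind *pos. The caller must make
 * sure that the buffer has room for `len` more bytes. */
static void toy_repeat(uint8_t *buf, size_t *pos, size_t back, size_t len)
{
	assert(buf != NULL && pos != NULL);
	assert(back >= 1 && back <= *pos);

	size_t src = *pos - back;

	if (back < len) {
		/* Source and destination overlap: copy byte by byte so
		 * that bytes written a moment ago are reused. */
		while (len-- > 0)
			buf[(*pos)++] = buf[src++];
	} else {
		/* No overlap: a single memcpy() is enough because the
		 * source is one contiguous region. */
		memcpy(buf + *pos, buf + src, len);
		*pos += len;
	}
}
```

With a two-byte history "ab", repeating 4 bytes from 2 bytes back takes the overlapping path and extends the buffer to "ababab".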
2024-02-14 | liblzma: Creates Non-resumable and Resumable modes for lzma_decoder. | Jia Tan | 1 | -2/+12
The new decoder resumes the first decoder loop in the Resumable mode. Then, the code executes in Non-resumable mode until it detects that it cannot guarantee to have enough input/output to decode another symbol. The Resumable mode is how the decoder has always worked. Before decoding every input bit, it checks if there is enough space and will save its location to be resumed later. When the decoder has more input/output, it jumps back to the correct sequence in the Resumable mode code. When the input/output buffers are large, the Resumable mode is much slower than the Non-resumable because it has more branches and is harder for the compiler to optimize since it is in a large switch block. Early benchmarking shows significant time improvement (8-10% on gcc and clang x86) by using the Non-resumable code as much as possible.
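The split between the two modes can be sketched with a toy decoder. Everything here is an assumption for illustration (`toy_decoder`, `toy_decode`, and the made-up format of 2-byte big-endian symbols that are summed); it is not the LZMA decoder. The slow path saves and restores state at every byte boundary, while the fast loop runs only when a whole symbol is guaranteed to be available.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy state: one symbol is two bytes (big endian); symbols are summed. */
typedef struct {
	bool have_high;   /* a symbol was interrupted after its first byte */
	uint8_t high;     /* the saved first byte */
	uint32_t sum;     /* decoded result so far */
} toy_decoder;

static void toy_decode(toy_decoder *d, const uint8_t *in, size_t in_size)
{
	assert(d != NULL && (in != NULL || in_size == 0));
	size_t pos = 0;

	/* "Resumable" part: finish a symbol left over from a previous call. */
	if (d->have_high && pos < in_size) {
		d->sum += ((uint32_t)d->high << 8) | in[pos++];
		d->have_high = false;
	}

	/* "Non-resumable" part: whole symbols only, no state saving and
	 * no per-byte availability checks inside the symbol. */
	while (in_size - pos >= 2) {
		d->sum += ((uint32_t)in[pos] << 8) | in[pos + 1];
		pos += 2;
	}

	/* "Resumable" part: not enough input for a whole symbol, so save
	 * the position inside the symbol and return. */
	if (pos < in_size) {
		d->high = in[pos++];
		d->have_high = true;
	}
}
```

Feeding the input in one call or split at an arbitrary byte gives the same result; only the amount of work done in the slow path differs.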
2024-02-14 | liblzma: Include the SPDX license identifier 0BSD to generated files. | Lasse Collin | 1 | -1/+3
Perhaps the generated files aren't even copyrightable but using the same license for them as for the rest of the liblzma keeps things more consistent for tools that look for license info.
2024-02-14 | Add SPDX license identifier into 0BSD source code files. | Lasse Collin | 7 | -2/+13
2024-02-14 | Change most public domain parts to 0BSD. | Lasse Collin | 7 | -21/+0
Translations and doc/xz-file-format.txt and doc/lzma-file-format.txt were not touched. COPYING.0BSD was added.
2023-12-20 | liblzma: Initialize lzma_lz_encoder pointers with NULL. | Jia Tan | 1 | -1/+5
This fixes the recent change to lzma_lz_encoder that used memzero instead of the NULL constant. On some compilers the NULL constant (always 0) may not equal the NULL pointer (this only needs to guarantee to not point to a valid memory address). Later code compares the pointers to the NULL pointer so we must initialize them with the NULL pointer instead of 0 to guarantee code correctness.
2023-12-16 | liblzma: Set all values in lzma_lz_encoder to NULL after allocation. | Jia Tan | 1 | -3/+1
The first member of lzma_lz_encoder doesn't necessarily need to be set to NULL since it will always be set before anything tries to use it. However the function pointer members must be set to NULL since other functions rely on this NULL value to determine if this behavior is supported or not. This fixes a somewhat serious bug, where the options_update() and set_out_limit() function pointers are not set to NULL. This seems to have been forgotten since these function pointers were added many years after the original two (code() and end()). The problem is that by not setting this to NULL we are relying on the memory allocation to zero things out if lzma_filters_update() is called on an LZMA1 encoder. The function pointer for set_out_limit() is less serious because there is not an API function that could call this in an incorrect way. set_out_limit() is only called by the MicroLZMA encoder, which must use LZMA1 where set_out_limit() is always set. It's currently not possible to call set_out_limit() on an LZMA2 encoder. So calling lzma_filters_update() on an LZMA1 encoder had undefined behavior since it's possible that memory could be manipulated so the options_update member pointed to a different instruction sequence. This is unlikely to be a bug in an existing application since it relies on calling lzma_filters_update() on an LZMA1 encoder in the first place. For instance, it does not affect xz because lzma_filters_update() can only be used when encoding to the .xz format. This is fixed by using memzero() to set all members of lzma_lz_encoder to NULL after it is allocated. This ensures this mistake will not occur here in the future if any additional function pointers are added.
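The pattern described above, where a NULL function pointer means "operation not supported", can be sketched as follows. The struct layout, the `toy_` names, and the return codes are assumptions for illustration and do not reproduce the real lzma_lz_encoder. The members are assigned NULL explicitly, in line with the later 2023-12-20 fix.

```c
#include <assert.h>
#include <stddef.h>

enum { TOY_OK = 0, TOY_UNSUPPORTED = 1 };

/* Hypothetical stand-in for an encoder with optional operations. */
typedef struct {
	void *coder;
	int (*code)(void *coder);
	void (*end)(void *coder);
	int (*options_update)(void *coder, int new_option);
	int (*set_out_limit)(void *coder, size_t limit);
} toy_lz_encoder;

/* Set every member to a null pointer right after allocation so that
 * optional operations are reliably seen as "not provided". */
static void toy_lz_encoder_init(toy_lz_encoder *lz)
{
	assert(lz != NULL);
	lz->coder = NULL;
	lz->code = NULL;
	lz->end = NULL;
	lz->options_update = NULL;
	lz->set_out_limit = NULL;
}

/* Callers test the pointer before using it. With an uninitialized
 * pointer this check would be meaningless. */
static int toy_filters_update(toy_lz_encoder *lz, int new_option)
{
	assert(lz != NULL);
	if (lz->options_update == NULL)
		return TOY_UNSUPPORTED;

	return lz->options_update(lz->coder, new_option);
}

/* Example implementation that accepts any update. */
static int toy_accept_update(void *coder, int new_option)
{
	(void)coder;
	(void)new_option;
	return TOY_OK;
}
```

An encoder that never sets options_update gets TOY_UNSUPPORTED instead of a jump through an indeterminate pointer.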
2023-12-16 | liblzma: Tweak a comment. | Jia Tan | 1 | -1/+1
2023-11-09 | liblzma: Add missing comments to lz_encoder.h. | Jia Tan | 1 | -1/+5
2023-10-30 | liblzma: Use lzma_attr_visibility_hidden on private extern declarations. | Lasse Collin | 1 | -0/+1
These variables are internal to liblzma and not exposed in the API.
2023-09-24 | liblzma: Change quoting style from `...' to '...'. | Jia Tan | 1 | -1/+1
This was done for both internal and API headers.
2023-07-31 | Docs: Fix typos found by codespell | Dimitri Papadopoulos Orfanos | 1 | -1/+1
2023-05-11 | liblzma: Creates IS_ENC_DICT_SIZE_VALID() macro. | Jia Tan | 2 | -3/+9
This creates an internal liblzma macro to test if the dictionary size is valid for encoding.
2023-01-12 | liblzma: Silence another warning from -Wsign-conversion in a 32-bit build. | Lasse Collin | 1 | -3/+4
It doesn't warn on a 64-bit system because truncating a ptrdiff_t (signed long) to uint32_t is diagnosed under -Wconversion by GCC and -Wshorten-64-to-32 by Clang.
2023-01-12 | Fix warnings from clang -Wdocumentation. | Lasse Collin | 1 | -2/+2
2022-11-28 | liblzma: Remove lzma_lz_decoder_uncompressed() as it's now unused. | Lasse Collin | 2 | -17/+0
2022-11-27 | liblzma: Pass the Filter ID to LZ encoder and decoder. | Lasse Collin | 4 | -6/+10
This allows using two Filter IDs with the same initialization function and data structures.
2022-11-24 | liblzma: Allow nice_len 2 and 3 even if match finder requires 3 or 4. | Lasse Collin | 2 | -5/+18
That is, if the specified nice_len is smaller than the minimum of the match finder, silently use the match finder's minimum value instead of reporting an error. The old behavior is annoying to users and it complicates xz options handling too.
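The new behavior amounts to a clamp. A minimal sketch, assuming a hypothetical helper `toy_effective_nice_len` and a caller that already knows the match finder's minimum length:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: instead of rejecting a nice_len that is below the
 * match finder's minimum, silently raise it to that minimum. */
static uint32_t toy_effective_nice_len(uint32_t nice_len, uint32_t mf_min_len)
{
	assert(nice_len > 0);
	return nice_len < mf_min_len ? mf_min_len : nice_len;
}
```

For example, nice_len 2 with a match finder that needs at least 3 becomes 3, while a larger nice_len passes through unchanged.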
2022-11-14 | liblzma: Use __attribute__((__constructor__)) if available. | Lasse Collin | 1 | -1/+1
This uses it for CRC table initializations when using --disable-small. It avoids mythread_once() overhead. It also means that then --disable-small --disable-threads is thread-safe if this attribute is supported.
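As a rough illustration of the technique, the sketch below fills a CRC32 lookup table from a constructor function, so the table is ready before main() runs and no once-style guard is needed on each call. The `toy_` names are assumptions, the table uses the common reflected polynomial 0xEDB88320, and the attribute is a GCC/Clang extension; on other compilers this sketch would need a different initialization path.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t toy_crc32_table[256];
static int toy_table_ready = 0;

/* With GCC or Clang this function runs automatically at load time. */
#if defined(__GNUC__) || defined(__clang__)
__attribute__((__constructor__))
#endif
static void toy_crc32_init(void)
{
	for (uint32_t b = 0; b < 256; ++b) {
		uint32_t r = b;
		for (int j = 0; j < 8; ++j)
			r = (r >> 1) ^ (0xEDB88320u & (0u - (r & 1u)));

		toy_crc32_table[b] = r;
	}

	toy_table_ready = 1;
}

/* Table lookup without any per-call initialization check other than
 * the debugging assertion. */
static uint32_t toy_crc32_entry(uint8_t b)
{
	assert(toy_table_ready);
	return toy_crc32_table[b];
}
```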
2022-07-25 | liblzma: Refactor lzma_mf_is_supported() to use a switch-statement. | Jia Tan | 1 | -18/+14
2022-07-13 | liblzma: Add optional autodetection of LZMA end marker. | Lasse Collin | 2 | -5/+13
Turns out that this is needed for .lzma files as the spec in LZMA SDK says that end marker may be present even if the size is stored in the header. Such files are rare but exist in the real world. The code in liblzma is so old that the spec didn't exist in LZMA SDK back then and I had understood that such files weren't possible (the lzma tool in LZMA SDK didn't create such files). This modifies the internal API so that LZMA decoder can be told if EOPM is allowed even when the uncompressed size is known. It's allowed with .lzma and not with other uses. Thanks to Karl Beldan for reporting the problem.
2021-01-14 | liblzma: Add rough support for output-size-limited encoding in LZMA1. | Lasse Collin | 2 | -0/+20
With this it is possible to encode LZMA1 data without EOPM so that the encoder will encode as much input as it can without exceeding the specified output size limit. The resulting LZMA1 stream will be a normal LZMA1 stream without EOPM. The actual uncompressed size will be available to the caller via the uncomp_size pointer. One missing thing is that the LZMA layer doesn't inform the LZ layer when the encoding is finished and thus the LZ may read more input when it won't be used. However, this doesn't matter if encoding is done with a single call (which is the planned use case for now). For proper multi-call encoding this should be improved. This commit only adds the functionality for internal use. Nothing uses it yet.
2019-12-31 | Rename unaligned_read32ne to read32ne, and similarly for the others. | Lasse Collin | 1 | -1/+1
2019-06-25 | liblzma: Fix a buggy comment. | Lasse Collin | 1 | -1/+1
2019-06-24 | liblzma: Remove incorrect uses of lzma_attribute((__unused__)). | Lasse Collin | 1 | -2/+1
Caught by clang -Wused-but-marked-unused.
2019-06-02 | liblzma: Fix one more unaligned read to use unaligned_read16ne(). | Lasse Collin | 1 | -1/+1
2019-05-13 | liblzma: Avoid memcpy(NULL, foo, 0) because it is undefined behavior. | Lasse Collin | 1 | -3/+9
I should have always known this but I didn't. Here is an example as a reminder to myself: int mycopy(void *dest, void *src, size_t n) { memcpy(dest, src, n); return dest == NULL; } In the example, a compiler may assume that dest != NULL because passing NULL to memcpy() would be undefined behavior. Testing with GCC 8.2.1, mycopy(NULL, NULL, 0) returns 1 with -O0 and -O1. With -O2 the return value is 0 because the compiler infers that dest cannot be NULL because it was already used with memcpy() and thus the test for NULL gets optimized out. In liblzma, if a null-pointer was passed to memcpy(), there were no checks for NULL *after* the memcpy() call, so I cautiously suspect that it shouldn't have caused bad behavior in practice, but it's hard to be sure, and the problematic cases had to be fixed anyway. Thanks to Jeffrey Walton.
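One common way to avoid the problem is to skip the call when the size is zero. The helper below is a sketch with a hypothetical name (`toy_copy`), not the change made in liblzma:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: memcpy() is called only when there is something
 * to copy, so null pointers are never passed to it with n == 0. */
static void *toy_copy(void *dest, const void *src, size_t n)
{
	if (n > 0) {
		assert(dest != NULL && src != NULL);
		memcpy(dest, src, n);
	}

	return dest;
}
```

Because memcpy() is not reached when n is zero, the compiler has no basis for assuming that dest is non-null in later code.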
2019-05-11 | spelling | Antoine Cœur | 1 | -1/+1
2016-11-21 | liblzma: Avoid multiple definitions of lzma_coder structures. | Lasse Collin | 4 | -64/+75
Only one definition was visible in a translation unit. It avoided a few casts and temp variables but seems that this hack doesn't work with link-time optimizations in compilers as it's not C99/C11 compliant. Fixes: http://www.mail-archive.com/xz-devel@tukaani.org/msg00279.html
2015-11-04 | liblzma: Make Valgrind happier with optimized (gcc -O2) liblzma. | Lasse Collin | 1 | -0/+4
When optimizing, GCC can reorder code so that an uninitialized value gets used in a comparison, which makes Valgrind unhappy. It doesn't happen when compiled with -O0, which I tend to use when running Valgrind. Thanks to Rich Prohaska. I remember this being mentioned long ago by someone else but nothing was done back then.
2015-03-07 | liblzma: Silence more uint32_t vs. size_t warnings. | Lasse Collin | 1 | -1/+1
2015-01-26 | liblzma: Silence harmless Valgrind errors. | Lasse Collin | 1 | -0/+6
Thanks to Torsten Rupp for reporting this. I had forgotten to run Valgrind before the 5.2.0 release.
2014-08-04 | liblzma: Use lzma_memcmplen() in the BT3 match finder. | Lasse Collin | 1 | -3/+2
I had missed this when writing the commit 5db75054e900fa06ef5ade5f2c21dffdd5d16141. Thanks to Jun I Jin.
2014-07-25 | liblzma: Use lzma_memcmplen() in the match finders. | Lasse Collin | 2 | -23/+23
This doesn't change the match finder output.
2014-05-25 | liblzma: Use lzma_alloc_zero() in LZ encoder initialization. | Lasse Collin | 3 | -55/+62
This avoids a memzero() call for a newly-allocated memory, which can be expensive when encoding small streams with an over-sized dictionary. To avoid using lzma_alloc_zero() for memory that doesn't need to be zeroed, lzma_mf.son is now allocated separately, which requires handling it separately in normalize() too. Thanks to Vincenzo Innocente for reporting the problem.
2012-07-17 | liblzma: Make the use of lzma_allocator const-correct. | Lasse Collin | 4 | -19/+21
There is a tiny risk of causing breakage: If an application assigns lzma_stream.allocator to a non-const pointer, such code won't compile anymore. I don't know why anyone would do such a thing though, so in practice this shouldn't cause trouble. Thanks to Jan Kratochvil for the patch.
2011-06-16 | liblzma: Remove unneeded semicolon. | Lasse Collin | 1 | -1/+1
2011-05-17 | Add underscores to attributes (__attribute((__foo__))). | Lasse Collin | 2 | -2/+2
2010-09-03 | liblzma: Adjust default depth calculation for HC3 and HC4. | Lasse Collin | 1 | -3/+4
It was 8 + nice_len / 4, now it is 4 + nice_len / 4. This allows faster settings at lower nice_len values, even though it seems that I won't use automatic depth calculation with HC3 and HC4 in the presets.
2010-06-02 | Silence a bogus Valgrind warning. | Lasse Collin | 1 | -1/+5
When using -O2 with GCC, it liked to swap two comparisons in one "if" statement. It's otherwise fine except that the latter part, which is seemingly never executed, got executed (nothing wrong with that) and then triggered warning in Valgrind about conditional jump depending on uninitialized variable. A few people find this annoying so do things a bit differently to avoid the warning.
2010-05-26 | Rename MIN() and MAX() to my_min() and my_max(). | Lasse Collin | 5 | -8/+9
This should avoid some minor portability issues.
2010-02-12 | Collection of language fixes to comments and docs. | Lasse Collin | 2 | -2/+2
Thanks to Jonathan Nieder.
2009-11-15 | Fix wrong indentation caused by incorrect settings in the text editor. | Lasse Collin | 1 | -9/+9
2009-11-14 | Fix a design error in liblzma API. | Lasse Collin | 2 | -0/+21
Originally the idea was that using LZMA_FULL_FLUSH with Stream encoder would read the filter chain from the same array that was used to initialize the Stream encoder. Since most apps wouldn't use LZMA_FULL_FLUSH, most apps wouldn't need to keep the filter chain available after initializing the Stream encoder. However, due to my mistake, it actually required keeping the array always available. Since setting the new filter chain via the array used at initialization time is not a nice way to do it for a couple of reasons, this commit ditches it and introduces lzma_filters_update(). This new function replaces also the "persistent" flag used by LZMA2 (and to-be-designed Subblock filter), which was also an ugly thing to do. Thanks to Alexey Tourbin for reminding me about the problem that Stream encoder used to require keeping the filter chain allocated.
2009-10-04 | Use a tuklib module for integer handling. | Lasse Collin | 1 | -1/+1
This replaces bswap.h and integer.h. The tuklib module uses <byteswap.h> on GNU, <sys/endian.h> on *BSDs and <sys/byteorder.h> on Solaris, which may contain optimized code like inline assembly.
2009-10-02 | Use unaligned access (if possible) on both endiannesses in lz_encoder_hash.h. | Lasse Collin | 1 | -2/+2
2009-10-02 | Make liblzma produce the same output on both endiannesses. | Lasse Collin | 5 | -14/+98
Seems that it is a problem in some cases if the same version of XZ Utils produces different output on different endiannesses, so this commit fixes that problem. The output will still vary between different XZ Utils versions, but I cannot avoid that for now. This commit bloats the code on big endian systems by 1 KiB, which should be OK since liblzma is bloated already. ;-)
2009-09-12 | A few grammar fixes. | Lasse Collin | 1 | -5/+5
Thanks to Christian Weisgerber for pointing out some of these.
2009-09-11 | Fix a couple of warnings. | Lasse Collin | 1 | -4/+1
2009-08-16 | Fix data corruption in LZ/LZMA2 encoder. | Lasse Collin | 1 | -1/+1
Thanks to Jonathan Stott for the bug report.
2009-06-30 | Build system fixes | Lasse Collin | 2 | -29/+21
Don't use libtool convenience libraries to avoid recently discovered long-standing subtle but somewhat severe bugs in libtool (at least 1.5.22 and 2.2.6 are affected). It was found when porting XZ Utils to Windows <http://lists.gnu.org/archive/html/libtool/2009-06/msg00070.html> but the problem is significant also e.g. on GNU/Linux. Unless --disable-shared is passed to configure, static library built from a set of convenience libraries will contain PIC objects. That is, while libtool builds non-PIC objects too, only PIC objects will be used from the convenience libraries. On 32-bit x86 (tested on mobile XP2400+), using PIC instead of non-PIC makes the decompressor 10 % slower with the default CFLAGS. So while xz was linked against static liblzma by default, it got the slower PIC objects unless --disable-shared was used. I tend to develop and benchmark with --disable-shared due to faster build time, so I hadn't noticed the problem in benchmarks earlier. This commit also adds support for building Windows resources into liblzma and executables.
2009-06-26 | Fix @variables@ to $(variables) in Makefile.am files. | Lasse Collin | 1 | -3/+3
Fix the ordering of libgnu.a and LTLIBINTL on the linker command line and added missing LTLIBINTL to tests/Makefile.am.
2009-04-13 | Put the interesting parts of XZ Utils into the public domain. | Lasse Collin | 7 | -80/+31
Some minor documentation cleanups were made at the same time.
2009-04-10 | Fix off-by-one in LZ decoder. | Lasse Collin | 1 | -1/+1
Fortunately, this bug had no security risk other than accepting some corrupt files as valid.
2009-02-08 | Add a separate internal function to initialize the CRC32 table, which is used also by LZ encoder. | Lasse Collin | 1 | -2/+2
This was needed because calling lzma_crc32() and ignoring the result is a no-op due to lzma_attr_pure.
2009-02-02 | Modify LZMA_API macro so that it works on Windows with other compilers than MinGW. | Lasse Collin | 1 | -1/+1
This may hurt readability of the API headers slightly, but I don't know any better way to do this.
2009-01-27 | Added initial support for preset dictionary for raw LZMA1 and LZMA2. | Lasse Collin | 3 | -13/+49
It is not supported by the .xz format or the xz command line tool yet.
2008-12-31 | Remove lzma_init() and other init functions from liblzma API. | Lasse Collin | 1 | -0/+6
Half of developers were already forgetting to use these functions, which could have caused total breakage in some future liblzma version or even now if --enable-small was used. Now liblzma uses pthread_once() to do the initializations unless it has been built with --disable-threads, which makes these initializations thread-unsafe. When --enable-small isn't used, liblzma currently gets needlessly linked against libpthread (on systems that have it). While it is stupid for now, liblzma will need threads in future anyway, so this stupidity will be temporary only. When --enable-small is used, different code for CRC32 and CRC64 is now used than without --enable-small. This made the resulting binary slightly smaller, but the main reason was to clean it up and to handle the lack of lzma_init_check(). The pkg-config file lzma.pc was renamed to liblzma.pc. I'm not sure if it works correctly and portably for static linking (Libs.private includes -pthread or other operating system specific flags). Hopefully someone complains if it is bad. lzma_rc_prices[] is now included as a precomputed array even with --enable-small. It's just 128 bytes now that it uses uint8_t instead of uint32_t. Smaller array seemed to be at least as fast as the more bloated uint32_t array on x86; hopefully it's not bad on other architectures.
2008-12-15 | The LZMA2 decoder fix introduced a bug to LZ decoder, which made LZ decoder return too early after dictionary reset. | Lasse Collin | 1 | -10/+23
This fixes it.
2008-12-15 | Fix data corruption in LZMA2 decoder. | Lasse Collin | 2 | -4/+21
2008-11-19 | Oh well, big messy commit again. Some highlights: | Lasse Collin | 1 | -2/+2
- Updated to the latest, probably final file format version.
- Command line tool reworked to not use threads anymore. Threading will probably go into liblzma anyway.
- Memory usage limit is now about 30 % for uncompression and about 90 % for compression.
- Progress indicator with --verbose
- Simplified --help and full --long-help
- Upgraded to the last LGPLv2.1+ getopt_long from gnulib.
- Some bug fixes
2008-09-27 | Some API changes, bug fixes, cleanups etc. | Lasse Collin | 3 | -44/+42
2008-09-17 | Miscellaneous LZ and LZMA encoder cleanups | Lasse Collin | 1 | -2/+6
2008-09-13 | LZ decoder cleanup | Lasse Collin | 1 | -3/+2
2008-09-13 | Renamed constants: | Lasse Collin | 2 | -2/+2
- LZMA_VLI_VALUE_MAX -> LZMA_VLI_MAX
- LZMA_VLI_VALUE_UNKNOWN -> LZMA_VLI_UNKNOWN
- LZMA_HEADER_ERROR -> LZMA_OPTIONS_ERROR
2008-09-06 | Comments | Lasse Collin | 3 | -9/+8
2008-09-02 | Some fixes to LZ encoder. | Lasse Collin | 3 | -75/+94
2008-08-28 | Sort of garbage collection commit. :-| Many things are still broken. | Lasse Collin | 20 | -1846/+1865
API has changed a lot and it will still change a little more here and there. The command line tool doesn't have all the required changes to reflect the API changes, so it's easy to get "internal error" or trigger assertions.
2008-06-18 | Update the code to mostly match the new simpler file format specification. | Lasse Collin | 2 | -6/+10
Simplify things by removing most of the support for known uncompressed size in most places. There are some miscellaneous changes here and there too. The API of liblzma has got many changes and still some more will be done soon. While most of the code has been updated, some things are not fixed (the command line tool will choke with invalid filter chain, if nothing else). Subblock filter is somewhat broken for now. It will be updated once the encoded format of the Subblock filter has been decided.
2008-06-01 | Fix a buffer overflow in the LZMA encoder. It was due to my misunderstanding of the code. | Lasse Collin | 2 | -124/+7
There's no tiny fix for this problem, so I also cleaned up the code in general. This reduces the speed of the encoder 2-5 % in the fastest compression mode ("lzma -1"). High compression modes should have no noticeable performance difference. This commit breaks things (especially LZMA_SYNC_FLUSH) but I will fix them once the new format and LZMA2 have been roughly implemented. Plain LZMA won't support LZMA_SYNC_FLUSH at all and won't be supported in the new .lzma format. This may change still but this is what it looks like now. Support for known uncompressed size (that is, LZMA or LZMA2 without EOPM) is likely to go away. This means there will be API changes.
2008-04-25 | Prevent LZ encoder from hanging with known uncompressed size. (larhzu/v4.999.3alpha) | Lasse Collin | 1 | -2/+7
The "fix" breaks LZMA_SYNC_FLUSH at end of stream with known uncompressed size, but since it currently seems likely that support for encoding with known uncompressed size will go away anyway, I'm not fixing this problem now.
2008-04-24 | Fix wrong return type (uint32_t -> bool). | Lasse Collin | 2 | -2/+2
2008-04-24 | Fix data corruption in LZ encoder with LZMA_SYNC_FLUSH. | Lasse Collin | 3 | -5/+38
2008-03-11 | Initialize the last byte of the dictionary to zero so that lz_get_byte(lz, 0) returns zero. | Lasse Collin | 1 | -0/+1
This was broken by 1a3b21859818e4d8e89a1da99699233c1bfd197d.
2008-03-10 | Always initialize lz->temp_size in lz_decoder.c. | Lasse Collin | 1 | -5/+6
temp_size did get initialized as a side-effect after allocating a new decoder, but not when the decoder was reused.
2008-02-02 | Don't memzero() the history buffer when initializing LZ decoder. | Lasse Collin | 1 | -4/+3
There's no danger of information leak here, so it isn't required. Doing memzero() takes a lot of time with large dictionaries, which could make it easier to construct a DoS attack to consume too much CPU time.
2008-01-18 | Fix LZMA_SYNC_FLUSH handling in LZ and LZMA encoders. | Lasse Collin | 2 | -8/+27
That code is now almost completely in LZ coder, where it can be shared with other LZ77-based algorithms in future.
2008-01-14 | Major changes to LZ encoder, LZMA encoder, and range encoder. | Lasse Collin | 2 | -25/+130
These changes implement support for LZMA_SYNC_FLUSH in LZMA encoder, and move the temporary buffer needed by range encoder from lzma_range_encoder structure to lzma_lz_encoder.
2008-01-14 | Don't use coder->lz.stream_end_was_reached in assertions in match_c.h. | Lasse Collin | 1 | -2/+0
2008-01-10 | Eliminate lzma_lz_encoder.must_move_pos. It's needed only in one place which isn't performance critical. | Lasse Collin | 2 | -8/+2
2007-12-09 | Imported to git. | Lasse Collin | 18 | -0/+2193