xz.git - XZ Utils

Age	Commit message (Collapse)	Author	Files	Lines
2023-03-11	liblzma: Avoid null pointer + 0 (undefined behavior in C).	Lasse Collin	1	-6/+14
	In the C99 and C17 standards, section 6.5.6 paragraph 8 means that adding 0 to a null pointer is undefined behavior. As of writing, "clang -fsanitize=undefined" (Clang 15) diagnoses this. However, I'm not aware of any compiler that would take advantage of this when optimizing (Clang 15 included). It's good to avoid this anyway since compilers might some day infer that pointer arithmetic implies that the pointer is not NULL. That is, the following foo() would then unconditionally return 0, even for foo(NULL, 0): void bar(char a, char b); int foo(char *a, size_t n) { bar(a, a + n); return a == NULL; } In contrast to C, C++ explicitly allows null pointer + 0. So if the above is compiled as C++ then there is no undefined behavior in the foo(NULL, 0) call. To me it seems that changing the C standard would be the sane thing to do (just add one sentence) as it would ensure that a huge amount of old code won't break in the future. Based on web searches it seems that a large number of codebases (where null pointer + 0 occurs) are being fixed instead to be future-proof in case compilers will some day optimize based on it (like making the above foo(NULL, 0) return 0) which in the worst case will cause security bugs. Some projects don't plan to change it. For example, gnulib and thus many GNU tools currently require that null pointer + 0 is defined: https://lists.gnu.org/archive/html/bug-gnulib/2021-11/msg00000.html https://www.gnu.org/software/gnulib/manual/html_node/Other-portability-assumptions.html In XZ Utils null pointer + 0 issue should be fixed after this commit. This adds a few if-statements and thus branches to avoid null pointer + 0. These check for size > 0 instead of ptr != NULL because this way bugs where size > 0 && ptr == NULL will likely get caught quickly. None of them are in hot spots so it shouldn't matter for performance. A little less readable version would be replacing ptr + offset with offset != 0 ? ptr + offset : ptr or creating a macro for it: #define my_ptr_add(ptr, offset) \ ((offset) != 0 ? ((ptr) + (offset)) : (ptr)) Checking for offset != 0 instead of ptr != NULL allows GCC >= 8.1, Clang >= 7, and Clang-based ICX to optimize it to the very same code as ptr + offset. That is, it won't create a branch. So for hot code this could be a good solution to avoid null pointer + 0. Unfortunately other compilers like ICC 2021 or MSVC 19.33 (VS2022) will create a branch from my_ptr_add(). Thanks to Marcin Kowalczyk for reporting the problem: https://github.com/tukaani-project/xz/issues/36
2022-09-16	liblzma: Vaccinate against an ill patch from RHEL/CentOS 7.	Lasse Collin	1	-0/+14
	RHEL/CentOS 7 shipped with 5.1.2alpha, including the threaded encoder that is behind #ifdef LZMA_UNSTABLE in the API headers. In 5.1.2alpha these symbols are under XZ_5.1.2alpha in liblzma.map. API/ABI compatibility tracking isn't done between development releases so newer releases didn't have XZ_5.1.2alpha anymore. Later RHEL/CentOS 7 updated xz to 5.2.2 but they wanted to keep the exported symbols compatible with 5.1.2alpha. After checking the ABI changes it turned out that >= 5.2.0 ABI is backward compatible with the threaded encoder functions from 5.1.2alpha (but not vice versa as fixes and extensions to these functions were made between 5.1.2alpha and 5.2.0). In RHEL/CentOS 7, XZ Utils 5.2.2 was patched with xz-5.2.2-compat-libs.patch to modify liblzma.map: - XZ_5.1.2alpha was added with lzma_stream_encoder_mt and lzma_stream_encoder_mt_memusage. This matched XZ Utils 5.1.2alpha. - XZ_5.2 was replaced with XZ_5.2.2. It is clear that this was an error; the intention was to keep using XZ_5.2 (XZ_5.2.2 has never been used in XZ Utils). So XZ_5.2.2 lists all symbols that were listed under XZ_5.2 before the patch. lzma_stream_encoder_mt and _mt_memusage are included too so they are listed both here and under XZ_5.1.2alpha. The patch didn't add any __asm__(".symver ...") lines to the .c files. Thus the resulting liblzma.so exports the threaded encoder functions under XZ_5.1.2alpha only. Listing the two functions also under XZ_5.2.2 in liblzma.map has no effect without matching .symver lines. The lack of XZ_5.2 in RHEL/CentOS 7 means that binaries linked against unpatched XZ Utils 5.2.x won't run on RHEL/CentOS 7. This is unfortunate but this alone isn't too bad as the problem is contained within RHEL/CentOS 7 and doesn't affect users of other distributions. It could also be fixed internally in RHEL/CentOS 7. The second problem is more serious: In XZ Utils 5.2.2 the API headers don't have #ifdef LZMA_UNSTABLE for obvious reasons. This is true in RHEL/CentOS 7 version too. Thus now programs using new APIs can be compiled without an extra #define. However, the programs end up depending on symbol version XZ_5.1.2alpha (and possibly also XZ_5.2.2) instead of XZ_5.2 as they would with an unpatched XZ Utils 5.2.2. This means that such binaries won't run on other distributions shipping XZ Utils >= 5.2.0 as they don't provide XZ_5.1.2alpha or XZ_5.2.2; they only provide XZ_5.2 (and XZ_5.0). (This includes RHEL/CentOS 8 as the patch luckily isn't included there anymore with XZ Utils 5.2.4.) Binaries built by RHEL/CentOS 7 users get distributed and then people wonder why they don't run on some other distribution. Seems that people have found out about the patch and been copying it to some build scripts, seemingly curing the symptoms but actually spreading the illness further and outside RHEL/CentOS 7. The ill patch seems to be from late 2016 (RHEL 7.3) and in 2017 it had spread at least to EasyBuild. I heard about the events only recently. :-( This commit splits liblzma.map into two versions: one for GNU/Linux and another for other OSes that can use symbol versioning (FreeBSD, Solaris, maybe others). The Linux-specific file and the matching additions to .c files add full compatibility with binaries that have been built against a RHEL/CentOS-patched liblzma. Builds for OSes other than GNU/Linux won't get the vaccine as they should be immune to the problem (I really hope that no build script uses the RHEL/CentOS 7 patch outside GNU/Linux). The RHEL/CentOS compatibility symbols XZ_5.1.2alpha and XZ_5.2.2 are intentionally put after XZ_5.2 in liblzma_linux.map. This way if one forgets to #define HAVE_SYMBOL_VERSIONS_LINUX when building, the resulting liblzma.so.5 will have lzma_stream_encoder_mt@@XZ_5.2 since XZ_5.2 {...} is the first one that lists that function. Without HAVE_SYMBOL_VERSIONS_LINUX @XZ_5.1.2alpha and @XZ_5.2.2 will be missing but that's still a minor problem compared to only having lzma_stream_encoder_mt@@XZ_5.1.2alpha! The "local: ;" line was moved to XZ_5.0 so that it doesn't need to be moved around. It doesn't matter where it is put. Having two similar liblzma_.map files is a bit silly as it is, at least for now, easily possible to generate the generic one from the Linux-specific file. But that adds extra steps and increases the risk of mistakes when supporting more than one build system. So I rather maintain two files in parallel and let validate_map.sh check that they are in sync when "make mydist" is run. This adds .symver lines for lzma_stream_encoder_mt@XZ_5.2.2 and lzma_stream_encoder_mt_memusage@XZ_5.2.2 even though these weren't exported by RHEL/CentOS 7 (only @@XZ_5.1.2alpha was for these two). I added these anyway because someone might misunderstand the RHEL/CentOS 7 patch and think that @XZ_5.2.2 (@@XZ_5.2.2) versions were exported too. At glance one could suggest using __typeof__ to copy the function prototypes when making aliases. However, this doesn't work trivially because __typeof__ won't copy attributes (lzma_nothrow, lzma_pure) and it won't change symbol visibility from hidden to default (done by LZMA_API()). Attributes could be copied with __copy__ attribute but that needs GCC 9 and a fallback method would be needed anyway. This uses __symver__ attribute with GCC >= 10 and __asm__(".symver ...") with everything else. The attribute method is required for LTO (-flto) support with GCC. Using -flto with GCC older than 10 is now broken on GNU/Linux and will not be fixed (can silently result in a broken liblzma build that has dangerously incorrect symbol versions). LTO builds with Clang seem to work with the traditional __asm__(".symver ...") method. Thanks to Boud Roukema for reporting the problem and discussing the details and testing the fix.
2019-12-31	liblzma: Fix comments.	Lasse Collin	1	-1/+1
	Thanks to Bruce Stark.
2019-07-13	liblzma: Avoid memcpy(NULL, foo, 0) because it is undefined behavior.	Lasse Collin	1	-1/+5
	I should have always known this but I didn't. Here is an example as a reminder to myself: int mycopy(void dest, void src, size_t n) { memcpy(dest, src, n); return dest == NULL; } In the example, a compiler may assume that dest != NULL because passing NULL to memcpy() would be undefined behavior. Testing with GCC 8.2.1, mycopy(NULL, NULL, 0) returns 1 with -O0 and -O1. With -O2 the return value is 0 because the compiler infers that dest cannot be NULL because it was already used with memcpy() and thus the test for NULL gets optimized out. In liblzma, if a null-pointer was passed to memcpy(), there were no checks for NULL after the memcpy() call, so I cautiously suspect that it shouldn't have caused bad behavior in practice, but it's hard to be sure, and the problematic cases had to be fixed anyway. Thanks to Jeffrey Walton.
2017-03-30	liblzma: Fix lzma_memlimit_set(strm, 0).	Lasse Collin	1	-2/+4
	The 0 got treated specially in a buggy way and as a result the function did nothing. The API doc said that 0 was supposed to return LZMA_PROG_ERROR but it didn't. Now 0 is treated as if 1 had been specified. This is done because 0 is already used to indicate an error from lzma_memlimit_get() and lzma_memusage(). In addition, lzma_memlimit_set() no longer checks that the new limit is at least LZMA_MEMUSAGE_BASE. It's counter-productive for the Index decoder and was actually needed only by the auto decoder. Auto decoder has now been modified to check for LZMA_MEMUSAGE_BASE.
2014-05-25	liblzma: Add the internal function lzma_alloc_zero().	Lasse Collin	1	-0/+21

2013-10-02	liblzma: Add LZMA_FULL_BARRIER support to single-threaded encoder.	Lasse Collin	1	-2/+15
	In the single-threaded encoder LZMA_FULL_BARRIER is simply an alias for LZMA_FULL_FLUSH.
2012-12-14	Make the progress indicator smooth in threaded mode.	Lasse Collin	1	-0/+16
	This adds lzma_get_progress() to liblzma and takes advantage of it in xz. lzma_get_progress() collects progress information from the thread-specific structures so that fairly accurate progress information is available to applications. Adding a new function seemed to be a better way than making the information directly available in lzma_stream (like total_in and total_out are) because collecting the information requires locking mutexes. It's waste of time to do it more often than the up to date information is actually needed by an application.
2012-07-17	liblzma: Make the use of lzma_allocator const-correct.	Lasse Collin	1	-5/+5
	There is a tiny risk of causing breakage: If an application assigns lzma_stream.allocator to a non-const pointer, such code won't compile anymore. I don't know why anyone would do such a thing though, so in practice this shouldn't cause trouble. Thanks to Jan Kratochvil for the patch.
2011-05-17	Add underscores to attributes (__attribute((__foo__))).	Lasse Collin	1	-1/+1

2011-04-11	liblzma: Add lzma_stream_encoder_mt() for threaded compression.	Lasse Collin	1	-1/+8
	This is the simplest method to do threading, which splits the uncompressed data into blocks and compresses them independently from each other. There's room for improvement especially to reduce the memory usage, but nevertheless, this is a good start.
2011-04-11	liblzma: Use memzero() to initialize supported_actions[].	Lasse Collin	1	-4/+2
	This is cleaner and makes it simpler to add new members to lzma_action enumeration.
2010-10-23	liblzma: Make lzma_code() check the reserved members in lzma_stream.	Lasse Collin	1	-0/+14
	If any of the reserved members in lzma_stream are non-zero or non-NULL, LZMA_OPTIONS_ERROR is returned. It is possible that a new feature in the future is indicated by just setting a reserved member to some other value, so the old liblzma version need to catch it as an unsupported feature.
2010-05-26	Rename MIN() and MAX() to my_min() and my_max().	Lasse Collin	1	-1/+1
	This should avoid some minor portability issues.
2010-03-06	Fix missing initialization in lzma_strm_init().	Lasse Collin	1	-0/+1
	With bad luck, lzma_code() could return LZMA_BUF_ERROR when it shouldn't. This has been here since the early days of liblzma. It got triggered by the modifications made to the xz tool in commit 18c10c30d2833f394cd7bce0e6a821044b15832f but only when decompressing .lzma files. Somehow I managed to miss testing that with Valgrind earlier. This fixes <http://bugs.gentoo.org/show_bug.cgi?id=305591>. Thanks to Rafał Mużyło for helping to debug it on IRC.
2009-11-14	Fix a design error in liblzma API.	Lasse Collin	1	-1/+19
	Originally the idea was that using LZMA_FULL_FLUSH with Stream encoder would read the filter chain from the same array that was used to intialize the Stream encoder. Since most apps wouldn't use LZMA_FULL_FLUSH, most apps wouldn't need to keep the filter chain available after initializing the Stream encoder. However, due to my mistake, it actually required keeping the array always available. Since setting the new filter chain via the array used at initialization time is not a nice way to do it for a couple of reasons, this commit ditches it and introduces lzma_filters_update(). This new function replaces also the "persistent" flag used by LZMA2 (and to-be-designed Subblock filter), which was also an ugly thing to do. Thanks to Alexey Tourbin for reminding me about the problem that Stream encoder used to require keeping the filter chain allocated.
2009-04-13	Put the interesting parts of XZ Utils into the public domain.	Lasse Collin	1	-10/+3
	Some minor documentation cleanups were made at the same time.
2009-02-13	Changed how the version number is specified in various places.	Lasse Collin	1	-1/+1
	Now configure.ac will get the version number directly from src/liblzma/api/lzma/version.h. The intent is to reduce the number of places where the version number is duplicated. In future, support for displaying Git commit ID may be added too.
2009-02-02	Modify LZMA_API macro so that it works on Windows with	Lasse Collin	1	-8/+8
	other compilers than MinGW. This may hurt readability of the API headers slightly, but I don't know any better way to do this.
2009-01-20	Use LZMA_PROG_ERROR in lzma_code() as documented in base.h.	Lasse Collin	1	-16/+8

2009-01-19	Fix handling of non-fatal errors in lzma_code().	Lasse Collin	1	-1/+8

2008-12-15	Bunch of liblzma API cleanups and fixes.	Lasse Collin	1	-0/+58

2008-09-06	Some API cleanups	Lasse Collin	1	-0/+7

2008-08-28	Sort of garbage collection commit. :-\| Many things are still	Lasse Collin	1	-0/+298
	broken. API has changed a lot and it will still change a little more here and there. The command line tool doesn't have all the required changes to reflect the API changes, so it's easy to get "internal error" or trigger assertions.