xz.git - XZ Utils

Age	Commit message	Author	Files	Lines
2024-02-17	liblzma: Avoid implementation-defined behavior in the RISC-V filter.	Lasse Collin	1	-8/+22
	GCC docs promise that it works and a few other compilers do too. For Clang/LLVM the behavior is documented only in the source code, but unsurprisingly it behaves the same as the others, on x86-64 at least. But the certainly-portable way is good enough here so use that.
2024-02-14	Add SPDX license identifier into 0BSD source code files.	Lasse Collin	16	-2/+31

2024-02-14	Change most public domain parts to 0BSD.	Lasse Collin	16	-48/+0
	Translations and doc/xz-file-format.txt and doc/lzma-file-format.txt were not touched. COPYING.0BSD was added.
2024-01-23	liblzma: RISC-V filter: Use byte-by-byte access.	Lasse Collin	1	-30/+84
	Not all RISC-V processors support fast unaligned access so it's better to read only one byte in the main loop. This can be faster even on x86-64 when compared to reading 32 bits at a time as half the time the address is only 16-bit aligned. The downside is larger code size on archs that do support fast unaligned access.
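The byte-by-byte approach described in this commit can be illustrated with a minimal sketch. The helper name `read32le_bytewise` is hypothetical and is not taken from liblzma; it only shows how a 32-bit little-endian value can be assembled from single-byte loads so that the alignment of the address does not matter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper (not from liblzma): build a little-endian 32-bit
// value from four single-byte loads. Because each load is one byte wide,
// this works for any alignment of buf.
static uint32_t
read32le_bytewise(const uint8_t *buf)
{
	assert(buf != NULL);

	return (uint32_t)buf[0]
			| ((uint32_t)buf[1] << 8)
			| ((uint32_t)buf[2] << 16)
			| ((uint32_t)buf[3] << 24);
}
```

The trade-off is the one the commit message names: four narrow loads and the shifts take more code than a single 32-bit load on architectures where unaligned access is fast.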
2024-01-23	liblzma: Add RISC-V BCJ filter.	Jia Tan	3	-0/+701
	The new Filter ID is 0x0B. Thanks to Chien Wong <m@xv97.com> for the initial version of the Filter, the xz CLI updates, and the Autotools build system modifications. Thanks to Igor Pavlov for his many contributions to the design of the filter.
2023-02-23	liblzma: Avoid null pointer + 0 (undefined behavior in C).	Lasse Collin	1	-2/+4
	In the C99 and C17 standards, section 6.5.6 paragraph 8 means that adding 0 to a null pointer is undefined behavior. As of writing, "clang -fsanitize=undefined" (Clang 15) diagnoses this. However, I'm not aware of any compiler that would take advantage of this when optimizing (Clang 15 included). It's good to avoid this anyway since compilers might some day infer that pointer arithmetic implies that the pointer is not NULL. That is, the following foo() would then unconditionally return 0, even for foo(NULL, 0):
	    void bar(char *a, char *b);
	    int foo(char *a, size_t n) { bar(a, a + n); return a == NULL; }
	In contrast to C, C++ explicitly allows null pointer + 0. So if the above is compiled as C++ then there is no undefined behavior in the foo(NULL, 0) call.
	To me it seems that changing the C standard would be the sane thing to do (just add one sentence) as it would ensure that a huge amount of old code won't break in the future. Based on web searches it seems that a large number of codebases (where null pointer + 0 occurs) are being fixed instead to be future-proof in case compilers will some day optimize based on it (like making the above foo(NULL, 0) return 0) which in the worst case will cause security bugs.
	Some projects don't plan to change it. For example, gnulib and thus many GNU tools currently require that null pointer + 0 is defined:
	    https://lists.gnu.org/archive/html/bug-gnulib/2021-11/msg00000.html
	    https://www.gnu.org/software/gnulib/manual/html_node/Other-portability-assumptions.html
	In XZ Utils null pointer + 0 issue should be fixed after this commit. This adds a few if-statements and thus branches to avoid null pointer + 0. These check for size > 0 instead of ptr != NULL because this way bugs where size > 0 && ptr == NULL will likely get caught quickly. None of them are in hot spots so it shouldn't matter for performance.
	A little less readable version would be replacing ptr + offset with offset != 0 ? ptr + offset : ptr or creating a macro for it:
	    #define my_ptr_add(ptr, offset) \
	            ((offset) != 0 ? ((ptr) + (offset)) : (ptr))
	Checking for offset != 0 instead of ptr != NULL allows GCC >= 8.1, Clang >= 7, and Clang-based ICX to optimize it to the very same code as ptr + offset. That is, it won't create a branch. So for hot code this could be a good solution to avoid null pointer + 0. Unfortunately other compilers like ICC 2021 or MSVC 19.33 (VS2022) will create a branch from my_ptr_add().
	Thanks to Marcin Kowalczyk for reporting the problem: https://github.com/tukaani-project/xz/issues/36
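As an illustration of the my_ptr_add() idea from the commit message, the sketch below wraps the macro in a small function. The function `buf_end` is a hypothetical name invented for this example and does not appear in XZ Utils. The assumption is a buffer that may legitimately be NULL only when its size is zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// The macro quoted in the commit message: pointer arithmetic is only
// evaluated when the offset is nonzero, so NULL + 0 never happens.
#define my_ptr_add(ptr, offset) \
	((offset) != 0 ? ((ptr) + (offset)) : (ptr))

// Hypothetical helper: return a pointer one past the end of a buffer
// that is allowed to be NULL when size == 0.
static const uint8_t *
buf_end(const uint8_t *buf, size_t size)
{
	// Checking the size (not the pointer) means that a bug where
	// size > 0 but buf == NULL is caught here instead of being hidden.
	assert(size == 0 || buf != NULL);

	return my_ptr_add(buf, size);
}
```

Calling `buf_end(NULL, 0)` returns NULL without performing any pointer arithmetic, while a non-empty buffer gets the usual `buf + size`.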
2022-12-16	liblzma: Update authors list in arm64.c.	Lasse Collin	1	-0/+1

2022-12-01	liblzma: Omit zero-skipping from ARM64 filter.	Lasse Collin	1	-58/+23
	It has some complicated downsides and its usefulness is more limited than I originally thought. So this change is bad for certain very specific situations but a generic solution that works for other filters (and is otherwise better too) is planned anyway. And this way 7-Zip can use the same compatible filter for the .7z format. This is still marked as experimental with a new temporary Filter ID.
2022-11-25	liblzma: Omit simple coder init functions if they are disabled.	Lasse Collin	6	-0/+24

2022-11-14	Revert "liblzma: Simple/BCJ filters: Allow disabling generic BCJ options."	Lasse Collin	9	-11/+10
	This reverts commit 177bdc922cb17bd0fd831ab8139dfae912a5c2b8 and also makes the equivalent change to arm64.c. Now that the ARM64 filter will use lzma_options_bcj, this change is not needed anymore.
2022-11-14	Replace the experimental ARM64 filter with a new experimental version.	Lasse Collin	3	-182/+107
	This is incompatible with the previous version. This has space/tab fixes in filter_*.c and bcj.h too.
2022-09-20	liblzma: ARM64: Add comments.	Lasse Collin	1	-0/+13

2022-09-19	liblzma: Add experimental ARM64 BCJ filter with a temporary Filter ID.	Lasse Collin	5	-0/+246
	That is, the Filter ID will be changed once the design is final. The current version will be removed. So files created with the temporary Filter ID won't be supported in the future.
2022-09-17	liblzma: Simple/BCJ filters: Allow disabling generic BCJ options.	Lasse Collin	8	-9/+10
	This will be needed for the ARM64 BCJ filter as it will use its own options struct.
2019-12-31	Rename unaligned_read32ne to read32ne, and similarly for the others.	Lasse Collin	2	-2/+2

2019-06-23	liblzma: Fix warnings from -Wsign-conversion.	Lasse Collin	5	-13/+14
	Also, more parentheses were added to the literal_subcoder macro in lzma_common.h (better style but no functional change in the current usage).
2019-05-13	liblzma: Avoid memcpy(NULL, foo, 0) because it is undefined behavior.	Lasse Collin	1	-1/+9
	I should have always known this but I didn't. Here is an example as a reminder to myself:
	    int mycopy(void *dest, void *src, size_t n) { memcpy(dest, src, n); return dest == NULL; }
	In the example, a compiler may assume that dest != NULL because passing NULL to memcpy() would be undefined behavior. Testing with GCC 8.2.1, mycopy(NULL, NULL, 0) returns 1 with -O0 and -O1. With -O2 the return value is 0 because the compiler infers that dest cannot be NULL because it was already used with memcpy() and thus the test for NULL gets optimized out.
	In liblzma, if a null-pointer was passed to memcpy(), there were no checks for NULL after the memcpy() call, so I cautiously suspect that it shouldn't have caused bad behavior in practice, but it's hard to be sure, and the problematic cases had to be fixed anyway. Thanks to Jeffrey Walton.
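One way to avoid the situation described above is to skip the memcpy() call when there is nothing to copy. The following is a minimal sketch of that pattern, not the actual liblzma change; the wrapper name `copy_if_nonempty` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical wrapper: memcpy() is only called when n > 0, so a NULL
// pointer paired with a zero size never reaches memcpy().
static void
copy_if_nonempty(void *dest, const void *src, size_t n)
{
	if (n > 0) {
		// With a nonzero size both pointers must be valid.
		assert(dest != NULL && src != NULL);
		memcpy(dest, src, n);
	}
}
```

Because the guard tests the size instead of the pointers, the compiler gets no opportunity to infer from a memcpy() call that `dest` is non-NULL in the zero-length case.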
2016-11-21	liblzma: Avoid multiple definitions of lzma_coder structures.	Lasse Collin	8	-46/+52
	Only one definition was visible in a translation unit. It avoided a few casts and temp variables but seems that this hack doesn't work with link-time optimizations in compilers as it's not C99/C11 compliant. Fixes: http://www.mail-archive.com/xz-devel@tukaani.org/msg00279.html
2012-07-17	liblzma: Make the use of lzma_allocator const-correct.	Lasse Collin	11	-38/+63
	There is a tiny risk of causing breakage: If an application assigns lzma_stream.allocator to a non-const pointer, such code won't compile anymore. I don't know why anyone would do such a thing though, so in practice this shouldn't cause trouble. Thanks to Jan Kratochvil for the patch.
2012-05-28	liblzma: Fix possibility of incorrect LZMA_BUF_ERROR.	Lasse Collin	1	-1/+1
	lzma_code() could incorrectly return LZMA_BUF_ERROR if all of the following were true:
	- The caller knows how many bytes of output to expect and only provides that much output space.
	- When the last output bytes are decoded, the caller-provided input buffer ends right before the LZMA2 end of payload marker. So LZMA2 won't provide more output anymore, but it won't know it yet and thus won't return LZMA_STREAM_END yet.
	- A BCJ filter is in use and it hasn't left any unfiltered bytes in the temp buffer. This can happen with any BCJ filter, but in practice it's more likely with filters other than the x86 BCJ.
	Another situation where the bug can be triggered happens if the uncompressed size is zero bytes and no output space is provided. In this case the decompression can fail even if the whole input file is given to lzma_code(). A similar bug was fixed in XZ Embedded on 2011-09-19.
2012-04-19	liblzma: Remove outdated comments.	Lasse Collin	2	-5/+1

2011-05-17	Add underscores to attributes (__attribute((__foo__))).	Lasse Collin	6	-6/+6

2010-02-12	Collection of language fixes to comments and docs.	Lasse Collin	1	-1/+1
	Thanks to Jonathan Nieder.
2009-11-14	Fix a design error in liblzma API.	Lasse Collin	1	-0/+12
	Originally the idea was that using LZMA_FULL_FLUSH with Stream encoder would read the filter chain from the same array that was used to initialize the Stream encoder. Since most apps wouldn't use LZMA_FULL_FLUSH, most apps wouldn't need to keep the filter chain available after initializing the Stream encoder. However, due to my mistake, it actually required keeping the array always available. Since setting the new filter chain via the array used at initialization time is not a nice way to do it for a couple of reasons, this commit ditches it and introduces lzma_filters_update(). This new function also replaces the "persistent" flag used by LZMA2 (and to-be-designed Subblock filter), which was also an ugly thing to do. Thanks to Alexey Tourbin for reminding me about the problem that Stream encoder used to require keeping the filter chain allocated.
2009-10-04	Use a tuklib module for integer handling.	Lasse Collin	2	-2/+2
	This replaces bswap.h and integer.h. The tuklib module uses <byteswap.h> on GNU, <sys/endian.h> on *BSDs and <sys/byteorder.h> on Solaris, which may contain optimized code like inline assembly.
2009-07-10	BCJ filters: Reject invalid start offsets with LZMA_OPTIONS_ERROR.	Lasse Collin	8	-8/+12
	This is a quick and slightly dirty fix to make the code conform to the latest file format specification. Without this patch, it's possible to make corrupt files by specifying start offset that is not a multiple of the filter's alignment. Custom start offset is almost never used, so this was only a minor bug. The xz command line tool doesn't validate the start offset, so one will get a bit unclear error message if trying to use an invalid start offset.
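The rule this commit enforces, that a start offset must be a multiple of the filter's alignment, can be sketched as a simple check. The function name `start_offset_is_valid` and its parameters are hypothetical and chosen for illustration; the real code returns LZMA_OPTIONS_ERROR from the filter initialization instead of a boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical check: a start offset is acceptable only if it is a
// multiple of the filter's alignment (given in bytes).
static bool
start_offset_is_valid(uint32_t start_offset, uint32_t alignment)
{
	// An alignment of zero would make the modulo below undefined.
	assert(alignment != 0);

	return start_offset % alignment == 0;
}
```

For example, with a filter that works on 4-byte units, a start offset of 16 would pass and 6 would be rejected.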
2009-06-30	Build system fixes	Lasse Collin	2	-51/+47
	Don't use libtool convenience libraries to avoid recently discovered long-standing subtle but somewhat severe bugs in libtool (at least 1.5.22 and 2.2.6 are affected). It was found when porting XZ Utils to Windows <http://lists.gnu.org/archive/html/libtool/2009-06/msg00070.html> but the problem is significant also e.g. on GNU/Linux. Unless --disable-shared is passed to configure, static library built from a set of convenience libraries will contain PIC objects. That is, while libtool builds non-PIC objects too, only PIC objects will be used from the convenience libraries. On 32-bit x86 (tested on mobile XP2400+), using PIC instead of non-PIC makes the decompressor 10 % slower with the default CFLAGS. So while xz was linked against static liblzma by default, it got the slower PIC objects unless --disable-shared was used. I tend to develop and benchmark with --disable-shared due to faster build time, so I hadn't noticed the problem in benchmarks earlier. This commit also adds support for building Windows resources into liblzma and executables.
2009-06-26	Fix @variables@ to $(variables) in Makefile.am files.	Lasse Collin	1	-2/+2
	Fix the ordering of libgnu.a and LTLIBINTL on the linker command line and added missing LTLIBINTL to tests/Makefile.am.
2009-04-15	Fix uint32_t -> size_t in ARM and ARM-Thumb filters.	Lasse Collin	2	-2/+2
	On 64-bit system it would have gone into infinite loop if a single input buffer was over 4 GiB (unlikely).
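The failure mode described here comes from a 32-bit index wrapping around before it can reach a size that does not fit in 32 bits. The sketch below is an illustration under assumptions, not the filter code itself: `advance_u32` and `process_units` are hypothetical names, and the loop body is a placeholder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Shows the wraparound: advancing a uint32_t index by 4 from 0xFFFFFFFC
// yields 0, so a condition like "i + 4 <= size" can stay true forever
// when size is a size_t larger than 4 GiB.
static uint32_t
advance_u32(uint32_t i)
{
	return (uint32_t)(i + 4);
}

// Hypothetical loop in the corrected style: the index has the same type
// as the size, so it cannot wrap before reaching the end of the buffer.
// Returns the number of bytes covered by whole 4-byte units.
static size_t
process_units(const uint8_t *buffer, size_t size)
{
	assert(size == 0 || buffer != NULL);

	size_t i;
	for (i = 0; i + 4 <= size; i += 4) {
		// A real filter would inspect buffer[i] .. buffer[i + 3] here.
		(void)buffer[i];
	}

	return i;
}
```

On a 32-bit system size_t is itself 32 bits wide, so a single buffer over 4 GiB cannot occur there, which matches the commit's note that only 64-bit systems were affected.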
2009-04-13	Put the interesting parts of XZ Utils into the public domain.	Lasse Collin	14	-153/+54
	Some minor documentation cleanups were made at the same time.
2008-12-31	Renamed lzma_options_simple to lzma_options_bcj in the API.	Lasse Collin	3	-5/+5
	The internal implementation is still using the name "simple". It may need some cleanups, so I will look at it later.
2008-09-13	Renamed constants:	Lasse Collin	2	-2/+2
	- LZMA_VLI_VALUE_MAX -> LZMA_VLI_MAX
	- LZMA_VLI_VALUE_UNKNOWN -> LZMA_VLI_UNKNOWN
	- LZMA_HEADER_ERROR -> LZMA_OPTIONS_ERROR
2008-08-28	Sort of garbage collection commit. :-| Many things are still	Lasse Collin	6	-4/+167
	broken. API has changed a lot and it will still change a little more here and there. The command line tool doesn't have all the required changes to reflect the API changes, so it's easy to get "internal error" or trigger assertions.
2008-06-18	Update the code to mostly match the new simpler file format	Lasse Collin	2	-29/+4
	specification. Simplify things by removing most of the support for known uncompressed size in most places. There are some miscellaneous changes here and there too. The API of liblzma has got many changes and still some more will be done soon. While most of the code has been updated, some things are not fixed (the command line tool will choke with invalid filter chain, if nothing else). Subblock filter is somewhat broken for now. It will be updated once the encoded format of the Subblock filter has been decided.
2008-01-26	Return LZMA_HEADER_ERROR if LZMA_SYNC_FLUSH is used with any	Lasse Collin	1	-0/+8
	of the so called simple filters. If there is demand, limited support for LZMA_SYNC_FLUSH may be added in future. After this commit, using LZMA_SYNC_FLUSH shouldn't cause undefined behavior in any situation.
2008-01-17	Fix wrong too small size of argument unfiltered_max	Lasse Collin	1	-1/+1
	in ia64_coder_init(). It triggered assert() in simple_coder.c, and could have caused a buffer overflow. This error was probably a copypaste mistake, since most of the simple filters use unfiltered_max = 4.
2007-12-11	Remove uncompressed size tracking from the filter encoders.	Lasse Collin	1	-25/+4
	It's not strictly needed there, and just complicates the code. LZ encoder never even had this feature. The primary reason to have uncompressed size tracking in filter encoders was validating that the application doesn't give a different amount of input than it had promised. A side effect was to validate internal workings of liblzma. Uncompressed size tracking is still present in the Block encoder. Maybe it should be added to LZMA_Alone and raw encoders too. It's simpler to have one coder just to validate the uncompressed size instead of having it in every filter.
2007-12-09	Imported to git.	Lasse Collin	10	-0/+1109