xz.git - XZ Utils

Age	Commit message	Author	Files	Lines
2024-01-23	xz: Use threaded mode by default (as if --threads=0 was used).	Lasse Collin	3	-3/+16
	This hopefully does more good than bad:
	+ It's faster by default.
	+ Only the threaded compressor creates files that can be decompressed in threaded mode.
	- Compression ratio is worse, usually not too much though. When it matters, -T1 must be used.
	- Memory usage increases.
	- Scripts that assume single-threaded mode but don't use -T1 will possibly use too many resources, for example, if they run multiple xz processes in parallel to compress multiple files.
	- Output from single-threaded and multi-threaded compressors differs but such changes could happen for other reasons too (they just haven't happened since 5.0.0).
2024-01-23	xz: Man page: Add more examples of LZMA2 options with BCJ filters.	Lasse Collin	1	-7/+31

2024-01-23	liblzma: RISC-V filter: Use byte-by-byte access.	Lasse Collin	1	-30/+84
	Not all RISC-V processors support fast unaligned access so it's better to read only one byte in the main loop. This can be faster even on x86-64 when compared to reading 32 bits at a time as half the time the address is only 16-bit aligned. The downside is larger code size on archs that do support fast unaligned access.
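The byte-by-byte access described above can be sketched as follows. This is only an illustration of the technique, not the filter's actual code; the function name read32le_bytewise() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Assemble a 32-bit little-endian value one byte at a time.
// No load is wider than 8 bits, so buf may have any alignment
// and the code works on processors without fast unaligned access.
uint32_t
read32le_bytewise(const uint8_t *buf)
{
	assert(buf != NULL);

	return (uint32_t)buf[0]
			| ((uint32_t)buf[1] << 8)
			| ((uint32_t)buf[2] << 16)
			| ((uint32_t)buf[3] << 24);
}
```

A caller can pass a pointer at any offset into the input buffer, for example buf + 1, without caring whether the address is 32-bit aligned.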
2024-01-23	xz: Update xz -lvv for RISC-V filter.	Jia Tan	1	-0/+10
	Version 5.6.0 will be shown, even though upcoming alphas and betas will be able to support this filter. 5.6.0 looks nicer in the output and people shouldn't be encouraged to use an unstable version in production in any way.
2024-01-23	xz: Update message in --long-help for RISC-V Filter.	Jia Tan	1	-0/+1

2024-01-23	xz: Update the man page for the RISC-V Filter.	Jia Tan	1	-1/+2
	A special note was added to suggest using four-byte alignment when the compressed instruction extension is not present in a RISC-V binary.
2024-01-23	liblzma: Update string_conversion.c to support RISC-V Filter.	Jia Tan	1	-0/+5

2024-01-23	liblzma: Add RISC-V BCJ filter.	Jia Tan	8	-0/+740
	The new Filter ID is 0x0B. Thanks to Chien Wong <m@xv97.com> for the initial version of the Filter, the xz CLI updates, and the Autotools build system modifications. Thanks to Igor Pavlov for his many contributions to the design of the filter.
2024-01-19	xz: Update website URLs in the man pages.	Jia Tan	2	-5/+5

2024-01-19	liblzma: Update website URL.	Jia Tan	1	-3/+3

2024-01-11	liblzma: CRC: Add a comment to crc_x86_clmul.h about BUILDING_ macros.	Lasse Collin	1	-0/+6

2024-01-11	liblzma: CRC: Remove crc_always_inline, use lzma_always_inline instead.	Lasse Collin	2	-21/+1
	Now that crc_simd_body() in crc_x86_clmul.h is only called once in a translation unit, we no longer need to be so cautious about ensuring the always-inline behavior.
2024-01-11	liblzma: CRC: Update CLMUL comments to more generic wording.	Lasse Collin	2	-13/+13

2024-01-11	liblzma: Rename arch-specific CRC functions and macros.	Lasse Collin	4	-25/+31
	CRC_CLMUL was split to CRC_ARCH_OPTIMIZED and CRC_X86_CLMUL. CRC_ARCH_OPTIMIZED is defined when an arch-optimized version is used. Currently the x86 CLMUL implementations are the only arch-optimized versions, and these also use the CRC_X86_CLMUL macro to tell when crc_x86_clmul.h needs to be included. is_clmul_supported() was renamed to is_arch_extension_supported(). crc32_clmul() and crc64_clmul() were renamed to crc32_arch_optimized() and crc64_arch_optimized(). This way the names make sense with arch-specific non-CLMUL implementations as well.
2024-01-11	liblzma: Fix a comment in crc_common.h.	Lasse Collin	1	-1/+2

2024-01-11	liblzma: Avoid extern lzma_crc32_clmul() and lzma_crc64_clmul().	Lasse Collin	5	-85/+89
	A CLMUL-only build will have the crcxx_clmul() inlined into lzma_crcxx(). Previously a jump to the extern lzma_crcxx_clmul() was needed.
	Notes about shared liblzma on ELF platforms:
	- On platforms that support ifunc and -fvisibility=hidden, this was silly because a CLMUL-only build would have that single extra jump instruction of overhead.
	- On platforms that support neither -fvisibility=hidden nor linker version script (liblzma*.map), jumping to lzma_crcxx_clmul() would go via PLT so a few more instructions of overhead (still not a big issue but silly nevertheless).
	There was a downside with static liblzma too: if an application only needs lzma_crc64(), static linking would make the linker include the CLMUL code for both CRC32 and CRC64 from crc_x86_clmul.o even though the CRC32 code wouldn't be needed, thus increasing code size of the executable (assuming that -ffunction-sections isn't used).
	Also, now compilers are likely to inline crc_simd_body() even if they don't support the always_inline attribute (or MSVC's __forceinline). Quite possibly all compilers that build the code do support such an attribute. But now it likely isn't a problem even if the attribute wasn't supported.
	Now all x86-specific stuff is in crc_x86_clmul.h. Other archs can then have their own headers with their own is_clmul_supported() and crcxx_clmul().
	Another bonus is that the build system doesn't need to care if crc_clmul.c is needed.
	is_clmul_supported() stays as an inline function as it's not needed when doing a CLMUL-only build (avoids a warning about unused function).
2024-01-11	liblzma: crc_clmul.c: Add crc_attr_target macro.	Lasse Collin	1	-14/+16
	This reduces the number of the complex #if directives.
2024-01-11	liblzma: Simplify existing cases with lzma_attr_no_sanitize_address.	Lasse Collin	1	-9/+3

2024-01-11	liblzma: #define crc_attr_no_sanitize_address in crc_common.h.	Lasse Collin	1	-0/+10

2024-01-10	liblzma: CRC: Add empty lines.	Lasse Collin	3	-1/+5
	And remove one too.
2024-01-10	liblzma: crc_clmul.c: Tidy up the location of MSVC pragma.	Lasse Collin	1	-2/+2
	It makes no difference in practice.
2023-12-28	liblzma: Use 8-byte method in memcmplen.h on ARM64.	Lasse Collin	1	-8/+10
	It requires fast unaligned access to 64-bit integers and a fast instruction to count trailing zeros in a 64-bit integer (__builtin_ctzll()). This perhaps should be enabled on some other archs too. Thanks to Chenxi Mao for the original patch: https://github.com/tukaani-project/xz/pull/75 (the first commit). According to the numbers there, this may improve encoding speed by about 3-5 %. This enables the 8-byte method on MSVC ARM64 too which should work but wasn't tested.
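A minimal sketch of an 8-byte comparison method, assuming a little-endian target and the GCC/Clang builtin __builtin_ctzll(). The name match_len_8byte() is hypothetical and the code is not taken from memcmplen.h; unlike an optimized implementation, it never reads past limit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Return how many leading bytes of a and b are equal, up to limit.
// Sketch only: assumes little endian and GCC/Clang builtins.
uint32_t
match_len_8byte(const uint8_t *a, const uint8_t *b, uint32_t limit)
{
	assert(a != NULL && b != NULL);

	uint32_t len = 0;

	// Compare eight bytes per iteration while a full word remains.
	while (limit - len >= 8) {
		uint64_t x;
		uint64_t y;

		// memcpy() is the portable way to express an unaligned
		// load; compilers turn it into a single load instruction
		// on archs with fast unaligned access.
		memcpy(&x, a + len, 8);
		memcpy(&y, b + len, 8);

		const uint64_t diff = x ^ y;
		if (diff != 0) {
			// On little endian the lowest set bit of diff
			// lies in the first byte that differs.
			return len + (uint32_t)(__builtin_ctzll(diff) >> 3);
		}

		len += 8;
	}

	// Finish the tail one byte at a time.
	while (len < limit && a[len] == b[len])
		++len;

	return len;
}
```

The XOR is zero exactly when all eight bytes match, so the loop needs one comparison per eight bytes instead of eight.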
2023-12-28	liblzma: Check also for __clang__ in memcmplen.h.	Lasse Collin	1	-1/+2
	This change hopefully makes no practical difference as Clang likely was detected via __GNUC__ or _MSC_VER already.
2023-12-21	xz: Add a comment to Capsicum sandbox setup.	Jia Tan	1	-0/+1
	This comment is repeated in xzdec.c to help remind us why all the capabilities are removed from stdin in certain situations.
2023-12-19	xzdec: Add sandbox support for Pledge, Capsicum, and Landlock.	Jia Tan	1	-7/+139
	A very strict sandbox is used when the last file is decompressed. The likely most common use case of xzdec is to decompress a single file. The Pledge sandbox is applied to the entire process with slightly more relaxed promises, until the last file is processed. Thanks to Christian Weisgerber for the initial patch adding Pledge sandboxing.
2023-12-20	liblzma: Initialize lzma_lz_encoder pointers with NULL.	Jia Tan	1	-1/+5
	This fixes the recent change to lzma_lz_encoder that used memzero instead of the NULL constant. On some compilers the NULL constant (always 0) may not equal the NULL pointer (this only needs to guarantee to not point to a valid memory address). Later code compares the pointers to the NULL pointer so we must initialize them with the NULL pointer instead of 0 to guarantee code correctness.
2023-12-16	liblzma: Set all values in lzma_lz_encoder to NULL after allocation.	Jia Tan	1	-3/+1
	The first member of lzma_lz_encoder doesn't necessarily need to be set to NULL since it will always be set before anything tries to use it. However the function pointer members must be set to NULL since other functions rely on this NULL value to determine if this behavior is supported or not. This fixes a somewhat serious bug, where the options_update() and set_out_limit() function pointers are not set to NULL. This seems to have been forgotten since these function pointers were added many years after the original two (code() and end()). The problem is that by not setting this to NULL we are relying on the memory allocation to zero things out if lzma_filters_update() is called on a LZMA1 encoder. The function pointer for set_out_limit() is less serious because there is not an API function that could call this in an incorrect way. set_out_limit() is only called by the MicroLZMA encoder, which must use LZMA1 where set_out_limit() is always set. It's currently not possible to call set_out_limit() on an LZMA2 encoder. So calling lzma_filters_update() on an LZMA1 encoder had undefined behavior since it's possible that memory could be manipulated so the options_update member pointed to a different instruction sequence. This is unlikely to be a bug in an existing application since it relies on calling lzma_filters_update() on an LZMA1 encoder in the first place. For instance, it does not affect xz because lzma_filters_update() can only be used when encoding to the .xz format. This is fixed by using memzero() to set all members of lzma_lz_encoder to NULL after it is allocated. This ensures this mistake will not occur here in the future if any additional function pointers are added.
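The pattern of treating a NULL function pointer as "operation not supported" might look roughly like this. All demo_ names are hypothetical stand-ins and the struct is not the real lzma_lz_encoder. The reset function assigns NULL to each member, which stores the platform's actual null pointer value.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { DEMO_OK, DEMO_PROG_ERROR } demo_ret;

// Hypothetical stand-in for a struct of encoder callbacks.
typedef struct {
	void *coder;
	demo_ret (*code)(void *coder);
	void (*end)(void *coder);
	demo_ret (*options_update)(void *coder); // optional
	demo_ret (*set_out_limit)(void *coder);  // optional
} demo_lz_encoder;

// Put every member into a known state right after allocation so
// that optional callbacks never contain leftover memory contents.
void
demo_lz_encoder_reset(demo_lz_encoder *lz)
{
	assert(lz != NULL);

	lz->coder = NULL;
	lz->code = NULL;
	lz->end = NULL;
	lz->options_update = NULL;
	lz->set_out_limit = NULL;
}

// Example callback used only to demonstrate the supported case.
demo_ret
demo_noop_update(void *coder)
{
	(void)coder;
	return DEMO_OK;
}

// Callers test the pointer before using it. With an uninitialized
// pointer this test would be meaningless and the call would jump
// to an arbitrary address.
demo_ret
demo_options_update(demo_lz_encoder *lz)
{
	assert(lz != NULL);

	if (lz->options_update == NULL)
		return DEMO_PROG_ERROR;

	return lz->options_update(lz->coder);
}
```

An encoder that supports the operation sets the member after the reset; one that doesn't simply leaves it alone and callers get an error instead of undefined behavior.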
2023-12-16	liblzma: Tweak a comment.	Jia Tan	1	-1/+1

2023-12-16	liblzma: Make parameter names in function definition match declaration.	Jia Tan	1	-4/+4
	lzma_raw_encoder() and lzma_raw_encoder_init() used "options" as the parameter name instead of "filters" (used by the declaration). "filters" is more clear since the parameter represents the list of filters passed to the raw encoder, each of which contains filter options.
2023-12-16	liblzma: Improve lzma encoder init function consistency.	Jia Tan	1	-0/+3
	lzma_encoder_init() did not check for NULL options, but lzma2_encoder_init() did. This is more of a code style improvement than anything else to help make lzma_encoder_init() and lzma2_encoder_init() more similar.
2023-11-30	xz: Fix typo	Kian-Meng Ang	1	-1/+1

2023-11-23	xz: Tweak a comment.	Lasse Collin	1	-2/+2

2023-11-23	xz: Use is_tty() in message.c.	Jia Tan	1	-6/+1

2023-11-23	xz: Create separate is_tty() function.	Jia Tan	2	-7/+37
	The new is_tty() will report if a file descriptor is a terminal or not. On POSIX systems, it is a wrapper around isatty(). However, the native Windows implementation of isatty() will return true for all character devices, not just terminals. So is_tty() has a special case for Windows so it can use alternative Windows API functions to determine if a file descriptor is a terminal. This fixes a bug with MSVC and MinGW-w64 builds that refused to read from or write to non-terminal character devices because xz thought it was a terminal. For instance: xz foo -c > /dev/null would fail because /dev/null was assumed to be a terminal.
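A rough sketch of such a wrapper, assuming that GetConsoleMode() is the Windows API function used to tell consoles apart from other character devices (the commit message does not name the function). The names is_tty_sketch() and null_device_is_tty() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <fcntl.h>

#if defined(_WIN32) && !defined(__CYGWIN__)
#	include <windows.h>
#	include <io.h>
#else
#	include <unistd.h>
#endif

// Report whether fd refers to a terminal.
bool
is_tty_sketch(int fd)
{
#if defined(_WIN32) && !defined(__CYGWIN__)
	// _isatty() is true for every character device, including
	// "nul". GetConsoleMode() succeeds only for console handles.
	const intptr_t handle = _get_osfhandle(fd);
	if (handle == -1)
		return false;

	DWORD mode;
	return GetConsoleMode((HANDLE)handle, &mode) != 0;
#else
	return isatty(fd) != 0;
#endif
}

// Usage example: the null device is a character device but it
// must not be treated as a terminal.
bool
null_device_is_tty(void)
{
#if defined(_WIN32) && !defined(__CYGWIN__)
	const int fd = _open("nul", _O_WRONLY);
#else
	const int fd = open("/dev/null", O_WRONLY);
#endif
	assert(fd >= 0);

	const bool result = is_tty_sketch(fd);

#if defined(_WIN32) && !defined(__CYGWIN__)
	_close(fd);
#else
	close(fd);
#endif
	return result;
}
```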
2023-11-22	tuklib_integer: Fix typo discovered by codespell.	Jia Tan	1	-1/+1
	Based on internet dictionary searches, 'choise' is an outdated spelling of 'choice'.
2023-11-18	xz: Move the check for --suffix with --format=raw a few lines earlier.	Lasse Collin	1	-22/+22
	Now it reads from argv[] instead of args->arg_names.
2023-11-17	xz: Fix a bug with --files and --files0 in raw mode without a suffix.	Jia Tan	1	-0/+5
	The following command caused a segmentation fault: xz -Fraw --lzma1 --files=foo when foo was a valid file. The usage of --files or --files0 was not being checked when compressing or decompressing in raw mode without a suffix. The suffix checking code was meant to validate that all files to be processed are "-" (if not writing to standard out), meaning the data is only coming from standard in. In this case, there were no file names to check since --files and --files0 store their file name in a different place. Later code assumed the suffix was set and caused a segmentation fault. Now, the above command results in an error.
2023-11-15	xz: Refactor suffix test with raw format.	Jia Tan	1	-25/+13
	The previous version set opt_stdout, but this caused an issue with copying an input file to standard out when decompressing an unknown file type. The following needs to result in an error: echo foo | xz -df since -c, --stdout is not used. This fixes the previous error by not setting opt_stdout.
2023-11-14	xz: Move suffix check after stdout mode is detected.	Jia Tan	1	-8/+8
	This fixes a bug introduced in cc5aa9ab138beeecaee5a1e81197591893ee9ca0 when the suffix check was initially moved. This caused a situation that previously worked: echo foo | xz -Fraw --lzma1 | wc -c to fail because the old code knew that this would write to standard out so a suffix was not needed.
2023-11-14	xz: Detect when all data will be written to standard out earlier.	Jia Tan	1	-0/+21
	If the -c, --stdout argument is not used, then we can still detect when the data will be written to standard out if all of the provided filenames are "-" (denoting standard in) or if no filenames are provided.
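The rule can be sketched as a small predicate. The function name and parameters are hypothetical, and the sketch ignores file names supplied via --files or --files0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// names holds the non-option arguments; count may be zero.
// Returns true if all output is known to go to standard out.
bool
writes_only_to_stdout(bool opt_stdout, char *const names[], size_t count)
{
	assert(count == 0 || names != NULL);

	// -c, --stdout was given explicitly.
	if (opt_stdout)
		return true;

	// Any real file name means output goes to a file. With no
	// names at all the loop is skipped: xz then reads standard
	// in and writes standard out.
	for (size_t i = 0; i < count; ++i)
		if (strcmp(names[i], "-") != 0)
			return false;

	return true;
}
```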
2023-11-09	liblzma: Add missing comments to lz_encoder.h.	Jia Tan	1	-1/+5

2023-10-31	liblzma: Fix compilation of fastpos_tablegen.c.	Lasse Collin	1	-0/+2
	The macro lzma_attr_visibility_hidden has to be defined to make fastpos.h usable. The visibility attribute is irrelevant to fastpos_tablegen.c so simply #define the macro to an empty value. fastpos_tablegen.c is never built by the included build systems and so the problem wasn't noticed earlier. It's just a standalone program for generating fastpos_table.c. Fixes: https://github.com/tukaani-project/xz/pull/69 Thanks to GitHub user Jamaika1.
2023-10-30	liblzma: Add a note why crc_always_inline exists for now.	Lasse Collin	1	-0/+5
	Solaris Studio is a possible example (not tested) which supports the always_inline attribute but might not get detected by the common.h #ifdefs.
2023-10-30	liblzma: Use lzma_always_inline in memcmplen.h.	Lasse Collin	1	-2/+1

2023-10-30	liblzma: #define lzma_always_inline in common.h.	Lasse Collin	1	-0/+17

2023-10-30	liblzma: Use lzma_attr_visibility_hidden on private extern declarations.	Lasse Collin	5	-0/+13
	These variables are internal to liblzma and not exposed in the API.
2023-10-30	liblzma: #define lzma_attr_visibility_hidden in common.h.	Lasse Collin	1	-0/+11
	In ELF shared libs: -fvisibility=hidden affects definitions of symbols but not declarations.[*] This doesn't affect direct calls to functions inside liblzma as a linker can replace a call to lzma_foo@plt with a call directly to lzma_foo when -fvisibility=hidden is used.
	[*] It has to be like this because otherwise every installed header file would need to explicitly set the symbol visibility to default.
	When accessing extern variables that aren't defined in the same translation unit, the compiler assumes that the variable has the default visibility and thus indirection is needed. Unlike function calls, the linker cannot optimize this. Using __attribute__((__visibility__("hidden"))) with the extern variable declarations tells the compiler that indirection isn't needed because the definition is in the same shared library.
	About 15+ years ago, someone told me that it would be good if the CRC tables would be defined in the same translation unit as the C code of the CRC functions. While I understood that it could help a tiny amount, I didn't want to change the code because a separate translation unit for the CRC tables was needed for the x86 assembly code anyway. But when visibility attributes are supported, simply marking the extern declaration with the hidden attribute will get an identical result. When there are only a few affected variables, this is trivial to do. I wish I had understood this back then already.
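A minimal sketch of marking an extern variable declaration as hidden. The demo_ names are hypothetical, and the preprocessor test for GCC-compatible compilers on ELF targets is an assumption made for this example; it is not necessarily how common.h detects support.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Assumption for this sketch: enable the attribute only with
// GCC-compatible compilers on ELF targets.
#if defined(__GNUC__) && defined(__ELF__)
#	define demo_attr_visibility_hidden \
		__attribute__((__visibility__("hidden")))
#else
#	define demo_attr_visibility_hidden
#endif

// Declaration as it would appear in an internal header. The
// attribute tells the compiler that the definition is in the same
// shared library, so no indirection is needed to reach it.
demo_attr_visibility_hidden
extern const uint32_t demo_table[4];

// Definition. In a real library this would be in a separate
// translation unit; it is here only to keep the example complete.
const uint32_t demo_table[4] = { 1, 2, 4, 8 };

uint32_t
demo_lookup(size_t i)
{
	assert(i < 4);
	return demo_table[i];
}
```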
2023-10-26	liblzma: Refer to MinGW-w64 instead of MinGW in the API headers.	Lasse Collin	2	-3/+3
	MinGW (formerly a MinGW.org Project, later the MinGW.OSDN Project at <https://osdn.net/projects/mingw/>) has GCC 9.2.0 as the most recent GCC package (released 2021-02-02). The project might still be alive but the majority of people have switched to MinGW-w64. Thus it seems clearer to refer to MinGW-w64 in our API headers too. Building with MinGW is likely to still work but I haven't tested it in recent years.
2023-10-26	liblzma: Add Cflags.private to liblzma.pc.in for MSYS2.	Lasse Collin	1	-0/+1
	It properly adds -DLZMA_API_STATIC when compiling code that will be linked against static liblzma. Having it there on systems other than Windows does no harm. See: https://www.msys2.org/docs/pkgconfig/
2023-10-22	xz: Support basic sandboxing with Linux Landlock (ABI versions 1-3).	Lasse Collin	3	-1/+79
	It is enabled only when decompressing one file to stdout, similar to how Capsicum is used. Landlock was added in Linux 5.13.
2023-10-22	Simplify detection of Capsicum support.	Lasse Collin	3	-11/+7
	This removes support for FreeBSD 10.0 and 10.1 which used <sys/capability.h> instead of <sys/capsicum.h>. Support for FreeBSD 10.1 ended on 2016-12-31. So now FreeBSD >= 10.2 is required to enable Capsicum support. This also removes support for Capsicum on Linux (libcaprights) which seems to have been unmaintained since 2017 and Linux 4.11: https://github.com/google/capsicum-linux
2023-10-22	xz/Windows: Allow clock_gettime with POSIX threads.	Lasse Collin	1	-3/+6
	If winpthreads are used for threading, it's OK to use clock_gettime() from winpthreads too.
2023-10-22	mythread.h: Make MYTHREAD_POSIX compatible with MinGW-w64's winpthreads.	Lasse Collin	1	-1/+22
	This might be almost useless but it doesn't need much extra code either.
2023-10-22	xz/Windows: Ensure that clock_gettime() isn't used with MinGW-w64.	Lasse Collin	1	-2/+7
	This commit alone doesn't change anything in the real world:
	- configure.ac currently checks for clock_gettime() only when using pthreads.
	- CMakeLists.txt doesn't check for clock_gettime() on Windows.
	So clock_gettime() wasn't used with MinGW-w64 before either.
	clock_gettime() provides monotonic time and it's better than gettimeofday() in this sense. But clock_gettime() is defined in winpthreads, and liblzma or xz needs nothing else from winpthreads. By avoiding clock_gettime(), we avoid the dependency on libwinpthread-1.dll or the need to link against the static version.
	As a bonus, GetTickCount64() and MinGW-w64's gettimeofday() can be faster than clock_gettime(CLOCK_MONOTONIC, &tv). The resolution is more than good enough for the progress indicator in xz.
2023-10-22	xz/Windows: Use GetTickCount64() with MinGW-w64 if using Vista threads.	Lasse Collin	1	-3/+11

2023-10-21	liblzma: Move is_clmul_supported() back to crc_common.h.	Jia Tan	4	-50/+51
	This partially reverts creating crc_clmul.c (8c0f9376f58c0696d5d6719705164d35542dd891) where is_clmul_supported() was moved, extern'ed, and renamed to lzma_is_clmul_supported(). This caused a problem when the function call to lzma_is_clmul_supported() results in a call through the PLT. ifunc resolvers run very early in the dynamic loading sequence, so the PLT may not be set up properly at this point. Whether the PLT is used or not for lzma_is_clmul_supported() depended upon the compiler toolchain and flags used. In liblzma compiled with GCC, for instance, GCC will go through the PLT for function calls internal to liblzma if the version scripts and symbol visibility hiding are not used. If lazy-binding is disabled, then it would have made any program linked with liblzma fail during dynamic loading in the ifunc resolver.
2023-10-19	Build: Remove check for COND_CHECK_CRC32 in check/Makefile.inc.	Jia Tan	1	-2/+2
	Currently crc32 is always enabled, so COND_CHECK_CRC32 must always be set. Because of this, it makes the recent change to conditionally compile check/crc_clmul.c appear wrong since that file has CLMUL implementations for both CRC32 and CRC64.
2023-10-19	liblzma: Fix -fsanitize=address failure with crc_clmul functions.	Jia Tan	1	-0/+6
	After forcing crc_simd_body() to always be inlined it caused -fsanitize=address to fail for lzma_crc32_clmul() and lzma_crc64_clmul(). The __no_sanitize_address__ attribute was added to lzma_crc32_clmul() and lzma_crc64_clmul(), but not removed from crc_simd_body(). ASAN and inline functions behavior has changed over the years for GCC specifically, so while not strictly required, we will keep __attribute__((__no_sanitize_address__)) on crc_simd_body() in case this becomes a requirement in the future. Older GCC versions refuse to inline a function with ASAN if the caller and callee do not agree on sanitization flags (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89124#c3). If the function was forced to be inlined, it will not compile if the callee function has __no_sanitize_address__ but the caller doesn't.
2023-10-18	tuklib_integer: Revise unaligned reads and writes on strict-align archs.	Lasse Collin	1	-67/+189
	In XZ Utils context this doesn't matter much because unaligned reads and writes aren't used in hot code when TUKLIB_FAST_UNALIGNED_ACCESS isn't #defined.
2023-10-18	tuklib_integer: Add missing write64be and write64le fallback functions.	Lasse Collin	1	-0/+34

2023-10-18	liblzma: Set the MSVC optimization fix to only cover lzma_crc64_clmul().	Jia Tan	1	-15/+15
	After testing a 32-bit Release build on MSVC, only lzma_crc64_clmul() has the bug. crc_simd_body() and lzma_crc32_clmul() do not need the optimizations disabled.
2023-10-18	liblzma: CRC_USE_GENERIC_FOR_SMALL_INPUTS cannot be used with ifunc.	Lasse Collin	1	-1/+3

2023-10-18	liblzma: Include common.h in crc_common.h.	Lasse Collin	2	-1/+3
	crc_common.h depends on common.h. The headers include common.h except when there is a reason to not do so.
2023-10-18	liblzma: Add include guards to crc_common.h.	Jia Tan	1	-0/+5

2023-10-18	liblzma: Add the crc_always_inline macro to crc_simd_body().	Jia Tan	1	-1/+1
	Forcing this to be inline has a significant speed improvement at the cost of a few repeated instructions. The compilers tested on did not inline this function since it is large and is used twice in the same translation unit.
2023-10-18	liblzma: Create crc_always_inline macro.	Jia Tan	1	-0/+15
	This macro must be used instead of the inline keyword. On MSVC, it is a replacement for __forceinline which is an MSVC specific keyword that should not be used with inline (it will issue a warning if it is). It does not use a build system check to determine if __attribute__((__always_inline__)) is supported since all compilers that can use CLMUL extensions (except the special case for MSVC) should support this attribute. If this assumption is incorrect then it will result in a bug report instead of silently producing slow code.
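A macro of this kind could be sketched as below. The demo_ names and the toy hash are hypothetical; only the __forceinline and __attribute__((__always_inline__)) spellings come from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// The macro replaces the inline keyword entirely: __forceinline
// must not be combined with inline on MSVC, while GNU C needs both
// the keyword and the attribute.
#ifdef _MSC_VER
#	define demo_always_inline __forceinline
#else
#	define demo_always_inline inline __attribute__((__always_inline__))
#endif

// A small helper that should be inlined into every caller.
static demo_always_inline uint32_t
demo_fold(uint32_t acc, uint8_t byte)
{
	return (acc << 5) ^ (acc >> 27) ^ byte;
}

// Toy hash that calls the helper once per input byte.
uint32_t
demo_hash(const uint8_t *buf, size_t size)
{
	assert(buf != NULL || size == 0);

	uint32_t acc = 0;
	for (size_t i = 0; i < size; ++i)
		acc = demo_fold(acc, buf[i]);

	return acc;
}
```

If a compiler supports neither spelling, the build fails loudly, which matches the stated preference for a bug report over silently slow code.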
2023-10-18	liblzma: Refactor CRC comments.	Jia Tan	2	-72/+53
	A detailed description of the three dispatch methods was added. Also, duplicated comments now only appear in crc32_fast.c or were removed from both crc32_fast.c and crc64_fast.c if they appeared in crc_clmul.c.
2023-10-18	liblzma: Create crc_clmul.c.	Jia Tan	5	-420/+435
	Both crc32_clmul() and crc64_clmul() are now exported from crc_clmul.c as lzma_crc32_clmul() and lzma_crc64_clmul(). This ensures that is_clmul_supported() (now lzma_is_clmul_supported()) is not duplicated between crc32_fast.c and crc64_fast.c. Also, it encapsulates the complexity of the CLMUL implementations into a single file and reduces the complexity of crc32_fast.c and crc64_fast.c. Before, CLMUL code was present in crc32_fast.c, crc64_fast.c, and crc_common.h.
	During the conversion, various cleanups were applied to code (thanks to Lasse Collin) including:
	- Require using semicolons with MASK_/L/H/LH macros.
	- Variable typing and const handling improvements.
	- Improvements to comments.
	- Fixes to the pragmas used.
	- Removed unneeded variables.
	- Whitespace improvements.
	- Fixed CRC_USE_GENERIC_FOR_SMALL_INPUTS handling.
	- Silenced warnings and removed the need for some #pragmas.
2023-10-18	liblzma: Define CRC_USE_IFUNC in crc_common.h.	Jia Tan	3	-4/+7
	When ifunc is supported, we can define a simpler macro instead of repeating the more complex check in both crc32_fast.c and crc64_fast.c.
2023-10-13	liblzma: Added crc32_clmul to crc32_fast.c.	Hans Jansen	2	-11/+255

2023-10-13	liblzma: Moved CLMUL CRC logic to crc_common.h.	Hans Jansen	2	-247/+240
	crc64_fast.c was updated to use the code from crc_common.h instead.
2023-10-13	liblzma: Rename crc_macros.h to crc_common.h.	Hans Jansen	4	-4/+4

2023-09-26	liblzma: Update a comment.	Lasse Collin	1	-2/+1
	The C standards don't allow an empty translation unit which can be avoided by declaring something, without exporting any symbols. When I committed f644473a211394447824ea00518d0a214ff3f7f2 I had a feeling that some specific toolchain somewhere didn't like empty object files (assembler or maybe "ar" complained) but I cannot find anything to confirm this now. Quite likely I remembered nonsense. I leave this here as a note to my future self. :-)
2023-09-27	liblzma: Avoid compiler warning without creating extra symbol.	Jia Tan	1	-2/+1
	When the generic fast crc64 method is used, then we omit lzma_crc64_table[][]. Similar to d9166b52cf3458a4da3eb92224837ca8fc208d79, we can avoid compiler warnings with -Wempty-translation-unit (Clang) or -pedantic (GCC) by creating a never used typedef instead of an extra symbol.
2023-09-24	Scripts: Change quoting style from `...' to '...'.	Jia Tan	2	-2/+2

2023-09-24	xz: Change quoting style from `...' to '...'.	Jia Tan	7	-18/+18

2023-09-24	liblzma: Change quoting style from `...' to '...'.	Jia Tan	7	-24/+24
	This was done for both internal and API headers.
2023-09-24	tuklib_physmem: Comment out support for Windows versions older than 2000.	Lasse Collin	1	-11/+9

2023-09-24	sysdefs.h: Update the comment about __USE_MINGW_ANSI_STDIO.	Lasse Collin	1	-1/+9

2023-09-22	xz: Windows: Don't (de)compress to special files like "con" or "nul".	Lasse Collin	1	-7/+28
	Before this commit, the following writes "foo" to the console and deletes the input file:
	    echo foo | xz > con_xz
	    xz --suffix=_xz --decompress con_xz
	It cannot happen without --suffix because names like con.xz are also special and so attempting to decompress con.xz (or compress con to con.xz) will already fail when opening the input file.
	A similar thing is possible when compressing. The following writes to "nul" and the input file "n" is deleted:
	    echo foo | xz > n
	    xz --suffix=ul n
	Now xz checks if the destination is a special file before continuing. DOS/DJGPP version had a check for this but Windows (and OS/2) didn't.
2023-09-22	MSVC: #define inline and restrict only when needed.	Lasse Collin	1	-5/+8
	This also drops the check for _WIN32 as that shouldn't be needed.
2023-09-22	liblzma: Move a few __attribute__ uses in function declarations.	Lasse Collin	3	-7/+10
	The API headers have many attributes but these were left as is for now.
2023-09-22	xz, xzdec, lzmainfo: Use tuklib_attr_noreturn.	Lasse Collin	7	-25/+37
	For compatibility with C23's [[noreturn]], tuklib_attr_noreturn must be at the beginning of declaration (before "extern" or "static", and even before any GNU C's __attribute__). This commit also moves all other function attributes to the beginning of function declarations. "extern" is kept at the beginning of a line so the attributes are listed on separate lines before "extern" or "static".
2023-09-22	Remove incorrect uses of __attribute__((__malloc__)).	Lasse Collin	3	-6/+6
	xrealloc() is obviously incorrect, modern GCC docs even mention realloc() as an example where this attribute cannot be used. liblzma's lzma_alloc() and lzma_alloc_zero() would be correct uses most of the time but custom allocators may use a memory pool or otherwise hold the pointer so aliasing issues could happen in theory. The xstrdup() case likely was correct but I removed it anyway. Now there are no __malloc__ attributes left in the code. The allocations aren't in hot paths so this should make no practical difference.
2023-09-22	tuklib: Update tuklib_attr_noreturn for C11/C17 and C23.	Lasse Collin	2	-3/+23
	This makes no difference for GCC or Clang as they support GNU C's __attribute__((__noreturn__)) but this helps with MSVC:
	- VS 2019 version 16.7 and later support _Noreturn if the options /std:c11 or /std:c17 are used. This gets handled with the check for __STDC_VERSION__ >= 201112.
	- When MSVC isn't in C11/C17 mode, __declspec(noreturn) is used.
	C23 will deprecate _Noreturn (and <stdnoreturn.h>) for [[noreturn]]. This commit anticipates that but the final __STDC_VERSION__ value isn't known yet.
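The selection logic might be sketched like this. The demo_ names are hypothetical and the order of the checks is an assumption. The C23 test uses 202311L, the value that the published C23 standard assigns to __STDC_VERSION__; that value was not yet known when the commit was written.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
#	define demo_attr_noreturn [[noreturn]]
#elif defined(__GNUC__)
#	define demo_attr_noreturn __attribute__((__noreturn__))
#elif defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
#	define demo_attr_noreturn _Noreturn
#elif defined(_MSC_VER)
#	define demo_attr_noreturn __declspec(noreturn)
#else
#	define demo_attr_noreturn
#endif

// The macro goes first in the declaration, which is the only
// position that works for every spelling above.
demo_attr_noreturn
void
demo_fatal(const char *msg)
{
	assert(msg != NULL);
	fprintf(stderr, "%s\n", msg);
	exit(EXIT_FAILURE);
}

// Because demo_fatal() never returns, the compiler knows that
// b is nonzero when the division is reached.
int
demo_checked_div(int a, int b)
{
	if (b == 0)
		demo_fatal("division by zero");

	return a / b;
}
```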
2023-09-22	MSVC: xz: Make file_io.c and file_io.h compatible with MSVC.	Lasse Collin	2	-0/+36
	Thanks to Kelvin Lee for the original patches and testing the modifications I made.
2023-09-22	MSVC: xz: Use GetTickCount64() to implement mytime_now().	Lasse Collin	1	-2/+9
	It's available since Windows Vista.
2023-09-22	MSVC: xz: Use _stricmp() instead of strcasecmp() in suffix.c.	Kelvin Lee	1	-2/+8

2023-09-22	MSVC: xz: Use _isatty() from <io.h> to implement isatty().	Kelvin Lee	2	-0/+10

2023-09-22	MSVC: xz: Use _fileno() instead of fileno().	Kelvin Lee	1	-0/+4

2023-09-22	MSVC: xzdec: Use _fileno and _setmode.	Kelvin Lee	1	-0/+4

2023-09-22	MSVC: Don't #include <unistd.h>.	Kelvin Lee	2	-2/+8

2023-09-14	liblzma: Mark crc64_clmul() with __attribute__((__no_sanitize_address__)).	Lasse Collin	1	-0/+8
	Thanks to Agostino Sarubbo. Fixes: https://github.com/tukaani-project/xz/issues/62
2023-08-31	xz: Refactor thousand separator detection and disable it on MSVC.	Lasse Collin	1	-44/+45
	Now the two variations of the format strings are created with a macro, and the whole detection code can be easily disabled on platforms where thousand separator formatting is known to not work (MSVC has no support, and on DJGPP 2.05 it can have problems in some cases).
2023-08-31	xz: Fix a too relaxed assertion and remove uses of SSIZE_MAX.	Lasse Collin	2	-5/+4
	SSIZE_MAX isn't readily available on MSVC. Removing it means that there is one less thing to worry about when porting to MSVC.
2023-08-28	liblzma: Update assert in vli_ceil4().	Jia Tan	1	-1/+1
	The argument to vli_ceil4() should always guarantee the return value is also a valid lzma_vli. Thus the highest three valid lzma_vli values are invalid arguments. All uses of the function ensure this so the assert is updated to match this.
2023-08-28	liblzma: Add overflow check for Unpadded size in lzma_index_append().	Jia Tan	1	-0/+6
	This was not a security bug since there was no path to overflow UINT64_MAX in lzma_index_append() or when it calls index_file_size(). The bug was discovered by a failing assert() in vli_ceil4() when called from index_file_size() when unpadded_sum (the sum of the compressed size of current Stream and the unpadded_size parameter) exceeds LZMA_VLI_MAX.
	Previously, the unpadded_size parameter was checked to be not greater than UNPADDED_SIZE_MAX, but no check was done once compressed_base was added.
	This could not have caused an integer overflow in index_file_size() when called by lzma_index_append(). The calculation for file_size breaks down into the sum of:
	- Compressed base from all previous Streams
	- 2 * LZMA_STREAM_HEADER_SIZE (size of the current Stream's header and footer)
	- stream_padding (can be set by lzma_index_stream_padding())
	- Compressed base from the current Stream
	- Unpadded size (parameter to lzma_index_append())
	The sum of everything except for Unpadded size must be less than LZMA_VLI_MAX. This is guaranteed by overflow checks in the functions that can set these values including lzma_index_stream_padding(), lzma_index_append(), and lzma_index_cat(). The maximum value for Unpadded size is enforced by lzma_index_append() to be less than or equal to UNPADDED_SIZE_MAX. Thus, the sum cannot exceed UINT64_MAX since LZMA_VLI_MAX is half of UINT64_MAX.
	Thanks to Joona Kannisto for reporting this.
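The added check can be illustrated with a small sketch. The DEMO_ constants and demo_ functions are hypothetical stand-ins, with the limits assumed to be LZMA_VLI_MAX = UINT64_MAX / 2 and UNPADDED_SIZE_MAX = that value rounded down to a multiple of four. The exact form of the check in lzma_index_append() is also an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Assumed limits for this sketch.
#define DEMO_VLI_MAX (UINT64_MAX / 2)
#define DEMO_UNPADDED_SIZE_MIN UINT64_C(5)
#define DEMO_UNPADDED_SIZE_MAX (DEMO_VLI_MAX & ~UINT64_C(3))

// Round up to the next multiple of four. The assert mirrors the
// one that caught the bug: the result must stay a valid VLI.
uint64_t
demo_vli_ceil4(uint64_t vli)
{
	assert(vli <= DEMO_UNPADDED_SIZE_MAX);
	return (vli + 3) & ~UINT64_C(3);
}

// Return true if a Block with the given Unpadded Size can be
// appended to a Stream whose Blocks already take compressed_base
// bytes.
bool
demo_append_fits(uint64_t compressed_base, uint64_t unpadded_size)
{
	assert(compressed_base <= DEMO_VLI_MAX);

	// The check that existed before the fix.
	if (unpadded_size < DEMO_UNPADDED_SIZE_MIN
			|| unpadded_size > DEMO_UNPADDED_SIZE_MAX)
		return false;

	// The new check. Both operands are at most UINT64_MAX / 2,
	// so the addition itself cannot wrap around.
	if (compressed_base + unpadded_size > DEMO_UNPADDED_SIZE_MAX)
		return false;

	return true;
}
```

Rejecting the append up front keeps every value later passed to the rounding helper within its documented range.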
2023-08-08	mythread.h: Fix typo error in Vista threads mythread_once().	Jamaika1	1	-1/+1
	The "once_" variable was accidentally referred to as just "once". This prevented building with Vista threads when HAVE_FUNC_ATTRIBUTE_CONSTRUCTOR was not defined.
2023-08-02	xz: Omit an empty paragraph on the man page.	Lasse Collin	1	-1/+0

2023-08-01	mythread.h: Disable signal functions in builds targeting Wasm + WASI.	ChanTsune	1	-1/+1
	signal.h in WASI SDK doesn't currently provide sigprocmask() or sigset_t. liblzma doesn't need them so this change makes liblzma and xzdec build against WASI SDK. xz doesn't build yet and the tests don't either as tuktest needs setjmp() which isn't (yet?) implemented in WASI SDK. Closes: https://github.com/tukaani-project/xz/pull/57 See also: https://github.com/tukaani-project/xz/pull/56 (The original commit was edited a little by Lasse Collin.)
2023-07-31	Docs: Fix typos found by codespell	Dimitri Papadopoulos Orfanos	12	-21/+21

2023-07-24	liblzma: Prevent an empty translation unit in Windows builds.	Jia Tan	1	-1/+5
	To work around Automake lacking Windows resource compiler support, an empty source file is compiled to overwrite the resource files for static library builds. Translation units without an external declaration are not allowed by the C standard and result in a warning when used with -Wempty-translation-unit (Clang) or -pedantic (GCC).
2023-07-19	liblzma: Suppress -Wunused-function warning.	Jia Tan	1	-0/+10
	Clang 16.0.0 and earlier have a bug that the ifunc resolver function triggers the -Wunused-function warning. The resolver function is static and only "used" by the __attribute__((__ifunc()__)). At this time, the bug is still unresolved, but has been reported: https://github.com/llvm/llvm-project/issues/63957 This is not a problem in GCC.
2023-07-18	liblzma: Reword lzma_str_list_filters() documentation.	Jia Tan	1	-1/+1
	This further improves the documentation from commit f36ca7982f6bd5e9827219ed4f3c5a1fbf5d7bdf. The previous wording of "supported options" was slightly misleading since the options that are printed are the ones that are relevant for encoding/decoding. It is not about which options can or must be specified.
2023-07-18	liblzma: Improve comment in string_conversion.c.	Jia Tan	1	-2/+2
	The comment used "flag" when referring to decoder options. Just referring to them as options is more clear and consistent.
2023-07-18	xz: Translate the second "%s: " in message.c since French needs "%s : ".	Lasse Collin	1	-1/+1
	This string is used to print a filename when using "xz -v" and stderr isn't a terminal.
2023-07-18	xz: Make "%s: %s" translatable because French needs "%s : %s".	Lasse Collin	4	-14/+18

2023-07-18	liblzma: Tweak #if condition in memcmplen.h.	Lasse Collin	1	-2/+2
	Maybe ICC always #defines _MSC_VER on Windows but now it's very clear which code will get used.
2023-07-18	liblzma: Omit unnecessary parenthesis in a preprocessor directive.	Lasse Collin	1	-2/+2

2023-07-18	xz: Update Authors list in a few files.	Jia Tan	5	-5/+10

2023-07-17	xz: Fix typo in man page.	Jia Tan	1	-1/+1
	The Memory limit information section described three output columns when it actually has six. This was reworded to "multiple" to make it more future proof.
2023-07-17	xz: Minor clean up for coder.c	Jia Tan	1	-32/+21
	* Moved max_block_list_size from a global to local variable. * Reworded error message in validate_block_list_filter(). * Removed helper function filter_chain_error(). * Changed 1 << X to 1U << X in many places
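The last item, 1 << X versus 1U << X, matters when X can reach the top bit. A small sketch, assuming a 32-bit int; `bit()` is a hypothetical helper, not a function from coder.c:

```c
#include <stdint.h>

/* Return a mask with only bit n set, for n in 0..31.
 *
 * With a plain 1 the shift happens in signed int, and shifting into the
 * sign bit (n == 31 with a 32-bit int) is undefined behavior. 1U keeps
 * the whole expression in unsigned arithmetic, where it is well defined. */
static uint32_t bit(unsigned n)
{
	return 1U << n;
}
```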
2023-07-17	xz: Update man page Authors and date.	Jia Tan	1	-2/+3

2023-07-17	xz: Add a section to man page for robot mode --filters-help.	Jia Tan	1	-2/+30

2023-07-17	xz: Slight reword in xz man page for consistency.	Jia Tan	1	-1/+1
	Changed will print => prints in xz --robot --version description to match --robot --info-memory description.
2023-07-17	xz: Reorder robot mode subsections in the man page.	Jia Tan	1	-96/+96
	The order is now consistent with the order the command line arguments are documented earlier in the man page. The new order is: 1. --list 2. --info-memory 3. --version Instead of the previous order: 1. --version 2. --info-memory 3. --list
2023-07-17	xz: Update man page for new --filters-help option.	Jia Tan	1	-0/+10

2023-07-17	xz: Add a new --filters-help option.	Jia Tan	3	-0/+43
	The --filters-help can be used to help create filter chains with the --filters and --filtersX options. The message in --long-help is too short to fully explain the syntax to construct complex filter chains. In --robot mode, xz will only print the output from liblzma function lzma_str_list_filters.
2023-07-17	xz: Update the man page for --block-list and --filtersX	Jia Tan	1	-26/+80
	The --block-list option description needed updating since the new --filtersX option changes how it can be used. The new entry for --filters1=FILTERS ... --filters9=FILTERS was created right after the --filters option.
2023-07-17	xz: Update --long-help for the new --filtersX option.	Jia Tan	1	-2/+10

2023-07-17	xz: Ignore filter chains that are set but never used in --block-list.	Jia Tan	1	-18/+48
	If a filter chain is set but not used in --block-list, it introduced unexpected behavior such as requiring an unneeded amount of memory to compress, reducing the number of threads in multi-threaded encoding, and printing an incorrect amount of memory needed to decompress. This also renames filters_init_mask => filters_used_mask. A filter is assumed to be used if it is specified in --filtersX until coder_set_compression_settings() determines which filters are referenced in --block-list.
2023-07-17	xz: Set the Block size for mt encoding correctly.	Jia Tan	1	-1/+67
	When opt_block_size is not used, the Block size for mt encoder is derived from the minimum of the largest Block specified by --block-list and the recommended Block size on all filter chains calculated by lzma_mt_block_size(). This avoids using unnecessary memory and ensures that all Blocks are large enough for the most memory needy filter chain.
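One possible reading of this rule, as a sketch: take the largest recommended size over all used filter chains, then cap it by the largest size listed in --block-list. The function names and the idea of passing the recommended sizes in as an array are assumptions made for illustration; in xz the recommended values come from lzma_mt_block_size().

```c
#include <stddef.h>
#include <stdint.h>

/* Largest value in v[0..n-1], or 0 when n == 0. */
static uint64_t max_u64(const uint64_t *v, size_t n)
{
	uint64_t m = 0;
	for (size_t i = 0; i < n; ++i)
		if (v[i] > m)
			m = v[i];
	return m;
}

/* recommended[]: recommended Block size of each used filter chain
 *                (hypothetical input, supplied by the caller here).
 * listed[]:      Block sizes given in --block-list.
 *
 * The result is large enough for the listed Blocks without allocating
 * more than the most demanding filter chain would ask for. */
static uint64_t pick_mt_block_size(const uint64_t *recommended, size_t n_rec,
		const uint64_t *listed, size_t n_listed)
{
	const uint64_t rec = max_u64(recommended, n_rec);
	const uint64_t largest = max_u64(listed, n_listed);

	if (largest == 0)
		return rec; /* no listed size to cap with */

	return largest < rec ? largest : rec;
}
```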
2023-07-17	xz: Validate --flush-timeout for all specified filter chains.	Jia Tan	1	-8/+16

2023-07-17	xz: Allows --block-list filters to scale down memory usage.	Jia Tan	1	-55/+214
	Previously, only the default filter chain could have its memory usage adjusted. The filter chains specified with --filtersX were not checked for memory usage. Now, all used filter chains will be adjusted if necessary.
2023-07-17	xz: Do not include block splitting if encoders are disabled.	Jia Tan	1	-9/+20
	The block splitting logic and split_block() function are not needed if encoders are disabled. This will help slightly reduce the binary size when built without encoders and allow split_block() to use functions that require encoders being enabled.
2023-07-17	xz: Free filters[] in debug mode.	Jia Tan	1	-0/+10
	This will only free filter chains created with --filters1-9 since the default filter chain may be set from a static function variable. The complexity to free the default filter chain is not worth the burden on code maintenance.
2023-07-17	xz: Add a message if --block-list is used outside of xz compression.	Jia Tan	1	-0/+11
	--block-list is only supported with compression in xz format. This avoids silently ignoring when --block-list is unused.
2023-07-17	xz: Create command line options for filters[1-9].	Jia Tan	3	-60/+230
	The new command line options are meant to be combined with --block-list. They work as an optional extension to --block-list to specify a custom filter chain for each block listed. The new options allow the creation of up to 9 reusable filter chains. For instance: xz --block-list=1:10MiB,3:5MiB,,2:5MiB,1:0 --filters1=delta--lzma2 \ --filters2=x86--lzma2 --filters3=arm64--lzma2 Will create the following blocks: 1. A block of size 10 MiB with filter chain delta, lzma2. 2. A block of size 5 MiB with filter chain arm64, lzma2. 3. A block of size 5 MiB with filter chain arm64, lzma2. 4. A block of size 5 MiB with filter chain x86, lzma2. 5. A block containing the rest of the file contents with filter chain delta, lzma2.
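The --block-list syntax from the example above can be modelled with a small parser. This is a simplified sketch, not the code in xz: `block_spec`, `parse_size()`, `parse_block_list()` and `chains_used()` are hypothetical names, only the KiB/MiB/GiB suffixes are handled, and overflow is not checked. As in the example, an empty entry repeats the previous one and a size of 0 means the rest of the input.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
	unsigned chain; /* filter chain number 1-9; 0 means the default chain */
	uint64_t size;  /* Block size in bytes; 0 means "rest of the input" */
} block_spec;

/* Parse "10MiB", "512KiB", "0" and similar. Returns 0 on success. */
static int parse_size(const char *s, size_t len, uint64_t *out)
{
	char buf[32];
	char *end;

	if (len == 0 || len >= sizeof(buf) || s[0] < '0' || s[0] > '9')
		return -1;

	memcpy(buf, s, len);
	buf[len] = '\0';

	uint64_t v = strtoull(buf, &end, 10);
	if (strcmp(end, "KiB") == 0)
		v <<= 10;
	else if (strcmp(end, "MiB") == 0)
		v <<= 20;
	else if (strcmp(end, "GiB") == 0)
		v <<= 30;
	else if (*end != '\0')
		return -1;

	*out = v;
	return 0;
}

/* Fill out[] from a comma-separated list.
 * Returns the number of entries, or -1 on error. */
static int parse_block_list(const char *list, block_spec *out, int max)
{
	int n = 0;
	const char *p = list;

	for (;;) {
		const char *comma = strchr(p, ',');
		size_t len = comma != NULL ? (size_t)(comma - p) : strlen(p);

		if (n == max)
			return -1;

		if (len == 0) {
			/* Empty entry: repeat the previous Block. */
			if (n == 0)
				return -1;
			out[n] = out[n - 1];
		} else {
			unsigned chain = 0;
			if (len >= 2 && p[0] >= '1' && p[0] <= '9' && p[1] == ':') {
				chain = (unsigned)(p[0] - '0');
				p += 2;
				len -= 2;
			}
			if (parse_size(p, len, &out[n].size) != 0)
				return -1;
			out[n].chain = chain;
		}

		++n;
		if (comma == NULL)
			return n;
		p = comma + 1;
	}
}

/* Bit i of the result is set when filter chain i is referenced. */
static uint32_t chains_used(const block_spec *b, int n)
{
	uint32_t mask = 0;
	for (int i = 0; i < n; ++i)
		mask |= 1U << b[i].chain;
	return mask;
}
```

A mask like the one returned by `chains_used()` is one way to tell which --filtersX chains are actually referenced, which is the kind of bookkeeping the filters_used_mask commit above describes.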
2023-07-17	xz: Use lzma_filters_free() in forget_filter_chain().	Jia Tan	1	-8/+10
	This is a little cleaner than the previous implementation of forget_filter_chain(). It is also more consistent since lzma_str_to_filters() will always terminate the filter chain so there is no need to terminate it later in coder_set_compression_settings().
2023-07-17	xz: Separate string to filter conversion into a helper function.	Jia Tan	1	-13/+20
	Converting from string to filter will also need to be done for block specific filter chains.
2023-07-17	xz: Update --long-help and man page for new --filters option.	Jia Tan	2	-5/+42

2023-07-17	xz: Add --filters option to CLI.	Jia Tan	3	-4/+58
	The --filters option uses the new lzma_str_to_filters() function to convert a string into a full filter chain. Using this option will reset all previous filters set by --preset, --[filter], or --filters.
2023-07-08	liblzma: Remove non-portable empty initializer.	Jia Tan	1	-1/+1
	Commit 78704f36e74205857c898a351c757719a6c8b666 added an empty initializer {} to prevent a warning. The empty initializer is a GNU extension and results in a build failure on MSVC. The -Wpedantic flag warns about empty initializers.
2023-06-29	liblzma: Prevent uninitialized warning in mt stream encoder.	Jia Tan	1	-1/+1
	This change only impacts the compiler warning since it was impossible for the wait_abs struct in stream_encode_mt() to be used before it was initialized since mythread_condtime_set() will always be called before mythread_cond_timedwait(). Since the mythread.h code is different between the POSIX and Windows versions, this warning was only present on Windows builds. Thanks to Arthur S for reporting the warning and providing an initial patch.
2023-06-28	liblzma: Prevent warning for MSYS2 Windows build.	Jia Tan	1	-2/+4
	In lzma_memcmplen(), the <intrin.h> header file is only included if _MSC_VER and _M_X64 are both defined but _BitScanForward64() was previously used if _M_X64 was defined. GCC for MSYS2 defines _M_X64 but not _MSC_VER so _BitScanForward64() was used without including <intrin.h>. Now, lzma_memcmplen() will use __builtin_ctzll() for MSYS2 GCC builds as expected.
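The point of the fix is that the condition guarding the #include and the condition guarding the intrinsic must be the same. A minimal sketch of that pattern follows; `ctz64()` is a hypothetical helper and not the code from memcmplen.h.

```c
#include <stdint.h>

/* Include <intrin.h> under exactly the condition that selects the
 * MSVC intrinsic below. */
#if defined(_MSC_VER) && defined(_M_X64)
#	include <intrin.h>
#endif

/* Count trailing zero bits. x must be nonzero. */
static unsigned ctz64(uint64_t x)
{
#if defined(_MSC_VER) && defined(_M_X64)
	unsigned long i;
	_BitScanForward64(&i, x);
	return (unsigned)i;
#elif defined(__GNUC__)
	/* GCC and Clang, including GCC on MSYS2, which defines _M_X64
	 * but not _MSC_VER. */
	return (unsigned)__builtin_ctzll(x);
#else
	/* Portable fallback. */
	unsigned i = 0;
	while ((x & 1) == 0) {
		x >>= 1;
		++i;
	}
	return i;
#endif
}
```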
2023-06-27	liblzma: Add ifunc implementation to crc64_fast.c.	Lasse Collin	1	-9/+26
	The ifunc method avoids indirection via the function pointer crc64_func. This works on GNU/Linux and probably on FreeBSD too. The previous __attribute__((__constructor__)) method is kept for compatibility with ELF platforms which do not support ifunc. The ifunc method has some limitations, for example, building liblzma with -fsanitize=address will result in segfaults. The configure option --disable-ifunc must be used for such builds. Thanks to Hans Jansen for the original patch. Closes: https://github.com/tukaani-project/xz/pull/53
2023-05-13	liblzma: Slightly rewords lzma_str_list_filters() documentation.	Jia Tan	1	-1/+1
	Reword "options required" to "supported options". The previous may have suggested that the options listed were all required anytime a filter is used for encoding or decoding. The reword makes this more clear that adjusting the options is optional.
2023-05-12	liblzma: Adds lzma_nothrow to MicroLZMA API functions.	Jia Tan	1	-2/+3
	None of the liblzma functions may throw an exception, so this attribute should be applied to all liblzma API functions.
2023-05-11	liblzma: Exports lzma_mt_block_size() as an API function.	Jia Tan	7	-22/+61
	The lzma_mt_block_size() was previously just an internal function for the multithreaded .xz encoder. It is used to provide a recommended Block size for a given filter chain. This function is helpful to determine the maximum Block size for the multithreaded .xz encoder when one wants to change the filters between blocks. Then, this determined Block size can be provided to lzma_stream_encoder_mt() in the lzma_mt options parameter when initializing the coder. This requires one to know all the filter chains they are using before starting to encode (or at least the filter chain that will need the largest Block size), but that isn't a bad limitation.
2023-05-11	liblzma: Creates IS_ENC_DICT_SIZE_VALID() macro.	Jia Tan	2	-3/+9
	This creates an internal liblzma macro to test if the dictionary size is valid for encoding.
2023-05-04	tuklib_integer.h: Reverts previous commit.	Jia Tan	1	-2/+2
	Previous commit 6be460dde07113fe3f08f814b61ddc3264125a96 would cause an error if the integer size was 32 bit.
2023-05-04	tuklib_integer.h: Changes two other UINT_MAX == UINT32_MAX to >=.	Jia Tan	1	-2/+2

2023-05-03	tuklib_integer.h: Fix a recent copypaste error in Clang detection.	Lasse Collin	1	-2/+2
	Wrong line was changed in 7062348bf35c1e4cbfee00ad9fffb4a21aa6eff7. Also, this has >= instead of == since ints larger than 32 bits would work too even if not relevant in practice.
2023-04-19	Windows: Include <intrin.h> when needed.	Jia Tan	2	-0/+16
	Legacy Windows did not need to #include <intrin.h> to use the MSVC intrinsics. Newer versions likely just issue a warning, but the MSVC documentation says to include the header file for the intrinsics we use. GCC and Clang can "pretend" to be MSVC on Windows, so extra checks are needed in tuklib_integer.h to only include <intrin.h> when it is actually needed.
2023-04-19	tuklib_integer: Use __builtin_clz() with Clang.	Jia Tan	1	-3/+3
	Clang has support for __builtin_clz(), but previously Clang would fall back to either the MSVC intrinsic or the regular C code. This was discovered due to a bug where a new version of Clang required the <intrin.h> header file in order to use the MSVC intrinsics. Thanks to Anton Kochkov for notifying us about the bug.
2023-04-14	liblzma: Update project maintainers in lzma.h.	Lasse Collin	1	-1/+1
	AUTHORS was updated earlier, lzma.h was simply forgotten.
2023-04-13	liblzma: Cleans up old commented out code.	Jia Tan	1	-11/+0

2023-03-23	Build: Removes redundant check for LZMA1 filter support.	Jia Tan	1	-4/+1

2023-03-19	liblzma: Silence -Wsign-conversion in SSE2 code in memcmplen.h.	Lasse Collin	1	-1/+2
	Thanks to Christian Hesse for reporting the issue. Fixes: https://github.com/tukaani-project/xz/issues/44
2023-03-18	Change a few HTTP URLs to HTTPS.	Lasse Collin	3	-6/+6
	The xz man page timestamp was intentionally left unchanged.
2023-03-17	liblzma: Remove note from lzma_options_bcj about the ARM64 exception.	Jia Tan	1	-1/+1
	This was left in by mistake since an early version of the ARM64 filter used a different struct for its options.
2023-03-17	liblzma: Set lzma.h as the main page for Doxygen documentation.	Jia Tan	15	-29/+2
	The \mainpage command is used in the first block of comments in lzma.h. This changes the previously nearly empty index.html to use the first comment block in lzma.h for its contents. lzma.h is no longer documented separately, but this is for the better since lzma.h only defined a few macros that users do not need to use. The individual API header files all have a disclaimer that they should not be #included directly, so there should be no confusion on the fact that lzma.h should be the only header used by applications. Additionally, the note "See ../lzma.h for information about liblzma as a whole." was removed since lzma.h is now the main page of the generated HTML and does not have its own page anymore. So it would be confusing in the HTML version and was only a "nice to have" when browsing the source files.
2023-03-13	liblzma: Defines masks for return values from lzma_index_checks().	Jia Tan	1	-0/+23

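The return value of lzma_index_checks() is a bitmask with one bit per integrity check type, so a mask for a given check is a 1 shifted by the check ID. The sketch below uses local, hypothetical names (`CHECK_*`, `CHECK_MASK`, `uses_check()`) instead of the macros added by this commit; the numeric IDs mirror the liblzma check IDs (None = 0, CRC32 = 1, CRC64 = 4, SHA-256 = 10).

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the check IDs, for illustration only. */
enum {
	CHECK_NONE = 0,
	CHECK_CRC32 = 1,
	CHECK_CRC64 = 4,
	CHECK_SHA256 = 10
};

/* Mask with the bit for one check ID set. */
#define CHECK_MASK(id) (UINT32_C(1) << (id))

/* True when the bitmask says that the given check type is in use. */
static bool uses_check(uint32_t checks, unsigned id)
{
	return (checks & CHECK_MASK(id)) != 0;
}
```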
2023-03-11	xz: Simplify the error-label in Capsicum sandbox code.	Lasse Collin	1	-15/+12
	Also remove unneeded "sandbox_allowed = false;" as this code will never be run more than once (making it work with multiple input files isn't trivial).
2023-03-08	xz: Make Capsicum sandbox more strict with stdin and stdout.	Lasse Collin	1	-0/+8

2023-03-08	Revert: "Add warning if Capsicum sandbox system calls are unsupported."	Jia Tan	1	-6/+4
	The warning causes the exit status to be 2, so this will cause problems for many scripted use cases for xz. The sandbox usage is already very limited, so silently disabling this allows it to be more usable.
2023-03-07	xz: Fix -Wunused-label in io_sandbox_enter().	Jia Tan	1	-2/+2
	Thanks to Xin Li for recommending the fix.
2023-03-06	xz: Add warning if Capsicum sandbox system calls are unsupported.	Jia Tan	1	-0/+2
	The warning is only used when errno == ENOSYS. Otherwise, xz still issues a fatal error.
2023-03-06	xz: Skip Capsicum sandbox system calls when they are unsupported.	Jia Tan	1	-5/+17
	If a system has the Capsicum header files but does not actually implement the system calls, then this would render xz unusable. Instead, we can check if errno == ENOSYS and not issue a fatal error.
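The ENOSYS handling can be sketched as a small classifier. This is a hypothetical illustration, not the code from xz: `sandbox_status` and `classify_sandbox_call()` are invented names, and the `fake_*` functions only stand in for a system call wrapper such as cap_enter() so that the sketch is self-contained.

```c
#include <errno.h>

typedef enum {
	SANDBOX_ENABLED,     /* the call succeeded */
	SANDBOX_UNSUPPORTED, /* ENOSYS: continue without a sandbox */
	SANDBOX_FATAL        /* any other failure is still an error */
} sandbox_status;

/* Call enter() and decide how the caller should react to the result. */
static sandbox_status classify_sandbox_call(int (*enter)(void))
{
	if (enter() == 0)
		return SANDBOX_ENABLED;

	return errno == ENOSYS ? SANDBOX_UNSUPPORTED : SANDBOX_FATAL;
}

/* Stand-ins for a real system call, for demonstration only. */
static int fake_ok(void) { return 0; }
static int fake_enosys(void) { errno = ENOSYS; return -1; }
static int fake_eperm(void) { errno = EPERM; return -1; }
```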
2023-03-06	xz: Reorder cap_enter() to beginning of capsicum sandbox code.	Jia Tan	1	-3/+3
	cap_enter() puts the process into the sandbox. If later calls to cap_rights_limit() fail, then the process can still have some extra protections.
2023-03-01	liblzma: Clarify lzma_lzma_preset() documentation in lzma12.h.	Jia Tan	1	-0/+5
	lzma_lzma_preset() does not guarantee that the lzma_options_lzma are usable in an encoder even if it returns false (success). If liblzma is built with default configurations, then the options will always be usable. However if the match finders hc3, hc4, or bt4 are disabled, then the options may not be usable depending on the preset level requested. The documentation was updated to reflect this complexity, since this behavior was unclear before.
2023-02-24	liblzma: Replace '\n' -> newline in filter.h documentation.	Jia Tan	1	-1/+1
	The '\n' renders as a newline when the comments are converted to html by Doxygen.
2023-02-24	liblzma: Shorten return description for two functions in filter.h.	Jia Tan	1	-6/+2
	Shorten the description for lzma_raw_encoder_memusage() and lzma_raw_decoder_memusage().
2023-02-24	liblzma: Reword a few lines in filter.h	Jia Tan	1	-5/+5

2023-02-24	liblzma: Improve documentation in filter.h.	Jia Tan	1	-83/+143
	All functions now explicitly specify parameter and return values. The notes and code annotations were moved before the parameter and return value descriptions for consistency. Also, the description above lzma_filter_encoder_is_supported() about not being able to list available filters was removed since lzma_str_list_filters() will do this.
2023-02-23	liblzma: Avoid null pointer + 0 (undefined behavior in C).	Lasse Collin	10	-23/+77
	In the C99 and C17 standards, section 6.5.6 paragraph 8 means that adding 0 to a null pointer is undefined behavior. As of writing, "clang -fsanitize=undefined" (Clang 15) diagnoses this. However, I'm not aware of any compiler that would take advantage of this when optimizing (Clang 15 included). It's good to avoid this anyway since compilers might some day infer that pointer arithmetic implies that the pointer is not NULL. That is, the following foo() would then unconditionally return 0, even for foo(NULL, 0): void bar(char *a, char *b); int foo(char *a, size_t n) { bar(a, a + n); return a == NULL; } In contrast to C, C++ explicitly allows null pointer + 0. So if the above is compiled as C++ then there is no undefined behavior in the foo(NULL, 0) call. To me it seems that changing the C standard would be the sane thing to do (just add one sentence) as it would ensure that a huge amount of old code won't break in the future. Based on web searches it seems that a large number of codebases (where null pointer + 0 occurs) are being fixed instead to be future-proof in case compilers will some day optimize based on it (like making the above foo(NULL, 0) return 0) which in the worst case will cause security bugs. Some projects don't plan to change it. For example, gnulib and thus many GNU tools currently require that null pointer + 0 is defined: https://lists.gnu.org/archive/html/bug-gnulib/2021-11/msg00000.html https://www.gnu.org/software/gnulib/manual/html_node/Other-portability-assumptions.html In XZ Utils the null pointer + 0 issue should be fixed after this commit. This adds a few if-statements and thus branches to avoid null pointer + 0. These check for size > 0 instead of ptr != NULL because this way bugs where size > 0 && ptr == NULL will likely get caught quickly. None of them are in hot spots so it shouldn't matter for performance. A little less readable version would be replacing ptr + offset with offset != 0 ? ptr + offset : ptr or creating a macro for it: #define my_ptr_add(ptr, offset) \ ((offset) != 0 ? ((ptr) + (offset)) : (ptr)) Checking for offset != 0 instead of ptr != NULL allows GCC >= 8.1, Clang >= 7, and Clang-based ICX to optimize it to the very same code as ptr + offset. That is, it won't create a branch. So for hot code this could be a good solution to avoid null pointer + 0. Unfortunately other compilers like ICC 2021 or MSVC 19.33 (VS2022) will create a branch from my_ptr_add(). Thanks to Marcin Kowalczyk for reporting the problem: https://github.com/tukaani-project/xz/issues/36
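The my_ptr_add() macro from the message above can be used as follows. The macro is quoted from the commit message; `count_nonzero()` is a hypothetical function added only to show a call where the buffer may be NULL with a size of zero.

```c
#include <stddef.h>

/* From the commit message: skip the addition when offset is zero so
 * that a null pointer is never used in pointer arithmetic. */
#define my_ptr_add(ptr, offset) \
	((offset) != 0 ? ((ptr) + (offset)) : (ptr))

/* Count the nonzero bytes in buf[0..size-1].
 * buf may be NULL when size is 0. */
static size_t count_nonzero(const unsigned char *buf, size_t size)
{
	const unsigned char *end = my_ptr_add(buf, size);
	size_t count = 0;

	for (const unsigned char *p = buf; p != end; ++p)
		count += *p != 0;

	return count;
}
```

With a plain buf + size the call count_nonzero(NULL, 0) would be undefined behavior in C even though the loop body never runs.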
2023-02-23	liblzma: Adjust container.h for consistency with filter.h.	Jia Tan	1	-11/+9

2023-02-23	liblzma: Fix small typos and reword a few things in filter.h.	Jia Tan	1	-7/+6

2023-02-23	liblzma: Convert list of flags in lzma_mt to bulleted list.	Jia Tan	1	-3/+6

2023-02-23	liblzma: Fix typo in documentation in container.h	Jia Tan	1	-1/+1
	lzma_microlzma_decoder -> lzma_microlzma_encoder
2023-02-23	liblzma: Improve documentation for container.h	Jia Tan	1	-53/+93
	Standardizing each function to always specify parameters and return values. Also moved the parameters and return values to the end of each function description.
2023-02-16	liblzma: Very minor API doc tweaks.	Lasse Collin	4	-14/+14
	Use "member" to refer to struct members as that's the term used by the C standard. Use lzma_options_delta.dist and such in docs so that in Doxygen's HTML output they will link to the doc of the struct member. Clean up a few trailing white spaces too.
2023-02-17	liblzma: Adjust spacing in doc headers in bcj.h.	Jia Tan	1	-7/+7

2023-02-17	liblzma: Adjust documentation in bcj.h for consistent style.	Jia Tan	1	-21/+22

2023-02-17	liblzma: Rename field => member in documentation.	Jia Tan	7	-95/+95
	Also adjusted preset value => preset level.
2023-02-16	liblzma: Silence a warning from MSVC.	Lasse Collin	1	-1/+1
	It gives C4146 here since unary minus with unsigned integer is still unsigned (which is the intention here). Doing it with subtraction makes it clearer and avoids the warning. Thanks to Nathan Moinvaziri for reporting this.
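As an illustration of the idiom (not the line changed in liblzma), modular negation of an unsigned value can be written as a subtraction from zero. The example uses it to isolate the lowest set bit; `lowest_set_bit()` is a hypothetical helper.

```c
#include <stdint.h>

/* Return a value with only the lowest set bit of x, or 0 when x is 0.
 *
 * "0U - x" is the same modulo 2^32 negation as "-x" on an unsigned
 * operand, but it does not trigger MSVC warning C4146. */
static uint32_t lowest_set_bit(uint32_t x)
{
	return x & (uint32_t)(0U - x);
}
```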
2023-02-16	liblzma: Improve documentation for stream_flags.h	Jia Tan	1	-30/+46
	Standardizing each function to always specify parameters and return values. Also moved the parameters and return values to the end of each function description. A few small things were reworded and long sentences broken up.
2023-02-15	liblzma: Improve documentation in lzma12.h.	Jia Tan	1	-9/+23
	All functions now explicitly specify parameter and return values.
2023-02-15	liblzma: Improve documentation in check.h.	Jia Tan	1	-13/+28
	All functions now explicitly specify parameter and return values. Also moved the note about SHA-256 functions not being exported to the top of the file.
2023-02-15	liblzma: Improve documentation in index.h	Jia Tan	1	-51/+126
	All functions now explicitly specify parameter and return values.
2023-02-15	liblzma: Reword a comment in index.h.	Jia Tan	1	-2/+2

2023-02-15	liblzma: Omit lzma_index_iter's internal field from Doxygen docs.	Jia Tan	1	-1/+8
	Add \private above this field and its sub-fields since it is not meant to be modified by users.
2023-02-14	liblzma: Fix documentation for LZMA_MEMLIMIT_ERROR.	Jia Tan	1	-1/+1
	LZMA_MEMLIMIT_ERROR was missing the "<" character needed to put documentation after a member.
2023-02-14	liblzma: Improve documentation for base.h.	Jia Tan	1	-5/+25
	Standardizing each function to always specify params and return values. Also fixed a small grammar mistake.
2023-02-14	liblzma: Add one more missing [out] annotation in vli.h	Jia Tan	1	-1/+1

2023-02-14	liblzma: Minor improvements to vli.h.	Jia Tan	1	-6/+7
	Added [out] annotations to parameters that are pointers and can have their value changed. Also added a clarification to lzma_vli_is_valid.
2023-02-10	liblzma: Add comments for macros in delta.h.	Jia Tan	1	-0/+8
	Document LZMA_DELTA_DIST_MIN and LZMA_DELTA_DIST_MAX for completeness and to avoid Doxygen warnings.
2023-02-10	liblzma: Improve documentation in index_hash.h.	Jia Tan	1	-9/+27
	All functions now explicitly specify parameter and return values. Also reworded the description of lzma_index_hash_init() for readability.
2023-02-07	xz: Improve the comment about start_time in mytime.c.	Lasse Collin	1	-5/+10
	start_time is relative to an arbitrary point in time, it's not time of day, so using it for anything other than time differences wouldn't make sense.
2023-02-04	xz: Add a comment clarifying the use of start_time in mytime.c.	Jia Tan	1	-0/+5

2023-02-04	liblzma: Improve documentation for version.h.	Jia Tan	1	-7/+22
	Specified parameter and return values for API functions and documented a few more of the macros.
2023-02-03	liblzma: Fix bug in lzma_str_from_filters() not checking filters[] length.	Jia Tan	1	-0/+7
	The bug is only a problem in applications that do not properly terminate the filters[] array with LZMA_VLI_UNKNOWN or have more than LZMA_FILTERS_MAX filters. This bug does not affect xz.
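The kind of length check described above can be sketched like this. It is not the code from string_conversion.c: `filter`, `FILTERS_MAX`, `ID_UNKNOWN` and `count_filters()` are local, hypothetical names whose values mirror LZMA_FILTERS_MAX (4) and LZMA_VLI_UNKNOWN (UINT64_MAX).

```c
#include <stdint.h>

#define FILTERS_MAX 4          /* mirrors LZMA_FILTERS_MAX */
#define ID_UNKNOWN UINT64_MAX  /* mirrors LZMA_VLI_UNKNOWN, the terminator */

typedef struct {
	uint64_t id;
	void *options;
} filter;

/* Return the number of filters before the terminator, or -1 when no
 * terminator is found within the first FILTERS_MAX + 1 elements.
 * The array is never read beyond index FILTERS_MAX. */
static int count_filters(const filter *filters)
{
	for (int i = 0; i <= FILTERS_MAX; ++i)
		if (filters[i].id == ID_UNKNOWN)
			return i;

	return -1;
}
```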
2023-02-03	liblzma: Fix typos in comments in string_conversion.c.	Jia Tan	1	-2/+2

2023-02-03	liblzma: Clarify block encoder and decoder documentation.	Jia Tan	1	-4/+11
	Added a few sentences to the description for lzma_block_encoder() and lzma_block_decoder() to highlight that the Block Header must be coded before calling these functions.
2023-02-03	Update lzma_block documentation for lzma_block_uncomp_encode().	Jia Tan	1	-0/+3

2023-02-03	liblzma: Minor edits to lzma_block header_size documentation.	Jia Tan	1	-1/+2

2023-02-03	liblzma: Enumerate functions that read version in lzma_block.	Jia Tan	1	-2/+11

2023-02-03	liblzma: Clarify comment in block.h.	Jia Tan	1	-1/+2

2023-02-03	liblzma: Improve documentation for block.h.	Jia Tan	1	-21/+75
	Standardizing each function to always specify params and return values. Output pointer parameters are also marked with doxygen style [out] to make it clear. Any note sections were also moved above the parameter and return sections for consistency.
2023-02-01	liblzma: Clarify a comment about LZMA_STR_NO_VALIDATION.	Jia Tan	1	-2/+3
	The flag description for LZMA_STR_NO_VALIDATION was previously confusing about the treatment for filters that cannot be used with .xz format (lzma1) without using LZMA_STR_ALL_FILTERS. Now, it is clear that LZMA_STR_NO_VALIDATION is not a superset of LZMA_STR_ALL_FILTERS.