Age | Commit message (Collapse) | Author | Files | Lines |
|
user_abort_pipe[] was still being used instead of the parameters.
|
|
Old versions of Clang reported the unsupported function attribute and
__crc32d() function as warnings instead of errors, so the feature test
passed when it shouldn't have, causing a compile error at build time.
-Werror was added to this feature test to fix this. The change is not
needed for CMake because check_c_source_compiles() also performs
linking and the error is caught then.
Thanks to Sebastian Andrzej Siewior for reporting this.
|
|
LOCALEDIR may contain spaces like in "C:\Program Files".
|
|
If xz is given a directory, it should look like this:
$ xz /usr/bin
xz: /usr/bin: Is a directory, skipping
The Landlock rules didn't allow opening directories for reading:
$ xz /usr/bin
xz: /usr/bin: Permission denied
The simplest fix was to allow opening directories for reading.
While it's a bit silly to allow it solely for the error message,
it shouldn't make the sandbox significantly weaker.
The single-file use case (like when called from GNU tar) is
still as strict as possible: all Landlock restrictions are
enabled before (de)compression starts.
|
|
Thanks to Sebastian Andrzej Siewior and Sam James for
benchmarking on various systems.
|
|
|
|
|
|
|
|
|
|
It's for binary packages built with windows/build.bash.
|
|
Support for the old MinGW was dropped. Only MinGW-w64 with GCC
is supported now.
The script now supports also cross-compilation from GNU/Linux
(tests are not run). MSYS2 and also the old MSYS 1.0.11 work
for building on Windows. The i686 and x86_64 toolchains must
be in PATH to build both 32-bit and 64-bit versions.
Parallel builds are done if "nproc" from GNU coreutils is available.
MinGW-w64 runtime copyright information file was renamed from
COPYING-Windows.txt to COPYING.MinGW-w64-runtime.txt which
is the filename used by MinGW-w64 itself. Its existence
is now mandatory, it's checked at the beginning of the script.
The file TODO is no longer copied to the package.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
The original code was good enough for supporting GNU/Linux
and a few others but it wasn't very portable.
CMake doesn't support Solaris Studio's -xldscope=hidden.
If it ever does, things should still work with this commit
as Solaris Studio supports not only its own __global but also
the GNU C __attribute__((visibility("default"))). Support for the
attribute was added in 2007 to Sun Studio 12 compiler version 5.9.
|
|
|
|
|
|
-O3 doesn't seem useful for speed but it makes the code bigger.
CMake makes is difficult for users to simply override the
optimization level: CFLAGS / CMAKE_C_FLAGS aren't helpful because
they go before CMAKE_C_FLAGS_RELEASE. Of course, users can override
CMAKE_C_FLAGS_RELEASE directly but then they have to remember to
add also -DNDEBUG to disable assertions.
This commit changes -O3 to -O2 in CMAKE_C_FLAGS_RELEASE if and only if
CMAKE_C_FLAGS_RELEASE cache variable doesn't already exist. So if
a custom value is passed on the command line (or reconfiguring an
already-configured build), the cache variable won't be modified.
|
|
This is to match configure.ac.
|
|
|
|
|
|
CMP0154 doesn't affect us since we don't use FILE_SET.
|
|
|
|
|
|
In contrast to Automake, skipping of this test when decoders
are disabled is handled at CMake side instead of test_scripts.sh
because CMake-build doesn't create config.h.
|
|
Compared to the Autotools-based build, this has simpler handling
for the shell (@POSIX_SHELL@) and extra PATH entry for the scripts
(configure has --enable-path-for-scripts=PREFIX). The simpler
metho should be enough for non-ancient systems and Solaris.
|
|
PACKAGE_VERSION was already used in liblzma.pc.in.
This way only one version @foo@ is used.
|
|
It helps that cmake_install.cmake doesn't parallelize installation
so symlinks can be created so that the target is always known to
exist (a requirement on Windows in some cases).
This bumps the minimum CMake version from 3.13 to 3.14 to use
file(CREATE_LINK ...). It could be made to work on 3.13 by
calling "cmake -E create_symlink" but it's uglier code and
slower in "make install". 3.14 should be a reasonable version
to require nowadays, especially since the Autotools build
is still the primary build system for most OSes.
|
|
If gettext tools are available, the .po files listed in po/LINGUAS
are converted using msgfmt. This allows building with translations
directly from xz.git without Autotools.
If gettext tools aren't available, the Autotools-created .gmo files
in the "po" directory will be used. This allows CMake-based build
to use translations from Autotools-generated tarball.
If translation support is found (Intl_FOUND) but both the
gettext tools and the pre-generated .gmo files are missing,
then "make" will fail.
|
|
|
|
|
|
This makes these sandboxing methods stricter when no files are
created or deleted. That is, it's a middle ground between the
initial sandbox and the strictest single-file-to-stdout sandbox:
this allows opening files for reading but output has to go to stdout.
|
|
Linux 6.7 added support for ABI version 4 which restricts
TCP connections which xz won't need and thus those can be
forbidden now. Since the ABI version is handled at runtime,
supporting version 4 won't cause any compatibility issues.
Note that new enough kernel headers are required to get
version 4 support enabled at build time.
|
|
Landlock is now always used just like pledge(2) is: first in more
permissive mode and later (under certain common conditions) in
a strict mode that doesn't allow opening more files.
I put pledge(2) first in sandbox.c because it's the simplest API
to use and still somewhat fine-grained for basic applications.
So it's the simplest thing to understand for anyone reading sandbox.c.
|
|
|
|
Also explicitly initialize progress_automatic to make it clear
that it can be read before message_init() sets it. Static variable
was initialized to false by default already so this is only for
clarity.
|
|
Dirs first, then files in case-sensitive ASCII order.
|
|
|
|
|
|
|
|
All other translated man pages were being installed but
lzmainfo had been forgotten.
|
|
GCC docs promise that it works and a few other compilers do
too. Clang/LLVM is documented source code only but unsurprisingly
it behaves the same as others on x86-64 at least. But the
certainly-portable way is good enough here so use that.
|
|
|
|
The x32 port has a x86-64 ABI in term of all registers but uses only
32bit pointer like x86-32. The assembly optimisation fails to compile on
x32. Given the state of x32 I suggest to exclude it from the
optimisation rather than trying to fix it.
Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Don't assume that Clang defines __GNUC__ as the extensions
are available in clang-cl as well (and possibly in some other
Clang variants?).
|
|
|
|
|
|
|
|
Adding the SPDX license identifier changed the line numbers.
|
|
|
|
|
|
|
|
|
|
|
|
It was added in 2017 in c2e29f06a7d1e3ba242ac2fafc69f5d6e92f62cd
but it never got into any release tarballs because it was
forgotten to be added to Makefile.am.
|
|
|
|
|
|
This kind of fixes the problem reported here:
https://bugs.launchpad.net/ubuntu/+source/xz-utils/+bug/1291020
|
|
|
|
|
|
|
|
|
|
It's compatible with GCC and Clang.
|
|
It's used only for basic bittrees and fixed-size reverse bittree
because those showed a clear benefit on x86-64 with GCC and Clang.
The other methods were more mixed and thus are commented out but
they should be tested on other archs.
|
|
|
|
But now it needs one more local variable.
|
|
|
|
|
|
|
|
Now extra buffer space is reserved so that repeating bytes for
any single match will never need to copy from two places (both
the beginning and the end of the buffer). This simplifies
dict_repeat() and helps a little with speed.
This seems to reduce .lzma decompression time about 2 %, so
with .xz and CRC it could be slightly less. The small things
add up still.
|
|
It's not completely obvious if this is better in the decoder.
It should be good if compiler can avoid creating a branch
(like using CMOV on x86).
This also makes lzma_encoder.c use the new macros.
|
|
This adds macros for bittree decoding which prepares the code
for alternative C versions and inline assembly.
|
|
The new decoder resumes the first decoder loop in the Resumable mode.
Then, the code executes in Non-resumable mode until it detects that it
cannot guarantee to have enough input/output to decode another symbol.
The Resumable mode is how the decoder has always worked. Before decoding
every input bit, it checks if there is enough space and will save its
location to be resumed later. When the decoder has more input/output,
it jumps back to the correct sequence in the Resumable mode code.
When the input/output buffers are large, the Resumable mode is much
slower than the Non-resumable because it has more branches and is harder
for the compiler to optimize since it is in a large switch block.
Early benchmarking shows significant time improvement (8-10% on gcc and
clang x86) by using the Non-resumable code as much as possible.
|
|
The new "safe" range decoder mode is the same as old range decoder, but
now the default behavior of the range decoder will not check if there is
enough input or output to complete the operation. When the buffers are
close to fully consumed, the "safe" operations must be used instead. This
will improve speed because it will reduce the number of branches needed
for most of the range decoder operations.
|
|
The footer template from Doxygen has the closing </body> </html>
as Doxygen doesn't add them otherwise.
target="_blank" was omitted as it's not useful here but
it can be slightly annoying as one cannot just go back
in the browser history.
Since the footer links to the license file in the same
directory and not to CC website, the rel attributes
can be omitted.
|
|
|
|
It looked odd to only have the original English authors listed
in the header comments of the translated files.
|
|
lzmainfo has had translation support since 2009 at least but
it was never added to po/POTFILES.in so the messages weren't
translated. It's a very rarely needed tool so it's not too bad.
This also adds src/xz/mytime.c to po/POTFILES.in although there
are no translatable strings. It's simpler this way so that it
won't be forgotten if strings were ever added to that file.
|
|
The same is used for both po/xz.pot and po4a/xz-man.pot.
|
|
All man pages are under 0BSD now so this is simple now.
|
|
|
|
The main reason is a kind of silly one:
xz-man.pot contains strings from all man pages in XZ Utils.
The man pages of xzdiff, xzgrep, and xzmore were under GPLv2
and the rest under 0BSD. Thus xz-man.pot contained strings
under two licences. po4a creates the translated man pages
from the combined 0BSD+GPLv2 xz-man.pot.
I haven't liked this mixing in xz-man.pot but the
Translation Project requires that all man pages must be
in the same .pot file. So a separate xz-man-gpl.pot
wasn't an option.
Since these man pages are short, rewriting them was quick enough.
Now xz-man.pot is entirely under 0BSD and marking the per-file
licenses is simpler.
As a bonus, some wording hopefully is now slightly better
although it's perhaps a matter of taste.
NOTE: In xzgrep.1, the EXIT STATUS section was written by me
in the commit d796b6d7fdb8b7238b277056cf9146cce25db604 so that's
why that section could be taken as is from the old xzgrep.1.
|
|
The xz tool can decompress three file formats and xzless
has always supported uncompressed files too.
|
|
|
|
Also add SPDX license identifier now that there is a known license.
|
|
Perhaps the generated files aren't even copyrightable but
using the same license for them as for the rest of the liblzma
keeps things more consistent for tools that look for license info.
|
|
It is built and run only manually so this didn't matter
unless one wanted to regenerate the price_table.c.
|
|
|
|
|
|
|
|
Translations and doc/xz-file-format.txt and doc/lzma-file-format.txt
were not touched.
COPYING.0BSD was added.
|
|
The initial commit 5d018dc03549c1ee4958364712fb0c94e1bf2741
in 2007 had a comment in sha256.c that the code is based on
Crypto++ Library 5.5.1. In 2009 the Authors list in sha256.c
and the AUTHORS file was updated with information that the
code had come from Crypto++ but via 7-Zip. I know I had viewed
7-Zip's SHA-256 code but back then the C code has been identical
enough with Crypto++, so I don't why I thought the author info
would need that extra step via 7-Zip for this single file.
Another error is that I had mixed sha.* and shacal2.* files
when checking for author info in Crypto++. The shacal2.* files
aren't related to liblzma's sha256.c and thus Kevin Springle's
code in Crypto++ isn't either.
|
|
It was last updated in 2013.
|
|
It was good to keep these around in parallel with the newer examples
but I think it's OK to remove the old ones at this point.
|
|
|
|
If any other BCJ filter was enabled for encoding or decoding, then this
was not a problem.
|
|
|
|
|
|
|
|
|
|
This makes "less" show a warning if a decompression error occurred.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
This was split from the prior commit so it could be easily applied to
the 5.4 branch.
Closes: https://github.com/tukaani-project/xz/pull/77
|
|
If liblzma is configured with --disable-clmul-crc
CFLAGS="-msse4.1 -mpclmul", then it will fail to compile because the
generic version must be used but the CRC tables were not included.
|
|
The code was using HAVE_FUNC_ATTRIBUTE_IFUNC instead of CRC_USE_IFUNC.
With ARM64, ifunc is incompatible because it requires non-inline
function calls for runtime detection.
|
|
|
|
This is similar to the existing x86-64 CLMUL conditions to omit the
tables. They were slightly refactored to improve readability.
|
|
Even though the proper name for the architecture is aarch64, this
project uses ARM64 throughout. So the rename is for consistency.
Additionally, crc32_arm64.h was slightly refactored for the following
changes:
* Added MSVC, FreeBSD, and macOS support in
is_arch_extension_supported().
* crc32_arch_optimized() now checks the size when aligning the
buffer.
* crc32_arch_optimized() loop conditions were slightly modified to
avoid both decrementing the size and incrementing the buffer
pointer.
* Use the intrinsic wrappers defined in <arm_acle.h> because GCC and
Clang name them differently.
* Minor spacing and comment changes.
|
|
The CRC_GENERIC is now split into CRC32_GENERIC and CRC64_GENERIC, since
the ARM64 optimizations will be different between CRC32 and CRC64.
For the same reason, CRC_ARCH_OPTIMIZED is split into
CRC32_ARCH_OPTIMIZED and CRC64_ARCH_OPTIMIZED.
ifunc will only be used with x86-64 CLMUL because the runtime detection
methods needed with ARM64 are not compatible with ifunc.
|
|
|
|
This adds --enable-arm64-crc32/--disable-arm64-crc32 (enabled by
default) for using the ARM64 CRC32 instruction. This can be disabled if
one knows the binary will never need to run on an ARM64 machine
with this instruction extension.
|
|
The CRC32 instructions in ARM64 can calculate the CRC32 result
for 8 bytes in a single operation, making the use of ARM64
instructions much faster compared to the general CRC32 algorithm.
Optimized CRC32 will be enabled if ARM64 has CRC extension
running on Linux.
Signed-off-by: Chenxi Mao <chenxi.mao2013@gmail.com>
|
|
|
|
|
|
|
|
The footer isn't a complete HTML file so having it in the doxygen
directory is a tiny bit clearer.
|
|
The Overall documentation section (1.1) table spacing had to be adjusted
since the filename was very long.
|
|
|
|
|
|
The PROJECT_LOGO field is now used to include the XZ logo. The footer
of each page now lists the copyright information instead of the default
footer. The license is also copied to statisfy the copyright and so the
link in the documentation can be local.
|
|
This hopefully does more good than bad:
+ It's faster by default.
+ Only the threaded compressor creates files that
can be decompressed in threaded mode.
- Compression ratio is worse, usually not too much though.
When it matters, -T1 must be used.
- Memory usage increases.
- Scripts that assume single-threaded mode but don't use -T1 will
possibly use too much resources, for example, if they run
multiple xz processes in parallel to compress multiple files.
- Output from single-threaded and multi-threaded compressors
differ but such changes could happen for other reasons too
(they just haven't happened since 5.0.0).
|
|
|
|
|
|
|
|
|
|
Not all RISC-V processors support fast unaligned access so
it's better to read only one byte in the main loop. This can
be faster even on x86-64 when compared to reading 32 bits at
a time as half the time the address is only 16-bit aligned.
The downside is larger code size on archs that do support
fast unaligned access.
|
|
Version 5.6.0 will be shown, even though upcoming alphas and betas
will be able to support this filter. 5.6.0 looks nicer in the output and
people shouldn't be encouraged to use an unstable version in production
in any way.
|
|
These test files achieve 100% code coverage in
src/liblzma/simple/riscv.c. They contain all of the instructions that
should be filtered and a few cases that should not.
|
|
|
|
A special note was added to suggest using four-byte alignment when the
compressed instruction extension is not present in a RISC-V binary.
|
|
|
|
|
|
|
|
The new Filter ID is 0x0B.
Thanks to Chien Wong <m@xv97.com> for the initial version of the Filter,
the xz CLI updates, and the Autotools build system modifications.
Thanks to Igor Pavlov for his many contributions to the design of
the filter.
|
|
The new RISC-V filter was added to the specification, in addition to
updating the specification URL.
|
|
|
|
|
|
|
|
|
|
|
|
Now crc_simd_body() in crc_x86_clmul.h is only called once
in a translation unit, we no longer need to be so cautious
about ensuring the always-inline behavior.
|
|
|
|
CRC_CLMUL was split to CRC_ARCH_OPTIMIZED and CRC_X86_CLMUL.
CRC_ARCH_OPTIMIZED is defined when an arch-optimized version is used.
Currently the x86 CLMUL implementations are the only arch-optimized
versions, and these also use the CRC_x86_CLMUL macro to tell when
crc_x86_clmul.h needs to be included.
is_clmul_supported() was renamed to is_arch_extension_supported().
crc32_clmul() and crc64_clmul() were renamed to
crc32_arch_optimized() and crc64_arch_optimized().
This way the names make sense with arch-specific non-CLMUL
implementations as well.
|
|
|
|
A CLMUL-only build will have the crcxx_clmul() inlined into
lzma_crcxx(). Previously a jump to the extern lzma_crcxx_clmul()
was needed. Notes about shared liblzma on ELF platforms:
- On platforms that support ifunc and -fvisibility=hidden, this
was silly because CLMUL-only build would have that single extra
jump instruction of extra overhead.
- On platforms that support neither -fvisibility=hidden nor linker
version script (liblzma*.map), jumping to lzma_crcxx_clmul()
would go via PLT so a few more instructions of overhead (still
not a big issue but silly nevertheless).
There was a downside with static liblzma too: if an application only
needs lzma_crc64(), static linking would make the linker include the
CLMUL code for both CRC32 and CRC64 from crc_x86_clmul.o even though
the CRC32 code wouldn't be needed, thus increasing code size of the
executable (assuming that -ffunction-sections isn't used).
Also, now compilers are likely to inline crc_simd_body()
even if they don't support the always_inline attribute
(or MSVC's __forceinline). Quite possibly all compilers
that build the code do support such an attribute. But now
it likely isn't a problem even if the attribute wasn't supported.
Now all x86-specific stuff is in crc_x86_clmul.h. If other archs
The other archs can then have their own headers with their own
is_clmul_supported() and crcxx_clmul().
Another bonus is that the build system doesn't need to care if
crc_clmul.c is needed.
is_clmul_supported() stays as inline function as it's not needed
when doing a CLMUL-only build (avoids a warning about unused function).
|
|
This reduces the number of the complex #if directives.
|
|
|
|
|
|
And remove one too.
|
|
It makes no difference in practice.
|
|
|
|
It requires fast unaligned access to 64-bit integers
and a fast instruction to count leading zeros in
a 64-bit integer (__builtin_ctzll()). This perhaps
should be enabled on some other archs too.
Thanks to Chenxi Mao for the original patch:
https://github.com/tukaani-project/xz/pull/75 (the first commit)
According to the numbers there, this may improve encoding
speed by about 3-5 %.
This enables the 8-byte method on MSVC ARM64 too which
should work but wasn't tested.
|
|
This change hopefully makes no practical difference as Clang
likely was detected via __GNUC__ or _MSC_VER already.
|
|
|
|
This comment is repeated in xzdec.c to help remind us why all the
capabilities are removed from stdin in certain situations.
|
|
xzdec now also uses the sandbox when its configured.
|
|
The sandbox is now enabled for xzdec as well, so it no longer belongs
in just the xz section. xz and xzdec are always built, except for older
MSVC versions, so there isn't a need to conditionally show the sandbox
configuration. CMake will do a little unecessary work on older MSVC
versions that can't build xz or xzdec, but this is a very small
downside.
|
|
If xz is disabled, then xzdec can still use the sandbox.
|
|
A very strict sandbox is used when the last file is decompressed. The
likely most common use case of xzdec is to decompress a single file.
The Pledge sandbox is applied to the entire process with slightly more
relaxed promises, until the last file is processed.
Thanks to Christian Weisgerber for the initial patch adding Pledge
sandboxing.
|
|
This fixes the recent change to lzma_lz_encoder that used memzero
instead of the NULL constant. On some compilers the NULL constant
(always 0) may not equal the NULL pointer (this only needs to guarentee
to not point to valid memory address).
Later code compares the pointers to the NULL pointer so we must
initialize them with the NULL pointer instead of 0 to guarentee
code correctness.
|
|
The first member of lzma_lz_encoder doesn't necessarily need to be set
to NULL since it will always be set before anything tries to use it.
However the function pointer members must be set to NULL since other
functions rely on this NULL value to determine if this behavior is
supported or not.
This fixes a somewhat serious bug, where the options_update() and
set_out_limit() function pointers are not set to NULL. This seems to
have been forgotten since these function pointers were added many years
after the original two (code() and end()).
The problem is that by not setting this to NULL we are relying on the
memory allocation to zero things out if lzma_filters_update() is called
on a LZMA1 encoder. The function pointer for set_out_limit() is less
serious because there is not an API function that could call this in an
incorrect way. set_out_limit() is only called by the MicroLZMA encoder,
which must use LZMA1 where set_out_limit() is always set. Its currently
not possible to call set_out_limit() on an LZMA2 encoder at this time.
So calling lzma_filters_update() on an LZMA1 encoder had undefined
behavior since its possible that memory could be manipulated so the
options_update member pointed to a different instruction sequence.
This is unlikely to be a bug in an existing application since it relies
on calling lzma_filters_update() on an LZMA1 encoder in the first place.
For instance, it does not affect xz because lzma_filters_update() can
only be used when encoding to the .xz format.
This is fixed by using memzero() to set all members of lzma_lz_encoder
to NULL after it is allocated. This ensures this mistake will not occur
here in the future if any additional function pointers are added.
|
|
|
|
lzma_raw_encoder() and lzma_raw_encoder_init() used "options" as the
parameter name instead of "filters" (used by the declaration). "filters"
is more clear since the parameter represents the list of filters passed
to the raw encoder, each of which contains filter options.
|
|
lzma_encoder_init() did not check for NULL options, but
lzma2_encoder_init() did. This is more of a code style improvement than
anything else to help make lzma_encoder_init() and lzma2_encoder_init()
more similar.
|
|
|
|
|
|
Since GCC version 10, GCC no longer complains about simple implicit
integer conversions with Arithmetic operators.
For instance:
uint8_t a = 5;
uint32_t b = a + 5;
Give a warning on GCC 9 and earlier but this:
uint8_t a = 5;
uint32_t b = (a + 5) * 2;
Gives a warning with GCC 10+.
|
|
|
|
Most of these fixes are small typos and tweaks. A few were caused by bad
advice from me. Here is the summary of what is changed:
- Author line edits
- Small comment changes/additions
- Using the return value in the error messages in the fuzz targets'
coder initialization code
- Removed fuzz_encode_stream.options. This set a max length, which may
prevent some worthwhile code paths from being properly exercised.
- Removed the max_len option from fuzz_decode_stream.options for the
same reason as fuzz_encode_stream. The alone decoder fuzz target still
has this restriction.
- Altered the dictionary contents for fuzz_lzma.dict. Instead of keeping
the properties static and varying the dictionary size, the properties
are varied and the dictionary size is kept small. The dictionary size
doesn't have much impact on the code paths but the properties do.
Closes: https://github.com/tukaani-project/xz/pull/73
|
|
This fuzz target handles .xz stream encoding. The first byte of input
is used to dynamically set the preset level in order to increase the
fuzz coverage of complex critical code paths.
|
|
This fuzz target that handles LZMA alone decoding. A new fuzz
dictionary .dict was also created with common LZMA header values to
help speed up the discovery of valid headers.
|
|
All .c files can be built as separate fuzz targets. This simplifies
the Makefile by allowing us to use wildcards instead of having a
Makefile target for each fuzz target.
|