aboutsummaryrefslogtreecommitdiff
path: root/src/liblzma/check/crc64_fast.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2024-02-01liblzma: Only use ifunc in crcXX_fast.c if its needed.Jia Tan1-3/+3
The code was using HAVE_FUNC_ATTRIBUTE_IFUNC instead of CRC_USE_IFUNC. With ARM64, ifunc is incompatible because it requires non-inline function calls for runtime detection.
2024-02-01liblzma: Rename crc32_aarch64.h to crc32_arm64.h.Jia Tan1-3/+0
Even though the proper name for the architecture is aarch64, this project uses ARM64 throughout. So the rename is for consistency. Additionally, crc32_arm64.h was slightly refactored for the following changes: * Added MSVC, FreeBSD, and macOS support in is_arch_extension_supported(). * crc32_arch_optimized() now checks the size when aligning the buffer. * crc32_arch_optimized() loop conditions were slightly modified to avoid both decrementing the size and incrementing the buffer pointer. * Use the intrinsic wrappers defined in <arm_acle.h> because GCC and Clang name them differently. * Minor spacing and comment changes.
2024-02-01liblzma: Refactor crc_common.h.Jia Tan1-4/+4
The CRC_GENERIC is now split into CRC32_GENERIC and CRC64_GENERIC, since the ARM64 optimizations will be different between CRC32 and CRC64. For the same reason, CRC_ARCH_OPTIMIZED is split into CRC32_ARCH_OPTIMIZED and CRC64_ARCH_OPTIMIZED. ifunc will only be used with x86-64 CLMUL because the runtime detection methods needed with ARM64 are not compatible with ifunc.
2024-01-27Speed up CRC32 calculation on ARM64Chenxi Mao1-1/+4
The CRC32 instructions in ARM64 can calculate the CRC32 result for 8 bytes in a single operation, making the use of ARM64 instructions much faster compared to the general CRC32 algorithm. Optimized CRC32 will be enabled if ARM64 has CRC extension running on Linux. Signed-off-by: Chenxi Mao <chenxi.mao2013@gmail.com>
2024-01-11liblzma: CRC: Update CLMUL comments to more generic wording.Lasse Collin1-5/+5
2024-01-11liblzma: Rename arch-specific CRC functions and macros.Lasse Collin1-6/+7
CRC_CLMUL was split to CRC_ARCH_OPTIMIZED and CRC_X86_CLMUL. CRC_ARCH_OPTIMIZED is defined when an arch-optimized version is used. Currently the x86 CLMUL implementations are the only arch-optimized versions, and these also use the CRC_x86_CLMUL macro to tell when crc_x86_clmul.h needs to be included. is_clmul_supported() was renamed to is_arch_extension_supported(). crc32_clmul() and crc64_clmul() were renamed to crc32_arch_optimized() and crc64_arch_optimized(). This way the names make sense with arch-specific non-CLMUL implementations as well.
2024-01-11liblzma: Avoid extern lzma_crc32_clmul() and lzma_crc64_clmul().Lasse Collin1-2/+7
A CLMUL-only build will have the crcxx_clmul() inlined into lzma_crcxx(). Previously a jump to the extern lzma_crcxx_clmul() was needed. Notes about shared liblzma on ELF platforms: - On platforms that support ifunc and -fvisibility=hidden, this was silly because CLMUL-only build would have that single extra jump instruction of extra overhead. - On platforms that support neither -fvisibility=hidden nor linker version script (liblzma*.map), jumping to lzma_crcxx_clmul() would go via PLT so a few more instructions of overhead (still not a big issue but silly nevertheless). There was a downside with static liblzma too: if an application only needs lzma_crc64(), static linking would make the linker include the CLMUL code for both CRC32 and CRC64 from crc_x86_clmul.o even though the CRC32 code wouldn't be needed, thus increasing code size of the executable (assuming that -ffunction-sections isn't used). Also, now compilers are likely to inline crc_simd_body() even if they don't support the always_inline attribute (or MSVC's __forceinline). Quite possibly all compilers that build the code do support such an attribute. But now it likely isn't a problem even if the attribute wasn't supported. Now all x86-specific stuff is in crc_x86_clmul.h. If other archs The other archs can then have their own headers with their own is_clmul_supported() and crcxx_clmul(). Another bonus is that the build system doesn't need to care if crc_clmul.c is needed. is_clmul_supported() stays as inline function as it's not needed when doing a CLMUL-only build (avoids a warning about unused function).
2024-01-10liblzma: CRC: Add empty lines.Lasse Collin1-0/+3
And remove one too.
2023-10-21liblzma: Move is_clmul_supported() back to crc_common.h.Jia Tan1-1/+1
This partially reverts creating crc_clmul.c (8c0f9376f58c0696d5d6719705164d35542dd891) where is_clmul_supported() was moved, extern'ed, and renamed to lzma_is_clmul_supported(). This caused a problem when the function call to lzma_is_clmul_supported() results in a call through the PLT. ifunc resolvers run very early in the dynamic loading sequence, so the PLT may not be setup properly at this point. Whether the PLT is used or not for lzma_is_clmul_supported() depened upon the compiler-toolchain used and flags. In liblzma compiled with GCC, for instance, GCC will go through the PLT for function calls internal to liblzma if the version scripts and symbol visibility hiding are not used. If lazy-binding is disabled, then it would have made any program linked with liblzma fail during dynamic loading in the ifunc resolver.
2023-10-18liblzma: Refactor CRC comments.Jia Tan1-53/+8
A detailed description of the three dispatch methods was added. Also, duplicated comments now only appear in crc32_fast.c or were removed from both crc32_fast.c and crc64_fast.c if they appeared in crc_clmul.c.
2023-10-18liblzma: Create crc_clmul.c.Jia Tan1-124/+4
Both crc32_clmul() and crc64_clmul() are now exported from crc32_clmul.c as lzma_crc32_clmul() and lzma_crc64_clmul(). This ensures that is_clmul_supported() (now lzma_is_clmul_supported()) is not duplicated between crc32_fast.c and crc64_fast.c. Also, it encapsulates the complexity of the CLMUL implementations into a single file and reduces the complexity of crc32_fast.c and crc64_fast.c. Before, CLMUL code was present in crc32_fast.c, crc64_fast.c, and crc_common.h. During the conversion, various cleanups were applied to code (thanks to Lasse Collin) including: - Require using semicolons with MASK_/L/H/LH macros. - Variable typing and const handling improvements. - Improvements to comments. - Fixes to the pragmas used. - Removed unneeded variables. - Whitespace improvements. - Fixed CRC_USE_GENERIC_FOR_SMALL_INPUTS handling. - Silenced warnings and removed the need for some #pragmas
2023-10-18liblzma: Define CRC_USE_IFUNC in crc_common.h.Jia Tan1-2/+1
When ifunc is supported, we can define a simpler macro instead of repeating the more complex check in both crc32_fast.c and crc64_fast.c.
2023-10-13liblzma: Moved CLMUL CRC logic to crc_common.h.Hans Jansen1-245/+12
crc64_fast.c was updated to use the code from crc_common.h instead.
2023-10-13liblzma: Rename crc_macros.h to crc_common.h.Hans Jansen1-1/+1
2023-09-14liblzma: Mark crc64_clmul() with __attribute__((__no_sanitize_address__)).Lasse Collin1-0/+8
Thanks to Agostino Sarubbo. Fixes: https://github.com/tukaani-project/xz/issues/62
2023-07-19liblzma: Suppress -Wunused-function warning.Jia Tan1-0/+10
Clang 16.0.0 and earlier have a bug that the ifunc resolver function triggers the -Wunused-function warning. The resolver function is static and only "used" by the __attribute__((__ifunc()__)). At this time, the bug is still unresolved, but has been reported: https://github.com/llvm/llvm-project/issues/63957 This is not a problem in GCC.
2023-06-27liblzma: Add ifunc implementation to crc64_fast.c.Lasse Collin1-9/+26
The ifunc method avoids indirection via the function pointer crc64_func. This works on GNU/Linux and probably on FreeBSD too. The previous __attribute((__constructor__)) method is kept for compatibility with ELF platforms which do support ifunc. The ifunc method has some limitations, for example, building liblzma with -fsanitize=address will result in segfaults. The configure option --disable-ifunc must be used for such builds. Thanks to Hans Jansen for the original patch. Closes: https://github.com/tukaani-project/xz/pull/53
2023-02-16liblzma: Silence a warning from MSVC.Lasse Collin1-1/+1
It gives C4146 here since unary minus with unsigned integer is still unsigned (which is the intention here). Doing it with substraction makes it clearer and avoids the warning. Thanks to Nathan Moinvaziri for reporting this.
2023-01-10liblzma: CLMUL CRC64: Work around a bug in MSVC, second attempt.Lasse Collin1-0/+18
This affects only 32-bit x86 builds. x86-64 is OK as is. I still cannot easily test this myself. The reporter has tested this and it passes the tests included in the CMake build and performance is good: raw CRC64 is 2-3 times faster than the C version of the slice-by-four method. (Note that liblzma doesn't include a MSVC-compatible version of the 32-bit x86 assembly code for the slice-by-four method.) Thanks to Iouri Kharon for figuring out a fix, testing, and benchmarking.
2023-01-10Revert "liblzma: CLMUL CRC64: Workaround a bug in MSVC (VS2015-2022)."Lasse Collin1-6/+0
This reverts commit 36edc65ab4cf10a131f239acbd423b4510ba52d5. It was reported that it wasn't a good enough fix and MSVC still produced (different kind of) bad code when building for 32-bit x86 if optimizations are enabled. Thanks to Iouri Kharon.
2023-01-09liblzma: CLMUL CRC64: Workaround a bug in MSVC (VS2015-2022).Lasse Collin1-0/+6
I haven't tested with MSVC myself and there doesn't seem to be information about the problem online, so I'm relying on the bug report. Thanks to Iouri Kharon for the bug report and the patch.
2022-11-14liblzma: Add fast CRC64 for 32/64-bit x86 using SSSE3 + SSE4.1 + CLMUL.Lasse Collin1-5/+444
It also works on E2K as it supports these intrinsics. On x86-64 runtime detection is used so the code keeps working on older processors too. A CLMUL-only build can be done by using -msse4.1 -mpclmul in CFLAGS and this will reduce the library size since the generic implementation and its 8 KiB lookup table will be omitted. On 32-bit x86 this isn't used by default for now because by default on 32-bit x86 the separate assembly file crc64_x86.S is used. If --disable-assembler is used then this new CLMUL code is used the same way as on 64-bit x86. However, a CLMUL-only build (-msse4.1 -mpclmul) won't omit the 8 KiB lookup table on 32-bit x86 due to a currently-missing check for disabled assembler usage. The configure.ac check should be such that the code won't be built if something in the toolchain doesn't support it but --disable-clmul-crc option can be used to unconditionally disable this feature. CLMUL speeds up decompression of files that have compressed very well (assuming CRC64 is used as a check type). It is know that the CLMUL code is significantly slower than the generic code for tiny inputs (especially 1-8 bytes but up to 16 bytes). If that is a real-world problem then there is already a commented-out variant that uses the generic version for small inputs. Thanks to Ilya Kurdyukov for the original patch which was derived from a white paper from Intel [1] (published in 2009) and public domain code from [2] (released in 2016). [1] https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf [2] https://github.com/rawrunprotected/crc
2022-10-31liblzma: Silence -Wconversion warning from crc64_fast.c.Lasse Collin1-2/+3
2019-12-31Rename read32ne to aligned_read32ne, and similarly for the others.Lasse Collin1-2/+2
Using the aligned methods requires more care to ensure that the address really is aligned, so it's nicer if the aligned methods are prefixed. The next commit will remove the unaligned_ prefix from the unaligned methods which in liblzma are used in more places than the aligned ones.
2009-11-22Add missing consts to pointer casts.Lasse Collin1-2/+3
2009-10-04Use a tuklib module for integer handling.Lasse Collin1-2/+2
This replaces bswap.h and integer.h. The tuklib module uses <byteswap.h> on GNU, <sys/endian.h> on *BSDs and <sys/byteorder.h> on Solaris, which may contain optimized code like inline assembly.
2009-04-13Put the interesting parts of XZ Utils into the public domain.Lasse Collin1-12/+8
Some minor documentation cleanups were made at the same time.
2009-02-02Modify LZMA_API macro so that it works on Windows withLasse Collin1-1/+1
other compilers than MinGW. This may hurt readability of the API headers slightly, but I don't know any better way to do this.
2008-12-31Remove lzma_init() and other init functions from liblzma API.Lasse Collin1-0/+75
Half of developers were already forgetting to use these functions, which could have caused total breakage in some future liblzma version or even now if --enable-small was used. Now liblzma uses pthread_once() to do the initializations unless it has been built with --disable-threads which make these initializations thread-unsafe. When --enable-small isn't used, liblzma currently gets needlessly linked against libpthread (on systems that have it). While it is stupid for now, liblzma will need threads in future anyway, so this stupidity will be temporary only. When --enable-small is used, different code CRC32 and CRC64 is now used than without --enable-small. This made the resulting binary slightly smaller, but the main reason was to clean it up and to handle the lack of lzma_init_check(). The pkg-config file lzma.pc was renamed to liblzma.pc. I'm not sure if it works correctly and portably for static linking (Libs.private includes -pthread or other operating system specific flags). Hopefully someone complains if it is bad. lzma_rc_prices[] is now included as a precomputed array even with --enable-small. It's just 128 bytes now that it uses uint8_t instead of uint32_t. Smaller array seemed to be at least as fast as the more bloated uint32_t array on x86; hopefully it's not bad on other architectures.