From f644473a211394447824ea00518d0a214ff3f7f2 Mon Sep 17 00:00:00 2001 From: Lasse Collin Date: Mon, 14 Nov 2022 21:34:57 +0200 Subject: liblzma: Add fast CRC64 for 32/64-bit x86 using SSSE3 + SSE4.1 + CLMUL. It also works on E2K as it supports these intrinsics. On x86-64 runtime detection is used so the code keeps working on older processors too. A CLMUL-only build can be done by using -msse4.1 -mpclmul in CFLAGS and this will reduce the library size since the generic implementation and its 8 KiB lookup table will be omitted. On 32-bit x86 this isn't used by default for now because by default on 32-bit x86 the separate assembly file crc64_x86.S is used. If --disable-assembler is used then this new CLMUL code is used the same way as on 64-bit x86. However, a CLMUL-only build (-msse4.1 -mpclmul) won't omit the 8 KiB lookup table on 32-bit x86 due to a currently-missing check for disabled assembler usage. The configure.ac check should be such that the code won't be built if something in the toolchain doesn't support it but --disable-clmul-crc option can be used to unconditionally disable this feature. CLMUL speeds up decompression of files that have compressed very well (assuming CRC64 is used as a check type). It is know that the CLMUL code is significantly slower than the generic code for tiny inputs (especially 1-8 bytes but up to 16 bytes). If that is a real-world problem then there is already a commented-out variant that uses the generic version for small inputs. Thanks to Ilya Kurdyukov for the original patch which was derived from a white paper from Intel [1] (published in 2009) and public domain code from [2] (released in 2016). [1] https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf [2] https://github.com/rawrunprotected/crc --- INSTALL | 12 ++++++++++++ 1 file changed, 12 insertions(+) (limited to 'INSTALL') diff --git a/INSTALL b/INSTALL index 2c94ecea..bf1e9c31 100644 --- a/INSTALL +++ b/INSTALL @@ -370,6 +370,18 @@ XZ Utils Installation pre-i686 systems, you may want to disable the assembler code. + --disable-clmul-crc + Disable the use carryless multiplication for CRC + calculation even if compiler support for it is detected. + The code uses runtime detection of SSSE3, SSE4.1, and + CLMUL instructions on x86. On 32-bit x86 this currently + is used only if --disable-assembler is used (this might + be fixed in the future). The code works on E2K too. + + If using compiler options that unconditionally allow the + required extensions (-msse4.1 -mpclmul) then runtime + detection isn't used and the generic code is omitted. + --enable-unaligned-access Allow liblzma to use unaligned memory access for 16-bit, 32-bit, and 64-bit loads and stores. This should be -- cgit v1.2.3