author | Lasse Collin <lasse.collin@tukaani.org> | 2023-10-20 23:35:10 +0300 |
---|---|---|
committer | Lasse Collin <lasse.collin@tukaani.org> | 2024-01-11 14:29:42 +0200 |
commit | 419f55f9dfc2df8792902b8953d50690121afeea (patch) | |
tree | aa95af5e4119ab90423c19ad64cfa73df0044f1c /src/liblzma/check/crc_common.h | |
parent | liblzma: crc_clmul.c: Add crc_attr_target macro. (diff) | |
download | xz-419f55f9dfc2df8792902b8953d50690121afeea.tar.xz |
liblzma: Avoid extern lzma_crc32_clmul() and lzma_crc64_clmul().
A CLMUL-only build will have the crcxx_clmul() inlined into
lzma_crcxx(). Previously a jump to the extern lzma_crcxx_clmul()
was needed. Notes about shared liblzma on ELF platforms:
- On platforms that support ifunc and -fvisibility=hidden, this
was silly because a CLMUL-only build would still have had that
single extra jump instruction as overhead.
- On platforms that support neither -fvisibility=hidden nor a linker
version script (liblzma*.map), jumping to lzma_crcxx_clmul()
would go via the PLT, adding a few more instructions of overhead
(still not a big issue but silly nevertheless).
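The difference between calling an extern function and calling a static inline one can be sketched roughly as follows. This is only an illustration under stated assumptions, not the liblzma code: every `sketch_` name and the `SKETCH_CLMUL_ONLY` macro are hypothetical, the "CLMUL" variant is a stand-in that reuses a plain bitwise CRC32 so the example builds on any platform, and the runtime check is a placeholder instead of the real CPUID or ifunc logic.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-in for the generic implementation: a plain bitwise
// CRC32 using the reflected polynomial 0xEDB88320.
static uint32_t
sketch_crc32_generic(const uint8_t *buf, size_t size, uint32_t crc)
{
	crc = ~crc;
	for (size_t i = 0; i < size; ++i) {
		crc ^= buf[i];
		for (int bit = 0; bit < 8; ++bit)
			crc = (crc >> 1) ^ (0xEDB88320U & (0U - (crc & 1)));
	}
	return ~crc;
}

// Hypothetical stand-in for crc32_clmul(). Because it is static inline and
// defined in the same translation unit as its caller, the compiler is free
// to inline it and no extern symbol (and thus no jump or PLT entry) is
// involved. A real version would use carryless multiplication.
static inline uint32_t
sketch_crc32_clmul(const uint8_t *buf, size_t size, uint32_t crc)
{
	return sketch_crc32_generic(buf, size, crc);
}

// Placeholder for the runtime feature check (assumption: always true here).
static inline bool
sketch_is_clmul_supported(void)
{
	return true;
}

// Public entry point, playing the role of lzma_crcxx().
uint32_t
sketch_crc32(const uint8_t *buf, size_t size, uint32_t crc)
{
#ifdef SKETCH_CLMUL_ONLY
	// CLMUL-only build: no dispatch and no feature check are needed.
	return sketch_crc32_clmul(buf, size, crc);
#else
	// Mixed build: choose at runtime (simplified; no ifunc here).
	return sketch_is_clmul_supported()
			? sketch_crc32_clmul(buf, size, crc)
			: sketch_crc32_generic(buf, size, crc);
#endif
}
```

Building the sketch with `-DSKETCH_CLMUL_ONLY` leaves `sketch_is_clmul_supported()` unused, which is harmless for a static inline function.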
There was a downside with static liblzma too: if an application only
needs lzma_crc64(), static linking would make the linker include the
CLMUL code for both CRC32 and CRC64 from crc_x86_clmul.o even though
the CRC32 code wouldn't be needed, thus increasing code size of the
executable (assuming that -ffunction-sections isn't used).
Also, now compilers are likely to inline crc_simd_body()
even if they don't support the always_inline attribute
(or MSVC's __forceinline). Quite possibly all compilers
that build the code do support such an attribute. But now
it likely isn't a problem even if the attribute wasn't supported.
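A forced-inline helper with a fallback for compilers that lack such an attribute might look like the sketch below. The macro name `SKETCH_ALWAYS_INLINE` and both functions are hypothetical and the body is a trivial byte sum standing in for crc_simd_body(); the point is only that the last branch degrades to a plain `static inline`, which compilers commonly still inline when the helper is small or has a single caller in the translation unit.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical macro: use the strongest inlining request available and
// fall back to plain "static inline" otherwise.
#if defined(_MSC_VER)
#	define SKETCH_ALWAYS_INLINE static __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#	define SKETCH_ALWAYS_INLINE \
		static inline __attribute__((__always_inline__))
#else
#	define SKETCH_ALWAYS_INLINE static inline
#endif

// Stand-in for a shared inner loop such as crc_simd_body().
// It simply adds the bytes of the buffer to an accumulator.
SKETCH_ALWAYS_INLINE uint32_t
sketch_body(const uint8_t *buf, size_t size, uint32_t acc)
{
	for (size_t i = 0; i < size; ++i)
		acc += buf[i];

	return acc;
}

// The only caller in this translation unit.
uint32_t
sketch_checksum(const uint8_t *buf, size_t size)
{
	return sketch_body(buf, size, 0);
}
```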
Now all x86-specific stuff is in crc_x86_clmul.h. Other archs
can then have their own headers with their own
is_clmul_supported() and crcxx_clmul().
Another bonus is that the build system doesn't need to care if
crc_clmul.c is needed.
is_clmul_supported() stays as an inline function as it's not needed
when doing a CLMUL-only build (avoids a warning about unused function).
Diffstat (limited to '')
-rw-r--r-- | src/liblzma/check/crc_common.h | 64 |
1 files changed, 0 insertions, 64 deletions
diff --git a/src/liblzma/check/crc_common.h b/src/liblzma/check/crc_common.h
index c949f793..552219fe 100644
--- a/src/liblzma/check/crc_common.h
+++ b/src/liblzma/check/crc_common.h
@@ -108,70 +108,6 @@
 # define CRC_USE_GENERIC_FOR_SMALL_INPUTS 1
 # endif
 */
-
-# if defined(_MSC_VER)
-# include <intrin.h>
-# elif defined(HAVE_CPUID_H)
-# include <cpuid.h>
-# endif
-
-// is_clmul_supported() must be inlined in this header file because the
-// ifunc resolver function may not support calling a function in another
-// translation unit. Depending on compiler-toolchain and flags, a call to
-// a function defined in another translation unit could result in a
-// reference to the PLT, which is unsafe to do in an ifunc resolver. The
-// ifunc resolver runs very early when loading a shared library, so the PLT
-// entries may not be setup at that time. Inlining this function duplicates
-// the function body in crc32_resolve() and crc64_resolve(), but this is
-// acceptable because the function results in very few instructions.
-static inline bool
-is_clmul_supported(void)
-{
-	int success = 1;
-	uint32_t r[4]; // eax, ebx, ecx, edx
-
-#if defined(_MSC_VER)
-	// This needs <intrin.h> with MSVC. ICC has it as a built-in
-	// on all platforms.
-	__cpuid(r, 1);
-#elif defined(HAVE_CPUID_H)
-	// Compared to just using __asm__ to run CPUID, this also checks
-	// that CPUID is supported and saves and restores ebx as that is
-	// needed with GCC < 5 with position-independent code (PIC).
-	success = __get_cpuid(1, &r[0], &r[1], &r[2], &r[3]);
-#else
-	// Just a fallback that shouldn't be needed.
-	__asm__("cpuid\n\t"
-			: "=a"(r[0]), "=b"(r[1]), "=c"(r[2]), "=d"(r[3])
-			: "a"(1), "c"(0));
 #endif
 
-	// Returns true if these are supported:
-	// CLMUL (bit 1 in ecx)
-	// SSSE3 (bit 9 in ecx)
-	// SSE4.1 (bit 19 in ecx)
-	const uint32_t ecx_mask = (1 << 1) | (1 << 9) | (1 << 19);
-	return success && (r[2] & ecx_mask) == ecx_mask;
-
-	// Alternative methods that weren't used:
-	// - ICC's _may_i_use_cpu_feature: the other methods should work too.
-	// - GCC >= 6 / Clang / ICX __builtin_cpu_supports("pclmul")
-	//
-	// CPUID decding is needed with MSVC anyway and older GCC. This keeps
-	// the feature checks in the build system simpler too. The nice thing
-	// about __builtin_cpu_supports would be that it generates very short
-	// code as is it only reads a variable set at startup but a few bytes
-	// doesn't matter here.
-}
-
-#endif
-
-/// CRC32 implemented with the x86 CLMUL instruction.
-extern uint32_t lzma_crc32_clmul(const uint8_t *buf, size_t size,
-		uint32_t crc);
-
-/// CRC64 implemented with the x86 CLMUL instruction.
-extern uint64_t lzma_crc64_clmul(const uint8_t *buf, size_t size,
-		uint64_t crc);
-
 #endif
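The feature test in the removed is_clmul_supported() comes down to checking that three bits are all set in the ECX output of CPUID leaf 1. A minimal sketch of just that comparison, separated from the CPUID call so it can be exercised on any platform, could look like this. The helper and the `SKETCH_` macro names are hypothetical; the bit positions are the ones given in the comment in the diff.

```c
#include <stdbool.h>
#include <stdint.h>

// Bit positions in ECX of CPUID leaf 1, as listed in the removed comment.
#define SKETCH_ECX_CLMUL (UINT32_C(1) << 1)
#define SKETCH_ECX_SSSE3 (UINT32_C(1) << 9)
#define SKETCH_ECX_SSE41 (UINT32_C(1) << 19)

// Hypothetical helper: returns true only if CLMUL, SSSE3, and SSE4.1 are
// all reported in the given ECX value. Comparing against the full mask
// (instead of testing for a nonzero result) rejects CPUs that have only
// some of the three features.
static inline bool
sketch_ecx_has_clmul_features(uint32_t ecx)
{
	const uint32_t mask = SKETCH_ECX_CLMUL | SKETCH_ECX_SSSE3
			| SKETCH_ECX_SSE41;
	return (ecx & mask) == mask;
}
```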