Age | Commit message (Collapse) | Author | Files | Lines |
|
It requires fast unaligned access to 64-bit integers
and a fast instruction to count leading zeros in
a 64-bit integer (__builtin_ctzll()). This perhaps
should be enabled on some other archs too.
Thanks to Chenxi Mao for the original patch:
https://github.com/tukaani-project/xz/pull/75 (the first commit)
According to the numbers there, this may improve encoding
speed by about 3-5 %.
This enables the 8-byte method on MSVC ARM64 too which
should work but wasn't tested.
|
|
This change hopefully makes no practical difference as Clang
likely was detected via __GNUC__ or _MSC_VER already.
|
|
|
|
The API headers have many attributes but these were left
as is for now.
|
|
Maybe ICC always #defines _MSC_VER on Windows but now
it's very clear which code will get used.
|
|
|
|
In lzma_memcmplen(), the <intrin.h> header file is only included if
_MSC_VER and _M_X64 are both defined but _BitScanForward64() was
previously used if _M_X64 was defined. GCC for MSYS2 defines _M_X64 but
not _MSC_VER so _BitScanForward64() was used without including
<intrin.h>.
Now, lzma_memcmplen() will use __builtin_ctzll() for MSYS2 GCC builds as
expected.
|
|
Legacy Windows did not need to #include <intrin.h> to use the MSVC
intrinsics. Newer versions likely just issue a warning, but the MSVC
documentation says to include the header file for the intrinsics we use.
GCC and Clang can "pretend" to be MSVC on Windows, so extra checks are
needed in tuklib_integer.h to only include <intrin.h> when it will is
actually needed.
|
|
Thanks to Christian Hesse for reporting the issue.
Fixes: https://github.com/tukaani-project/xz/issues/44
|
|
__SSE2__ is the correct macro for SSE2 support with GCC, Clang,
and ICC. __SSE2_MATH__ means doing floating point math with SSE2
instead of 387. Often the latter macro is defined if the first
one is but it was still a bug.
|
|
Also update the comment in liblzma's memcmplen.h.
Thanks to Michał Górny for the original patch for the reads.
|
|
|
|
The same compiler-specific #ifdefs are already in tuklib_integer.h
|
|
Now gcc -fsanitize=undefined should be clean.
Thanks to Jeffrey Walton.
|
|
|
|
This commit just adds the function. Its uses will be in
separate commits.
This hasn't been tested much yet and it's perhaps a bit early
to commit it but if there are bugs they should get found quite
quickly.
Thanks to Jun I Jin from Intel for help and for pointing out
that string comparison needs to be optimized in liblzma.
|