aboutsummaryrefslogtreecommitdiff
path: root/src/xz/coder.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2024-02-28xz: Add comments.Lasse Collin1-0/+10
2024-02-28xz: Change logging level for thread reduction to highest verbosity only.Jia Tan1-2/+2
Now that multi threaded encoding is the default, users do not need to see a warning message everytime the number of threads is reduced. On some machines, this could happen very often. It is not unreasonable for users to need to set double verbose mode to see this kind of information. To see these warning messages -vv or --verbose --verbose must be passed to set xz into the highest possible verbosity mode. These warnings had caused automated testing frameworks to fail when they expected no output to stderr. Thanks to Sebastian Andrzej Siewior for reporting this and for the initial version of the patch.
2024-02-14Add SPDX license identifier into 0BSD source code files.Lasse Collin1-0/+2
2024-02-14Change most public domain parts to 0BSD.Lasse Collin1-3/+0
Translations and doc/xz-file-format.txt and doc/lzma-file-format.txt were not touched. COPYING.0BSD was added.
2023-09-22xz, xzdec, lzmainfo: Use tuklib_attr_noreturn.Lasse Collin1-1/+2
For compatibility with C23's [[noreturn]], tuklib_attr_noreturn must be at the beginning of declaration (before "extern" or "static", and even before any GNU C's __attribute__). This commit also moves all other function attributes to the beginning of function declarations. "extern" is kept at the beginning of a line so the attributes are listed on separate lines before "extern" or "static".
2023-07-18xz: Make "%s: %s" translatable because French needs "%s : %s".Lasse Collin1-4/+4
2023-07-18xz: Update Authors list in a few files.Jia Tan1-1/+2
2023-07-17xz: Minor clean up for coder.cJia Tan1-32/+21
* Moved max_block_list_size from a global to local variable. * Reworded error message in validate_block_list_filter(). * Removed helper function filter_chain_error(). * Changed 1 << X to 1U << X in many places
2023-07-17xz: Ignore filter chains that are set but never used in --block-list.Jia Tan1-18/+48
If a filter chain is set but not used in --block-list, it introduced unexpected behavior such as requiring an unneeded amount of memory to compress, reducing the number of threads in multi-threaded encoding, and printing an incorrect amount of memory needed to decompress. This also renames filters_init_mask => filters_used_mask. A filter is assumed to be used if it is specified in --filtersX until coder_set_compression_settings() determines which filters are referenced in --block-list.
2023-07-17xz: Set the Block size for mt encoding correctly.Jia Tan1-1/+67
When opt_block_size is not used, the Block size for mt encoder is derived from the minimum of the largest Block specified by --block-list and the recommended Block size on all filter chains calculated by lzma_mt_block_size(). This avoids using unnecessary memory and ensures that all Blocks are large enough for the most memory needy filter chain.
2023-07-17xz: Validate --flush-timeout for all specified filter chains.Jia Tan1-8/+16
2023-07-17xz: Allows --block-list filters to scale down memory usage.Jia Tan1-55/+214
Previously, only the default filter chain could have its memory usage adjusted. The filter chains specified with --filtersX were not checked for memory usage. Now, all used filter chains will be adjusted if necessary.
2023-07-17xz: Do not include block splitting if encoders are disabled.Jia Tan1-9/+20
The block splitting logic and split_block() function are not needed if encoders are disabled. This will help slightly reduce the binary size when built without encoders and allow split_block() to use functions that require encoders being enabled.
2023-07-17xz: Free filters[] in debug mode.Jia Tan1-0/+10
This will only free filter chains created with --filters1-9 since the default filter chain may be set from a static function variable. The complexity to free the default filter chain is not worth the burden on code maintenance.
2023-07-17xz: Create command line options for filters[1-9].Jia Tan1-50/+138
The new command line options are meant to be combined with --block-list. They work as an optional extension to --block-list to specify a custom filter chain for each block listed. The new options allow the creation of up to 9 reusable filter chains. For instance: xz --block-list=1:10MiB,3:5MiB,,2:5MiB,1:0 --filters1=delta--lzma2 \ --filters2=x86--lzma2 --filters3=arm64--lzma2 Will create the following blocks: 1. A block of size 10 MiB with filter chain delta, lzma2. 2. A block of size 5 MiB with filter chain arm64, lzma2. 3. A block of size 5 MiB with filter chain arm64, lzma2. 4. A block of size 5 MiB with filter chain x86, lzma2. 5. A block containing the rest of the file contents with filter chain delta, lzma2.
2023-07-17xz: Use lzma_filters_free() in forget_filter_chain().Jia Tan1-8/+10
This is a little cleaner than the previous implementation of forget_filter_chain(). It is also more consistent since lzma_str_to_filters() will always terminate the filter chain so there is no need to terminate it later in coder_set_compression_settings().
2023-07-17xz: Separate string to filter conversion into a helper function.Jia Tan1-13/+20
Converting from string to filter will also need to be done for block specific filter chains.
2023-07-17xz: Add --filters option to CLI.Jia Tan1-2/+48
The --filters option uses the new lzma_str_to_filters() function to convert a string into a full filter chain. Using this option will reset all previous filters set by --preset, --[filter], or --filters.
2022-11-09xz: Add .lz (lzip) decompression support.Lasse Collin1-3/+65
If configured with --disable-lzip-decoder then --long-help will still list `lzip' in --format but I left it like that since due to translations it would be messy to have two help strings. Features are disabled only in special situations so wrong help in such a situation shouldn't matter much. Thanks to Michał Górny for the original patch.
2022-11-09xz: Add comments about stdin and src_st.st_size.Lasse Collin1-0/+9
"xz -v < regular_file > out.xz" doesn't display the percentage and estimated remaining time because it doesn't even try to check the input file size when input is read from stdin. This could be improved but for now there's just a comment to remind about it.
2022-11-09xz: Fix displaying of file sizes in progress indicator in passthru mode.Lasse Collin1-1/+5
It worked for one input file since the counters are zero when xz starts but they weren't reset when starting a new file in passthru mode. For example, if files A, B, and C are one byte each, then "xz -dcvf A B C" would show file sizes as 1, 2, and 3 bytes instead of 1, 1, and 1 byte.
2022-10-25xz: Fix --single-stream with an empty .xz Stream.Lasse Collin1-0/+9
Example: $ xz -dc --single-stream good-0-empty.xz xz: good-0-empty.xz: Internal error (bug) The code, that is tries to catch some input file issues early, didn't anticipate LZMA_STREAM_END which is possible in that code only when --single-stream is used.
2022-10-25xz: Fix decompressor behavior if input uses an unsupported check type.Lasse Collin1-4/+15
Now files with unsupported check will make xz display a warning, set the exit status to 2 (unless --no-warn is used), and then decompress the file normally. This is how it was supposed to work since the beginning but this was broken by the commit 231c3c7098f1099a56abb8afece76fc9b8699f05, that is, a little before 5.0.0 was released. The buggy behavior displayed a message, set exit status 1 (error), and xz didn't attempt to to decompress the file. This doesn't matter today except for special builds that disable CRC64 or SHA-256 at build time (but such builds should be used in special situations only). The bug matters if new check type is added in the future and an old xz version is used to decompress such a file; however, it's likely that such files would use a new filter too and an old xz wouldn't be able to decompress the file anyway. The first hunk in the commit is the actual fix. The second hunk is a cleanup since LZMA_TELL_ANY_CHECK isn't used in xz. There is a test file for unsupported check type but it wasn't used by test_files.sh, perhaps due to different behavior between xz and the simpler xzdec.
2022-04-14xz: Add a default soft memory usage limit for --threads=0.Lasse Collin1-2/+26
This is a soft limit in sense that it only affects the number of threads. It never makes xz fail and it never makes xz change settings that would affect the compressed output. The idea is to make -T0 have more reasonable behavior when the system has very many cores or when a memory-hungry compression options are used. This also helps with 32-bit xz, preventing it from running out of address space. The downside of this commit is that now the number of threads might become too low compared to what the user expected. I hope this to be an acceptable compromise as the old behavior has been a source of well-argued complaints for a long time.
2022-04-14xz: Make -T0 use multithreaded mode on single-core systems.Lasse Collin1-9/+9
The main problem withi the old behavior is that the compressed output is different on single-core systems vs. multicore systems. This commit fixes it by making -T0 one thread in multithreaded mode on single-core systems. The downside of this is that it uses more memory. However, if --memlimit-compress is used, xz can (thanks to the previous commit) drop to the single-threaded mode still.
2022-04-14xz: Changes to --memlimit-compress and --no-adjust.Lasse Collin1-20/+43
In single-threaded mode, --memlimit-compress can make xz scale down the LZMA2 dictionary size to meet the memory usage limit. This obviously affects the compressed output. However, if xz was in threaded mode, --memlimit-compress could make xz reduce the number of threads but it wouldn't make xz switch from multithreaded mode to single-threaded mode or scale down the LZMA2 dictionary size. This seemed illogical and there was even a "FIXME?" about it. Now --memlimit-compress can make xz switch to single-threaded mode if one thread in multithreaded mode uses too much memory. If memory usage is still too high, then the LZMA2 dictionary size can be scaled down too. The option --no-adjust was also changed so that it no longer prevents xz from scaling down the number of threads as that doesn't affect compressed output (only performance). After this commit --no-adjust only prevents adjustments that affect compressed output, that is, with --no-adjust xz won't switch from multithreaded mode to single-threaded mode and won't scale down the LZMA2 dictionary size. The man page wasn't updated yet.
2022-04-12xz: Add --memlimit-mt-decompress along with a default limit value.Lasse Collin1-23/+11
--memlimit-mt-decompress allows specifying the limit for multithreaded decompression. This matches memlimit_threading in liblzma. This limit can only affect the number of threads being used; it will never prevent xz from decompressing a file. The old --memlimit-decompress option is still used at the same time. If the value of --memlimit-decompress (the default value or one specified by the user) is less than the value of --memlimit-mt-decompress , then --memlimit-mt-decompress is reduced to match --memlimit-decompress. Man page wasn't updated yet.
2022-03-07xz: Add initial support for threaded decompression.Lasse Collin1-1/+35
If threading support is enabled at build time, this will use lzma_stream_decoder_mt() even for single-threaded mode. With memlimit_threading=0 the behavior should be identical. This needs some work like adding --memlimit-threading=LIMIT. The original patch from Sebastian Andrzej Siewior included a method to get currently available RAM on Linux. It might be one way to go but as it is Linux-only, the available-RAM approach needs work for portability or using a fallback method on other OSes. The man page wasn't updated yet.
2020-01-26xz: Set the --flush-timeout deadline when the first input byte arrives.Lasse Collin1-5/+1
xz --flush-timeout=2000, old version: 1. xz is started. The next flush will happen after two seconds. 2. No input for one second. 3. A burst of a few kilobytes of input. 4. No input for one second. 5. Two seconds have passed and flushing starts. The first second counted towards the flush-timeout even though there was no pending data. This can cause flushing to occur more often than needed. xz --flush-timeout=2000, after this commit: 1. xz is started. 2. No input for one second. 3. A burst of a few kilobytes of input. The next flush will happen after two seconds counted from the time when the first bytes of the burst were read. 4. No input for one second. 5. No input for another second. 6. Two seconds have passed and flushing starts.
2020-01-26xz: Move flush_needed from mytime.h to file_pair struct in file_io.h.Lasse Collin1-1/+2
2020-01-26xz: coder.c: Make writing output a separate function.Lasse Collin1-13/+17
The same code sequence repeats so it's nicer as a separate function. Note that in one case there was no test for opt_mode != MODE_TEST, but that was only because that condition would always be true, so this commit doesn't change the behavior there.
2020-01-26xz: Fix semi-busy-waiting in xz --flush-timeout.Lasse Collin1-0/+4
When input blocked, xz --flush-timeout=1 would wake up every millisecond and initiate flushing which would have nothing to flush and thus would just waste CPU time. The fix disables the timeout when no input has been seen since the previous flush.
2019-06-23xz: Fix some of the warnings from -Wsign-conversion.Lasse Collin1-2/+2
2019-05-11spellingAntoine Cœur1-2/+2
2018-12-20xz: Fix a crash in progress indicator when in passthru mode.Lasse Collin1-4/+7
"xz -dcfv not_an_xz_file" crashed (all four options are required to trigger it). It caused xz to call lzma_get_progress(&strm, ...) when no coder was initialized in strm. In this situation strm.internal is NULL which leads to a crash in lzma_get_progress(). The bug was introduced when xz started using lzma_get_progress() to get progress info for multi-threaded compression, so the bug is present in versions 5.1.3alpha and higher. Thanks to Filip Palian <Filip.Palian@pjwstk.edu.pl> for the bug report.
2015-11-03xz: Make xz buildable even when encoders or decoders are disabled.Lasse Collin1-8/+25
The patch is quite long but it's mostly about adding new #ifdefs to omit code when encoders or decoders have been disabled. This adds two new #defines to config.h: HAVE_ENCODERS and HAVE_DECODERS.
2014-08-05xz: Add --ignore-check.Lasse Collin1-1/+9
2014-06-18xz: Check for filter chain compatibility for --flush-timeout.Lasse Collin1-9/+21
This avoids LZMA_PROG_ERROR from lzma_code() with filter chains that don't support LZMA_SYNC_FLUSH.
2014-06-09xz: Force single-threaded mode when --flush-timeout is used.Lasse Collin1-0/+11
2014-05-08xz: Fix uint64_t vs. size_t which broke 32-bit build.Lasse Collin1-1/+1
Thanks to Christian Hesse.
2014-01-12xz: Fix a comment.Lasse Collin1-2/+2
2013-11-12xz: Make --block-list and --block-size work together in single-threaded.Lasse Collin1-15/+75
Previously, --block-list and --block-size only worked together in threaded mode. Boundaries are specified by --block-list, but --block-size specifies the maximum size for a Block. Now this works in single-threaded mode too. Thanks to James M Leddy for the original patch.
2013-10-22xz: Take advantage of LZMA_FULL_BARRIER with --block-list.Lasse Collin1-17/+15
Now if --block-list is used in threaded mode, the encoder won't need to flush at each Block boundary specified via --block-list. This improves performance a lot, making threading helpful with --block-list. The flush timer was reset after LZMA_FULL_FLUSH but since LZMA_FULL_BARRIER doesn't flush, resetting the timer is no longer done.
2013-09-17Add native threading support on Windows.Lasse Collin1-4/+4
Now liblzma only uses "mythread" functions and types which are defined in mythread.h matching the desired threading method. Before Windows Vista, there is no direct equivalent to pthread condition variables. Since this package doesn't use pthread_cond_broadcast(), pre-Vista threading can still be kept quite simple. The pre-Vista code doesn't use anything that wasn't already available in Windows 95, so the binaries should run even on Windows 95 if someone happens to care.
2013-07-04xz: Add preliminary support for --flush-timeout=TIMEOUT.Lasse Collin1-11/+35
When --flush-timeout=TIMEOUT is used, xz will use LZMA_SYNC_FLUSH if read() would block and at least TIMEOUT milliseconds has elapsed since the previous flush. This can be useful in realtime-like use cases where the data is simultanously decompressed by another process (possibly on a different computer). If new uncompressed input data is produced slowly, without this option xz could buffer the data for a long time until it would become decompressible from the output. If TIMEOUT is 0, the feature is disabled. This is the default. This commit affects the compression side. Using xz for the decompression side for the above purpose doesn't work yet so well because there is quite a bit of input and output buffering when decompressing. The --long-help or man page were not updated yet. The details of this feature may change.
2013-07-04xz: Fix the test when to read more input.Lasse Collin1-3/+3
Testing for end of file was no longer correct after full flushing became possible with --block-size=SIZE and --block-list=SIZES. There was no bug in practice though because xz just made a few unneeded zero-byte reads.
2013-07-04xz: Move some of the timing code into mytime.[hc].Lasse Collin1-0/+5
This switches units from microseconds to milliseconds. New clock_gettime(CLOCK_MONOTONIC) will be used if available. There is still a fallback to gettimeofday().
2013-06-21xz: Fix interaction between preset and custom filter chains.Lasse Collin1-14/+21
There was somewhat illogical behavior when --extreme was specified and mixed with custom filter chains. Before this commit, "xz -9 --lzma2 -e" was equivalent to "xz --lzma2". After it is equivalent to "xz -6e" (all earlier preset options get forgotten when a custom filter chain is specified and the default preset is 6 to which -e is applied). I find this less illogical. This also affects the meaning of "xz -9e --lzma2 -7". Earlier it was equivalent to "xz -7e" (the -e specified before a custom filter chain wasn't forgotten). Now it is "xz -7". Note that "xz -7e" still is the same as "xz -e7". Hopefully very few cared about this in the first place, so pretty much no one should even notice this change. Thanks to Conley Moorhous.
2012-07-03xz: Add incomplete support for --block-list.Lasse Collin1-8/+40
It's broken with threads and when also --block-size is used.
2011-11-03xz: Fix xz on EBCDIC systems.Lasse Collin1-1/+4
Thanks to Chris Donawa.
2011-05-17Add underscores to attributes (__attribute((__foo__))).Lasse Collin1-1/+1
2011-05-01xz: Fix input file position when --single-stream is used.Lasse Collin1-0/+1
Now the following works as you would expect: echo foo | xz > foo.xz echo bar | xz >> foo.xz ( xz -dc --single-stream ; xz -dc --single-stream ) < foo.xz Note that it doesn't work if the input is not seekable or if there is Stream Padding between the concatenated .xz Streams.
2011-05-01xz: Print the maximum number of worker threads in xz -vv.Lasse Collin1-0/+4
2011-04-11xz: Add support for threaded compression.Lasse Collin1-79/+123
2011-04-08xz: Change size_t to uint32_t in a few places.Lasse Collin1-3/+3
2011-04-08xz: Fix a typo in a comment.Lasse Collin1-1/+1
2011-04-05xz: Call lzma_end(&strm) before exiting if debugging is enabled.Lasse Collin1-0/+10
2011-03-18xz: Add --block-size=SIZE.Lasse Collin1-10/+40
This uses LZMA_FULL_FLUSH every SIZE bytes of input. Man page wasn't updated yet.
2011-03-18xz: Add --single-stream.Lasse Collin1-2/+9
This can be useful when there is garbage after the compressed stream (.xz, .lzma, or raw stream). Man page wasn't updated yet.
2010-09-03xz: Make -vv show also decompressor memory usage.Lasse Collin1-0/+7
2010-09-02xz: Make setting a preset override a custom filter chain.Lasse Collin1-0/+9
This is more logical behavior than ignoring preset level options once a custom filter chain has been specified.
2010-09-02xz: Always warn if adjusting dictionary size due to memlimit.Lasse Collin1-19/+9
2010-08-07Disable the memory usage limiter by default.Lasse Collin1-3/+5
For several people, the limiter causes bigger problems that it solves, so it is better to have it disabled by default. Those who want to have a limiter by default need to enable it via the environment variable XZ_DEFAULTS. Support for environment variable XZ_DEFAULTS was added. It is parsed before XZ_OPT and technically identical with it. The intended uses differ quite a bit though; see the man page. The memory usage limit can now be set separately for compression and decompression using --memlimit-compress and --memlimit-decompress. To set both at once, -M or --memlimit can be used. --memory was retained as a legacy alias for --memlimit for backwards compatibility. The semantics of --info-memory were changed in backwards incompatible way. Compatibility wasn't meaningful due to changes in the memory usage limiter functionality. The memory usage limiter info is no longer shown at the bottom of xz --long -help. The memory usage limiter support for removed completely from xzdec. xz's man page was updated to match the above changes. Various unrelated fixes were also made to the man page.
2010-06-15Add --no-adjust.Lasse Collin1-6/+2
2010-05-16Split message_filters().Lasse Collin1-1/+1
message_filters_to_str() converts the filter chain to a string. message_filters_show() replaces the original message_filters(). uint32_to_optstr() was also added to show the dictionary size in nicer format when possible.
2010-02-12Collection of language fixes to comments and docs.Lasse Collin1-1/+1
Thanks to Jonathan Nieder.
2010-01-31Select the default integrity check type at runtime.Lasse Collin1-5/+14
Previously it was set statically to CRC64 or CRC32 depending on options passed to the configure script.
2010-01-31Improve displaying of the memory usage limit.Lasse Collin1-5/+3
2010-01-31Delay opening the destionation file and other fixes.Lasse Collin1-20/+44
The opening of the destination file is now delayed a little. The coder is initialized, and if decompressing, the memory usage of the first Block compared against the memory usage limit before the destination file is opened. This means that if --force was used, the old "target" file won't be deleted so easily when something goes wrong very early. Thanks to Mark K for the bug report. The above fix required some changes to progress message handling. Now there is a separate function for setting and printing the filename. It is used also in list.c. list_file() now handles stdin correctly (gives an error). A useless check for user_abort was removed from file_io.c.
2010-01-24Some improvements to printing sizes in xz.Lasse Collin1-35/+21
2009-11-25Add missing error check to coder.c.Lasse Collin1-9/+11
With bad luck this could cause a segfault due to reading (but not writing) past the end of the buffer.
2009-11-25Create sparse files by default when decompressing intoLasse Collin1-16/+17
a regular file. Sparse file creation can be disabled with --no-sparse. I don't promise yet that the name of this option won't change before 5.0.0. It's possible that the code, that checks when it is safe to use sparse output on stdout, is not good enough, and a more flexible command line option is needed to configure sparse file handling.
2009-07-20Avoid internal error with --format=xz --lzma1.Lasse Collin1-4/+12
2009-07-04Make "xz --decompress --stdout --force" copy unrecognizedLasse Collin1-35/+178
files as is to standard output. This feature is needed to be more compatible with gzip's behavior. This was more complicated to implement than it sounds, because the way liblzma is able to return errors with files of only a few bytes in size. xz now has its own file type detection code and no longer uses lzma_auto_decoder().
2009-06-26Updated comments to match renamed files.Lasse Collin1-1/+1
2009-06-26Rename process.[hc] to coder.[hc] and io.[hc] to file_io.[hc]Lasse Collin1-0/+488
to avoid problems on systems with system headers with those names.