.xz and .lzma Test Files
------------------------

0. Introduction

    This directory contains a bunch of files to test handling of .xz
    and .lzma files in decoder implementations. Many of the files have
    been created by hand with a hex editor, thus there is no better
    "source code" than the files themselves. All the test files and
    this README have been put into the public domain.


1. File Types

    Good files (good-*.xz, good-*.lzma) must decode successfully
    without requiring a lot of CPU time or RAM.

    Unsupported files (unsupported-*.xz) are good files, but their
    headers indicate features not supported by the current file format
    specification.

    Bad files (bad-*.xz, bad-*.lzma) must cause the decoder to give
    an error. As with the good files, these files must not require
    a lot of CPU time or RAM before they are detected as broken.
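
    As a rough illustration of the naming convention above, a test
    harness could derive the expected outcome from the file name
    prefix. The Python sketch below is not part of the test suite;
    the Expected enum and expected_outcome() are hypothetical names.

```python
from enum import Enum


class Expected(Enum):
    GOOD = "good"                # must decode successfully
    UNSUPPORTED = "unsupported"  # valid file, unsupported feature
    BAD = "bad"                  # decoder must give an error


def expected_outcome(filename):
    """Map a test file name to the outcome promised by its prefix."""
    if not filename.endswith((".xz", ".lzma")):
        raise ValueError("not a test file: " + filename)
    prefixes = (
        ("good-", Expected.GOOD),
        ("unsupported-", Expected.UNSUPPORTED),
        ("bad-", Expected.BAD),
    )
    for prefix, outcome in prefixes:
        if filename.startswith(prefix):
            return outcome
    raise ValueError("unknown prefix: " + filename)
```

    For example, expected_outcome("bad-0pad-empty.xz") returns
    Expected.BAD.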


2. Descriptions of Individual .xz Files

2.1. Good Files

    good-0-empty.xz has one Stream with no Blocks.
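
    To show what a Stream with no Blocks consists of, the following
    Python sketch assembles one by hand from a Stream Header, an empty
    Index, and a Stream Footer, following the field layout of the .xz
    format. empty_stream() is a hypothetical helper, and the Check ID
    is a parameter because this README doesn't say which one
    good-0-empty.xz uses; the result is not claimed to be byte-for-byte
    identical to that file.

```python
import lzma
import struct
import zlib

HEADER_MAGIC = b"\xfd7zXZ\x00"
FOOTER_MAGIC = b"YZ"


def empty_stream(check_id=0x00):
    """Build a .xz Stream that contains no Blocks."""
    flags = bytes([0x00, check_id])

    # Stream Header: magic, Stream Flags, CRC32 of Stream Flags.
    header = HEADER_MAGIC + flags + struct.pack("<I", zlib.crc32(flags))

    # Index: indicator byte 0x00, Number of Records = 0 (one-byte
    # integer), two bytes of Index Padding, CRC32 of those four bytes.
    index_fields = b"\x00" + b"\x00" + b"\x00\x00"
    index = index_fields + struct.pack("<I", zlib.crc32(index_fields))

    # Stream Footer: CRC32 of (Backward Size + Stream Flags),
    # Backward Size stored as Index size / 4 - 1, Stream Flags, magic.
    backward_size = struct.pack("<I", len(index) // 4 - 1)
    footer = (struct.pack("<I", zlib.crc32(backward_size + flags))
              + backward_size + flags + FOOTER_MAGIC)

    return header + index + footer


data = empty_stream()
print(len(data), lzma.decompress(data))  # expected: 32 b''
```

    Note that Python's lzma.decompress() ignores trailing data that
    isn't a valid Stream, so it isn't suitable for checking the Stream
    Padding test files.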

    good-0pad-empty.xz has one Stream with no Blocks followed by
    four-byte Stream Padding.

    good-0cat-empty.xz has two zero-Block Streams concatenated without
    Stream Padding.

    good-0catpad-empty.xz has two zero-Block Streams concatenated with
    four-byte Stream Padding between the Streams.

    good-1-check-none.xz has one Stream with one Block with two
    uncompressed LZMA2 chunks and no integrity check.

    good-1-check-crc32.xz has one Stream with one Block with two
    uncompressed LZMA2 chunks and CRC32 check.

    good-1-check-crc64.xz is like good-1-check-crc32.xz but with CRC64.

    good-1-check-sha256.xz is like good-1-check-crc32.xz but with
    SHA-256.

    good-2-lzma2.xz has one Stream with two Blocks with one uncompressed
    LZMA2 chunk in each Block.

    good-1-block_header-1.xz has both Compressed Size and Uncompressed
    Size in the Block Header. It also has four extra bytes of Header
    Padding.

    good-1-block_header-2.xz has known Compressed Size.

    good-1-block_header-3.xz has known Uncompressed Size.

    good-1-delta-lzma2.tiff.xz is an image file that compresses
    better with Delta+LZMA2 than with plain LZMA2.

    good-1-x86-lzma2.xz uses the x86 filter (BCJ) and LZMA2. The
    uncompressed file is compress_prepared_bcj_x86 found in the tests
    directory.

    good-1-sparc-lzma2.xz uses the SPARC filter and LZMA2. The
    uncompressed file is compress_prepared_bcj_sparc found in the tests
    directory.

    good-1-lzma2-1.xz has two LZMA2 chunks, of which the second sets
    new properties.

    good-1-lzma2-2.xz has two LZMA2 chunks, of which the second resets
    the state without specifying new properties.

    good-1-lzma2-3.xz has two LZMA2 chunks, of which the first is
    uncompressed and the second is LZMA. The first chunk resets dictionary
    and the second sets new properties.

    good-1-lzma2-4.xz has three LZMA2 chunks: First is LZMA, second is
    uncompressed with dictionary reset, and third is LZMA with new
    properties but without dictionary reset.

    good-1-lzma2-5.xz has an empty LZMA2 stream with only the end of
    payload marker. XZ Utils 5.0.1 and older incorrectly see this file
    as corrupt.

    good-1-3delta-lzma2.xz has three Delta filters and LZMA2.

    good-1-empty-bcj-lzma2.xz has an empty Block that uses PowerPC BCJ
    and LZMA2. liblzma from XZ Utils 5.0.1 and older may incorrectly
    return LZMA_BUF_ERROR in some cases. See commit message
    d8db706acb8316f9861abd432cfbe001dd6d0c5c for the details.


2.2. Unsupported Files

    unsupported-check.xz uses Check ID 0x02 which isn't supported by
    the current version of the file format. It is implementation-defined
    how this file is handled (a decoder may reject it, or decode it,
    possibly with a warning).
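
    A decoder can still skip over a Check it doesn't know how to
    verify, because the size of the Check field follows from the Check
    ID alone. The Python sketch below encodes the ID-to-size mapping of
    the .xz format; check_size() and KNOWN_CHECKS are hypothetical
    names, and the mapping is restated from the format specification
    rather than from this README.

```python
# Check IDs that the current file format defines.
KNOWN_CHECKS = {0x00: "None", 0x01: "CRC32", 0x04: "CRC64", 0x0A: "SHA-256"}


def check_size(check_id):
    """Return the size in bytes of the Check field for a Check ID."""
    if not 0 <= check_id <= 0x0F:
        raise ValueError("Check ID must fit in four bits")
    if check_id == 0:
        return 0
    # IDs 1-3 use 4 bytes, 4-6 use 8, 7-9 use 16, 10-12 use 32,
    # and 13-15 use 64 bytes.
    return 4 << ((check_id - 1) // 3)


def describe_check(check_id):
    """Return a short description usable in a warning message."""
    name = KNOWN_CHECKS.get(check_id, "unsupported")
    return "Check ID 0x%02X (%s), %d bytes" % (
        check_id, name, check_size(check_id))
```

    With this, describe_check(0x02) reports an unsupported Check of
    four bytes, which is enough to skip the field in
    unsupported-check.xz.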

    unsupported-block_header.xz has a non-null byte in Header Padding,
    which may indicate the presence of a new unsupported field.

    unsupported-filter_flags-1.xz has unsupported Filter ID 0x7F.

    unsupported-filter_flags-2.xz specifies only Delta filter in the
    List of Filter Flags, but Delta isn't allowed as the last filter in
    the chain. It would be slightly more correct to detect this file as
    corrupt instead of unsupported, but reporting it as unsupported is
    simpler in the case of liblzma.

    unsupported-filter_flags-3.xz specifies two LZMA2 filters in the
    List of Filter Flags. LZMA2 is allowed only as the last filter in the
    chain. It would be slightly more correct to detect this file as
    corrupt instead of unsupported, but reporting it as unsupported is
    simpler in the case of liblzma.


2.3. Bad Files

    bad-0pad-empty.xz has one Stream with no Blocks followed by
    five-byte Stream Padding. Stream Padding must be a multiple of four
    bytes, thus this file is corrupt.

    bad-0catpad-empty.xz has two zero-Block Streams concatenated with
    five-byte Stream Padding between the Streams.
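
    The Stream Padding rule can be sketched in Python as follows.
    split_stream_padding() is a hypothetical helper that takes the
    bytes following a Stream Footer; it assumes, per the .xz format,
    that Stream Padding consists only of null bytes, so padding ends
    at the first non-null byte or at the end of the input.

```python
def split_stream_padding(tail):
    """Return (padding_size, rest) for the bytes after a Stream.

    Raises ValueError if the padding isn't a multiple of four bytes.
    """
    rest = tail.lstrip(b"\x00")
    padding_size = len(tail) - len(rest)
    if padding_size % 4 != 0:
        raise ValueError(
            "Stream Padding is %d bytes, not a multiple of four"
            % padding_size)
    return padding_size, rest
```

    Four null bytes followed by another Stream are accepted, while
    five null bytes, as in the two files above, raise ValueError.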

    bad-0cat-alone.xz is good-0-empty.xz concatenated with an empty
    LZMA_Alone file.

    bad-0cat-header_magic.xz is good-0cat-empty.xz but with one byte
    wrong in the Header Magic Bytes field of the second Stream. liblzma
    gives LZMA_DATA_ERROR for this. (LZMA_FORMAT_ERROR is used only if
    the first Stream of a file has invalid Header Magic Bytes.)

    bad-0-header_magic.xz is good-0-empty.xz but with one byte wrong
    in the Header Magic Bytes field. liblzma gives LZMA_FORMAT_ERROR for
    this.

    bad-0-footer_magic.xz is good-0-empty.xz but with one byte wrong
    in the Footer Magic Bytes field. liblzma gives LZMA_DATA_ERROR for
    this.

    bad-0-empty-truncated.xz is good-0-empty.xz without the last byte
    of the file.

    bad-0-nonempty_index.xz has no Blocks but Index claims that there is
    one Block.

    bad-0-backward_size.xz has wrong Backward Size in Stream Footer.
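
    For illustration, the Backward Size check can be sketched in
    Python. In the .xz format the field is stored as a 32-bit
    little-endian integer at offset 4 of the 12-byte Stream Footer,
    and the real size of the Index is (stored value + 1) * 4. The
    function names are hypothetical, and the sketch deliberately
    doesn't verify the footer's CRC32 or magic bytes.

```python
import struct


def real_backward_size(stored_value):
    """Convert the stored Backward Size to the Index size in bytes."""
    return (stored_value + 1) * 4


def check_backward_size(footer, index_size):
    """Raise ValueError if Backward Size disagrees with the Index."""
    if len(footer) != 12:
        raise ValueError("Stream Footer must be 12 bytes")
    (stored_value,) = struct.unpack_from("<I", footer, 4)
    if real_backward_size(stored_value) != index_size:
        raise ValueError("Backward Size doesn't match the size of Index")
```

    A Stream with no Blocks has an eight-byte Index, so its stored
    Backward Size is 1.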

    bad-1-stream_flags-1.xz has different Stream Flags in Stream Header
    and Stream Footer.

    bad-1-stream_flags-2.xz has wrong CRC32 in Stream Header.

    bad-1-stream_flags-3.xz has wrong CRC32 in Stream Footer.
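
    The difference between a wrong magic and a wrong CRC32 in the
    first Stream Header can be sketched in Python. The three exception
    classes are hypothetical stand-ins for the kinds of errors a
    decoder might report (liblzma's LZMA_FORMAT_ERROR and
    LZMA_DATA_ERROR are mentioned above; the "unsupported" case for
    reserved Stream Flags bits is an assumption based on the .xz
    format). The sketch covers only the first Stream of a file.

```python
import struct
import zlib

HEADER_MAGIC = b"\xfd7zXZ\x00"


class FormatError(Exception):
    """Input doesn't look like a .xz Stream at all."""


class DataError(Exception):
    """Input is a .xz Stream but it is corrupt."""


class UnsupportedError(Exception):
    """Stream Flags use bits reserved for future use."""


def parse_stream_header(header):
    """Validate a 12-byte Stream Header and return its Check ID."""
    if len(header) != 12:
        raise DataError("Stream Header must be 12 bytes")
    if header[:6] != HEADER_MAGIC:
        raise FormatError("wrong Header Magic Bytes")
    flags = header[6:8]
    (stored_crc,) = struct.unpack_from("<I", header, 8)
    if zlib.crc32(flags) != stored_crc:
        raise DataError("wrong CRC32 in Stream Header")
    if flags[0] != 0x00 or flags[1] & 0xF0:
        raise UnsupportedError("reserved bits set in Stream Flags")
    return flags[1] & 0x0F
```

    The magic is checked first so that a file which isn't .xz at all
    is reported differently from a damaged .xz file.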

    bad-1-vli-1.xz has a two-byte variable-length integer in the
    Uncompressed Size field in Block Header while one byte would be
    enough for that value. It's important that the file gets rejected
    because the integer encoding is too long, rather than because the
    uncompressed size doesn't match the value stored in the Block
    Header. That is, the decoder must not try to decode the Compressed
    Data field.

    bad-1-vli-2.xz has a ten-byte variable-length integer as
    Uncompressed Size in Block Header. It's important that the file gets
    rejected because the integer encoding is too long, rather than
    because the uncompressed size doesn't match the value stored in the
    Block Header. That is, the decoder must not try to decode the
    Compressed Data field.
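
    The two rejection rules above can be sketched in Python.
    decode_vli() is a hypothetical function that follows the
    variable-length integer encoding of the .xz format: seven bits per
    byte, least significant group first, the high bit marking
    continuation, at most nine bytes, and the shortest possible
    encoding required.

```python
VLI_BYTES_MAX = 9


def decode_vli(buf, pos=0):
    """Decode one variable-length integer; return (value, next_pos)."""
    value = 0
    for i in range(VLI_BYTES_MAX):
        if pos + i >= len(buf):
            raise ValueError("truncated variable-length integer")
        byte = buf[pos + i]
        # A null byte after the first byte adds nothing to the value,
        # so the same value has a shorter encoding.
        if i > 0 and byte == 0x00:
            raise ValueError("integer encoding is longer than needed")
        value |= (byte & 0x7F) << (7 * i)
        if byte & 0x80 == 0:
            return value, pos + i + 1
    raise ValueError("integer encoding is longer than nine bytes")
```

    For example, b"\x85\x00" is a two-byte encoding of 5 and is
    rejected, while b"\x05" decodes to 5. Both errors are raised
    before any Compressed Data would be read.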

    bad-1-block_header-1.xz has Block Header that ends in the middle of
    the Filter Flags field.

    bad-1-block_header-2.xz has Block Header that has Compressed Size and
    Uncompressed Size but no List of Filter Flags field.

    bad-1-block_header-3.xz has wrong CRC32 in Block Header.

    bad-1-block_header-4.xz has too big Compressed Size in Block Header
    (2^63 - 1 bytes while maximum is a little less, because the whole
    Block must stay smaller than 2^63). It's important that the file
    gets rejected due to invalid Compressed Size value; the decoder
    must not try decoding the Compressed Data field.

    bad-1-block_header-5.xz has zero as Compressed Size in Block Header.
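
    The two Compressed Size checks above (too big and zero) can be
    sketched in Python as a test done right after parsing the Block
    Header. compressed_size_is_plausible() is a hypothetical name, and
    the way the limit is computed here (header, data, padding to a
    multiple of four, and Check all counted) is a simplifying
    assumption; a real decoder may express the bound differently.

```python
BLOCK_SIZE_LIMIT = 2**63  # the whole Block must stay smaller than this


def compressed_size_is_plausible(compressed_size, header_size, check_size):
    """Return False if Compressed Size can be rejected immediately."""
    if compressed_size == 0:
        return False
    # Block Padding rounds Compressed Data up to a multiple of four.
    padding = -compressed_size % 4
    block_size = header_size + compressed_size + padding + check_size
    return block_size < BLOCK_SIZE_LIMIT
```

    With a 12-byte Block Header and a four-byte Check, a Compressed
    Size of 2^63 - 1 fails the test without touching the Compressed
    Data field.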

    bad-1-block_header-6.xz has corrupt Block Header which may crash
    xz -lvv in XZ Utils 5.0.3 and earlier. It was fixed in the commit
    c0297445064951807803457dca1611b3c47e7f0f.

    bad-2-index-1.xz has wrong Unpadded Sizes in Index.

    bad-2-index-2.xz has wrong Uncompressed Sizes in Index.

    bad-2-index-3.xz has non-null byte in Index Padding.

    bad-2-index-4.xz has wrong CRC32 in Index.

    bad-2-index-5.xz has zero as Unpadded Size. It is important that the
    file gets rejected specifically due to Unpadded Size having an invalid
    value.

    bad-3-index-uncomp-overflow.xz has Index whose Uncompressed Size
    fields have huge values whose sum exceeds the maximum allowed size
    of 2^63 - 1 bytes. In this file the sum is exactly 2^64.
    lzma_index_append() in liblzma <= 5.2.6 lacks the integer overflow
    check for the uncompressed size and thus doesn't catch the error
    when decoding the Index field in this file. As a result, "xz -l"
    doesn't detect the error and displays 0 as the uncompressed size.
    Note that regular decompression isn't affected by this bug because
    it uses lzma_index_hash_append() instead.
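
    The effect of the missing check can be illustrated in Python by
    comparing a sum that wraps like a 64-bit unsigned integer with one
    that is checked against the limit. The three sizes below are
    hypothetical values chosen so that their sum is exactly 2^64; they
    are not read from the test file, and neither function is liblzma
    code.

```python
VLI_MAX = 2**63 - 1   # maximum allowed total size
U64_MASK = 2**64 - 1  # emulates uint64_t wraparound


def wrapping_total(sizes):
    """Sum the sizes the way unchecked uint64_t arithmetic would."""
    total = 0
    for size in sizes:
        total = (total + size) & U64_MASK
    return total


def checked_total(sizes):
    """Sum the sizes, rejecting totals above VLI_MAX."""
    total = 0
    for size in sizes:
        if size > VLI_MAX - total:
            raise OverflowError("total Uncompressed Size is too big")
        total += size
    return total


sizes = [VLI_MAX, VLI_MAX, 2]  # hypothetical; the sum is 2**64
print(wrapping_total(sizes))   # prints 0
```

    Calling checked_total(sizes) raises OverflowError at the second
    record, which is the behavior a bad-* file requires.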

    bad-2-compressed_data_padding.xz has non-null byte in the padding of
    the Compressed Data field of the first Block.

    bad-1-check-crc32.xz has wrong Check (CRC32).

    bad-1-check-crc32-2.xz has Compressed Size and Uncompressed Size in
    Block Header but wrong Check (CRC32) in the actual data. This file
    differs by one byte from good-1-block_header-1.xz: the last byte of
    the Check field is wrong. This file is useful for testing error
    detection in the threaded decoder when a worker thread is configured
    to pass input one byte at a time to the Block decoder.

    bad-1-check-crc64.xz has wrong Check (CRC64).

    bad-1-check-sha256.xz has wrong Check (SHA-256).

    bad-1-lzma2-1.xz has LZMA2 stream whose first chunk (uncompressed)
    doesn't reset the dictionary.

    bad-1-lzma2-2.xz has two LZMA2 chunks, of which the second chunk
    indicates dictionary reset, but the LZMA compressed data tries to
    repeat data from the previous chunk.

    bad-1-lzma2-3.xz sets new invalid properties (lc=8, lp=0, pb=0) in
    the middle of Block.
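
    As background for why lc=8, lp=0, pb=0 is invalid: LZMA2 packs the
    three values into one properties byte as (pb * 5 + lp) * 9 + lc
    and additionally requires lc + lp <= 4. This constraint comes from
    the LZMA2 definition, not from this README. A minimal Python
    sketch, with decode_lzma2_props() as a hypothetical name:

```python
def decode_lzma2_props(props):
    """Split an LZMA2 properties byte into (lc, lp, pb)."""
    if not 0 <= props < 9 * 5 * 5:
        raise ValueError("properties byte out of range")
    lc = props % 9
    lp = (props // 9) % 5
    pb = props // (9 * 5)
    if lc + lp > 4:
        raise ValueError("LZMA2 requires lc + lp <= 4")
    return lc, lp, pb
```

    The byte 0x5D decodes to lc=3, lp=0, pb=2, whereas the byte 0x08
    (lc=8, lp=0, pb=0) is rejected.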

    bad-1-lzma2-4.xz has two LZMA2 chunks, of which the first is
    uncompressed and the second is LZMA. The first chunk resets dictionary
    as it should, but the second chunk tries to reset state without
    specifying properties for LZMA.

    bad-1-lzma2-5.xz is like bad-1-lzma2-4.xz but doesn't try to reset
    anything in the header of the second chunk.

    bad-1-lzma2-6.xz has reserved LZMA2 control byte value (0x03).

    bad-1-lzma2-7.xz has EOPM at LZMA level.

    bad-1-lzma2-8.xz is like good-1-lzma2-4.xz but doesn't set new
    properties in the third LZMA2 chunk.
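
    Several of the LZMA2 files above differ only in the sequence of
    chunk control bytes. The Python sketch below tracks the two
    requirements involved: the first chunk must reset the dictionary,
    and the first LZMA chunk after a dictionary reset must set new
    properties. It assumes the usual meaning of the control byte
    (0x00 end marker, 0x01 and 0x02 uncompressed chunks with and
    without dictionary reset, 0x03-0x7F reserved, 0x80-0xFF LZMA
    chunks with the reset level in bits 5-6). check_control_sequence()
    is a hypothetical name, and it looks only at control bytes, not
    at chunk contents.

```python
def check_control_sequence(controls):
    """Raise ValueError if a sequence of control bytes is invalid."""
    need_dict_reset = True
    need_props = True
    for control in controls:
        if control == 0x00:
            return  # end of LZMA2 stream
        if control >= 0xE0 or control == 0x01:
            # Dictionary reset; the next LZMA chunk needs properties.
            need_dict_reset = False
            need_props = True
        elif need_dict_reset:
            raise ValueError("first chunk must reset the dictionary")
        if control >= 0x80:
            if control >= 0xC0:
                need_props = False  # this chunk sets new properties
            elif need_props:
                raise ValueError("LZMA chunk must set new properties")
        elif control > 0x02:
            raise ValueError("reserved control byte")
    raise ValueError("missing end marker")
```

    For example, the hypothetical sequence [0x01, 0xC0, 0x00] (the
    pattern described for good-1-lzma2-3.xz) passes, while
    [0x01, 0xA0, 0x00] and [0x01, 0x80, 0x00] (the patterns described
    for bad-1-lzma2-4.xz and bad-1-lzma2-5.xz) are rejected.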

    bad-1-lzma2-9.xz has LZMA2 stream that is truncated at the end of
    an LZMA2 chunk (no end marker). The uncompressed size of the partial
    LZMA2 stream exceeds the value stored in the Block Header.

    bad-1-lzma2-10.xz has LZMA2 stream that, from the point of view of
    an LZMA2 decoder, extends past the end of Block (and even the end of
    the file). Uncompressed Size in Block Header is bigger than the
    invalid LZMA2 stream may produce (even if a decoder reads until
    the end of the file). The Check type is None to nullify certain
    simple size-based sanity checks in a Block decoder.

    bad-1-lzma2-11.xz has LZMA2 stream that lacks the end of
    payload marker. When Compressed Size bytes have been decoded,
    Uncompressed Size bytes of output will have been produced but
    the LZMA2 decoder doesn't indicate end of stream.


3. Descriptions of Individual .lzma Files

3.1. Good Files

    good-unknown_size-with_eopm.lzma has unknown size in the header
    and end of payload marker at the end.

    good-known_size-without_eopm.lzma has a known size in the header
    and no end of payload marker at the end.

    good-known_size-with_eopm.lzma has a known size in the header
    and end of payload marker at the end. XZ Utils 5.2.5 and older
    will give an error at the end of the file after producing the
    correct uncompressed output.
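
    Whether the size is known can be read directly from the .lzma
    header. The Python sketch below assumes the commonly documented
    13-byte layout (one properties byte, a 32-bit little-endian
    dictionary size, and a 64-bit little-endian uncompressed size
    where all bits set means "unknown"); this layout isn't restated
    in this README, and parse_lzma_alone_header() is a hypothetical
    name.

```python
import lzma
import struct

UNKNOWN_SIZE = 0xFFFFFFFFFFFFFFFF


def parse_lzma_alone_header(data):
    """Return (props, dict_size, uncompressed_size or None)."""
    if len(data) < 13:
        raise ValueError("truncated .lzma header")
    props, dict_size, size = struct.unpack_from("<BIQ", data, 0)
    if size == UNKNOWN_SIZE:
        size = None  # decoder must rely on the end of payload marker
    return props, dict_size, size


encoded = lzma.compress(b"hello", format=lzma.FORMAT_ALONE)
print(parse_lzma_alone_header(encoded)[2])
```

    Files written by Python's lzma module in FORMAT_ALONE are expected
    to report None here, that is, unknown size, which corresponds to
    the good-unknown_size-with_eopm.lzma case.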


3.2. Bad Files

    bad-unknown_size-without_eopm.lzma has unknown size in the header
    but no end of payload marker at the end. This file might be seen
    by a decoder as if it were truncated.

    bad-too_big_size-with_eopm.lzma has too big uncompressed size in
    the header and the end of payload marker will be detected before
    the specified number of bytes have been decoded.

    bad-too_small_size-without_eopm-1.lzma has too small uncompressed
    size in the header. The decoder will look for end of payload marker
    but instead find a literal that would produce more output.

    bad-too_small_size-without_eopm-2.lzma is like -1 above but instead
    of a literal the problem occurs with a short repeated match.

    bad-too_small_size-without_eopm-3.lzma is like -1 above but instead
    of a literal the problem occurs in the middle of a match.