.xz and .lzma Test Files
------------------------
0. Introduction
This directory contains a bunch of files to test handling of .xz
and .lzma files in decoder implementations. Many of the files have
been created by hand with a hex editor, thus there is no better
"source code" than the files themselves. All the test files and
this README have been put into the public domain.
1. File Types
Good files (good-*.xz, good-*.lzma) must decode successfully
without requiring a lot of CPU time or RAM.
Unsupported files (unsupported-*.xz) are good files, but their headers
indicate features not supported by the current file format
specification.
Bad files (bad-*.xz, bad-*.lzma) must cause the decoder to give
an error. As with the good files, these files must not require
a lot of CPU time or RAM before they are detected as broken.
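One way to use these files is a small driver that test-decodes every file and compares the result with its file-name prefix. The Python sketch below assumes the xz command-line tool is on PATH and relies only on its documented exit statuses (0 = success, 1 = error, 2 = warning). The function names are hypothetical and not part of XZ Utils.

```python
from pathlib import Path
import subprocess

# Exit statuses documented for the xz command-line tool.
XZ_OK, XZ_ERROR, XZ_WARNING = 0, 1, 2


def expected_outcome(filename: str) -> str:
    """Return "good", "unsupported" or "bad" based on the file name prefix."""
    name = Path(filename).name
    for kind in ("good", "unsupported", "bad"):
        if name.startswith(kind + "-"):
            return kind
    raise ValueError(f"not a test file: {name}")


def outcome_matches(path: Path, timeout: float = 10.0) -> bool:
    """Test-decode one file with `xz -t` and compare with its prefix.

    The timeout is a crude stand-in for the "not a lot of CPU time" rule;
    subprocess.run raises TimeoutExpired if it is exceeded.
    """
    kind = expected_outcome(path.name)
    result = subprocess.run(["xz", "-t", "--", str(path)],
                            capture_output=True, timeout=timeout)
    if kind == "good":
        return result.returncode == XZ_OK
    if kind == "bad":
        return result.returncode == XZ_ERROR
    # Unsupported files are treated leniently: rejecting them or decoding
    # them (possibly with a warning) both count as a normal outcome here.
    return result.returncode in (XZ_OK, XZ_ERROR, XZ_WARNING)
```

For example, outcome_matches(Path("bad-0pad-empty.xz")) run from this directory should return True when xz rejects that file.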
2. Descriptions of Individual .xz Files
2.1. Good Files
good-0-empty.xz has one Stream with no Blocks.
good-0pad-empty.xz has one Stream with no Blocks followed by
four-byte Stream Padding.
good-0cat-empty.xz has two zero-Block Streams concatenated without
Stream Padding.
good-0catpad-empty.xz has two zero-Block Streams concatenated with
four-byte Stream Padding between the Streams.
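Zero-Block Streams like these are small enough to build by hand. The Python sketch below assembles one from the Stream Header, an empty Index and the Stream Footer, following the field layout of the .xz format specification. Using CRC32 as the Check type is an assumption; the actual test files may use a different one. Python's lzma module (a wrapper around liblzma) is used only to confirm that the result decodes, and the function name is hypothetical.

```python
import lzma
import struct
import zlib

HEADER_MAGIC = b"\xfd7zXZ\x00"
FOOTER_MAGIC = b"YZ"


def empty_xz_stream(check_id: int = 0x01) -> bytes:
    """Build a .xz Stream with no Blocks (Check ID 0x01 = CRC32 by default)."""
    # Stream Flags: a null byte, then the Check ID in the low four bits.
    stream_flags = bytes([0x00, check_id])
    header = HEADER_MAGIC + stream_flags + struct.pack("<I", zlib.crc32(stream_flags))

    # Index: Index Indicator (0x00), Number of Records (0), two bytes of
    # Index Padding to reach a multiple of four, then CRC32 of those bytes.
    index_fields = b"\x00\x00\x00\x00"
    index = index_fields + struct.pack("<I", zlib.crc32(index_fields))

    # Backward Size stores (real Index size / 4) - 1; the footer's CRC32
    # covers Backward Size and Stream Flags.
    backward_size = struct.pack("<I", len(index) // 4 - 1)
    footer = (struct.pack("<I", zlib.crc32(backward_size + stream_flags))
              + backward_size + stream_flags + FOOTER_MAGIC)
    return header + index + footer


if __name__ == "__main__":
    stream = empty_xz_stream()
    print(len(stream), lzma.decompress(stream))  # 32 b''
    print(lzma.decompress(stream + stream))      # two Streams, no padding
```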
good-1-check-none.xz has one Stream with one Block with two
uncompressed LZMA2 chunks and no integrity check.
good-1-check-crc32.xz has one Stream with one Block with two
uncompressed LZMA2 chunks and a CRC32 check.
good-1-check-crc64.xz is like good-1-check-crc32.xz but with CRC64.
good-1-check-sha256.xz is like good-1-check-crc32.xz but with
SHA256.
good-2-lzma2.xz has one Stream with two Blocks with one uncompressed
LZMA2 chunk in each Block.
good-1-block_header-1.xz has both Compressed Size and Uncompressed
Size in the Block Header. It also has four extra bytes of Header
Padding.
good-1-block_header-2.xz has known Compressed Size.
good-1-block_header-3.xz has known Uncompressed Size.
good-1-delta-lzma2.tiff.xz is an image file that compresses
better with Delta+LZMA2 than with plain LZMA2.
good-1-x86-lzma2.xz uses the x86 filter (BCJ) and LZMA2. The
uncompressed file is compress_prepared_bcj_x86 found in the tests
directory.
good-1-sparc-lzma2.xz uses the SPARC filter and LZMA2. The
uncompressed file is compress_prepared_bcj_sparc found in the tests
directory.
good-1-lzma2-1.xz has two LZMA2 chunks, of which the second sets
new properties.
good-1-lzma2-2.xz has two LZMA2 chunks, of which the second resets
the state without specifying new properties.
good-1-lzma2-3.xz has two LZMA2 chunks, of which the first is
uncompressed and the second is LZMA. The first chunk resets the dictionary
and the second sets new properties.
good-1-lzma2-4.xz has three LZMA2 chunks: the first is LZMA, the second
is uncompressed with a dictionary reset, and the third is LZMA with new
properties but without a dictionary reset.
good-1-lzma2-5.xz has an empty LZMA2 stream with only the end of
payload marker. XZ Utils 5.0.1 and older incorrectly see this file
as corrupt.
good-1-3delta-lzma2.xz has three Delta filters and LZMA2.
good-1-empty-bcj-lzma2.xz has an empty Block that uses PowerPC BCJ
and LZMA2. liblzma from XZ Utils 5.0.1 and older may incorrectly
return LZMA_BUF_ERROR in some cases. See commit message
d8db706acb8316f9861abd432cfbe001dd6d0c5c for the details.
2.2. Unsupported Files
unsupported-check.xz uses Check ID 0x02 which isn't supported by
the current version of the file format. It is implementation-defined
how this file is handled (a decoder may reject it, or decode it
possibly with a warning).
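A reserved Check ID can still be skipped safely because the specification fixes the size of the Check field for every 4-bit Check ID; this is what makes decoding unsupported-check.xz without verification possible. A small Python lookup, with sizes taken from the .xz format specification (the helper name is hypothetical):

```python
from __future__ import annotations

# Size in bytes of the Check field for Check IDs 0x00-0x0F, per the .xz
# format specification. IDs missing from CHECK_NAMES are reserved.
CHECK_SIZES = (0, 4, 4, 4, 8, 8, 8, 16, 16, 16, 32, 32, 32, 64, 64, 64)
CHECK_NAMES = {0x00: "None", 0x01: "CRC32", 0x04: "CRC64", 0x0A: "SHA-256"}


def describe_check(check_id: int) -> tuple[int, str | None]:
    """Return (Check field size, name), with name None for reserved IDs."""
    if not 0x00 <= check_id <= 0x0F:
        raise ValueError("Check ID must fit in four bits")
    return CHECK_SIZES[check_id], CHECK_NAMES.get(check_id)
```

describe_check(0x02) returns (4, None): the decoder knows to skip four bytes after each Block even though it cannot verify them.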
unsupported-block_header.xz has a non-null byte in Header Padding,
which may indicate presence of a new unsupported field.
unsupported-filter_flags-1.xz has unsupported Filter ID 0x7F.
unsupported-filter_flags-2.xz specifies only the Delta filter in the
List of Filter Flags, but Delta isn't allowed as the last filter in
the chain. It would be slightly more correct to detect this file as
corrupt instead of unsupported, but reporting it as unsupported is
simpler in the case of liblzma.
unsupported-filter_flags-3.xz specifies two LZMA2 filters in the
List of Filter Flags. LZMA2 is allowed only as the last filter in the
chain. It would be slightly more correct to detect this file as
corrupt instead of unsupported, but reporting it as unsupported is
simpler in the case of liblzma.
2.3. Bad Files
bad-0pad-empty.xz has one Stream with no Blocks followed by
five-byte Stream Padding. Stream Padding must be a multiple of four
bytes, thus this file is corrupt.
bad-0catpad-empty.xz has two zero-Block Streams concatenated with
five-byte Stream Padding between the Streams.
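The multiple-of-four rule is easy to apply once a Stream Footer has been read: count the null bytes that follow and reject any count not divisible by four. A minimal Python sketch (the function name is hypothetical):

```python
def skip_stream_padding(data: bytes, pos: int) -> int:
    """Skip the Stream Padding that starts at data[pos] and return the
    position of the next Stream (or the end of the input).

    Stream Padding consists of null bytes and its length must be a
    multiple of four; anything else raises ValueError.
    """
    end = pos
    while end < len(data) and data[end] == 0x00:
        end += 1
    padding = end - pos
    if padding % 4 != 0:
        raise ValueError(f"{padding} bytes of Stream Padding is not a multiple of four")
    return end
```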
bad-0cat-alone.xz is good-0-empty.xz concatenated with an empty
LZMA_Alone file.
bad-0cat-header_magic.xz is good-0cat-empty.xz but with one byte
wrong in the Header Magic Bytes field of the second Stream. liblzma
gives LZMA_DATA_ERROR for this. (LZMA_FORMAT_ERROR is used only if
the first Stream of a file has invalid Header Magic Bytes.)
bad-0-header_magic.xz is good-0-empty.xz but with one byte wrong
in the Header Magic Bytes field. liblzma gives LZMA_FORMAT_ERROR for
this.
bad-0-footer_magic.xz is good-0-empty.xz but with one byte wrong
in the Footer Magic Bytes field. liblzma gives LZMA_DATA_ERROR for
this.
bad-0-empty-truncated.xz is good-0-empty.xz without the last byte
of the file.
bad-0-nonempty_index.xz has no Blocks but the Index claims that there
is one Block.
bad-0-backward_size.xz has a wrong Backward Size in the Stream Footer.
bad-1-stream_flags-1.xz has different Stream Flags in the Stream
Header and the Stream Footer.
bad-1-stream_flags-2.xz has a wrong CRC32 in the Stream Header.
bad-1-stream_flags-3.xz has a wrong CRC32 in the Stream Footer.
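Most of these files are caught by cross-checking the Stream Header against the Stream Footer. The Python sketch below does this for a Stream without Blocks, where the Index fills the whole gap between the two, using the field layouts from the .xz format specification. It raises ValueError for every problem, whereas liblzma distinguishes LZMA_FORMAT_ERROR from LZMA_DATA_ERROR as described above. The function names are hypothetical.

```python
from __future__ import annotations
import struct
import zlib

HEADER_MAGIC = b"\xfd7zXZ\x00"
FOOTER_MAGIC = b"YZ"


def _check_flags(flags: bytes) -> None:
    # The first byte and the high nibble of the second byte are reserved.
    if flags[0] != 0x00 or flags[1] & 0xF0:
        raise ValueError("unsupported Stream Flags")


def parse_stream_header(header: bytes) -> bytes:
    """Validate a 12-byte Stream Header and return its Stream Flags."""
    if header[:6] != HEADER_MAGIC:
        raise ValueError("bad Header Magic Bytes")
    flags = header[6:8]
    if struct.unpack("<I", header[8:12])[0] != zlib.crc32(flags):
        raise ValueError("wrong CRC32 in Stream Header")
    _check_flags(flags)
    return flags


def parse_stream_footer(footer: bytes) -> tuple[int, bytes]:
    """Validate a 12-byte Stream Footer; return (Index size, Stream Flags)."""
    if footer[10:12] != FOOTER_MAGIC:
        raise ValueError("bad Footer Magic Bytes")
    stored_crc, backward_size = struct.unpack("<II", footer[:8])
    if stored_crc != zlib.crc32(footer[4:10]):
        raise ValueError("wrong CRC32 in Stream Footer")
    flags = footer[8:10]
    _check_flags(flags)
    # Backward Size is stored as (real size / 4) - 1.
    return (backward_size + 1) * 4, flags


def check_zero_block_stream(stream: bytes) -> None:
    """Cross-check the header and footer of a Stream that has no Blocks."""
    if len(stream) < 24:
        raise ValueError("truncated Stream")
    header_flags = parse_stream_header(stream[:12])
    index_size, footer_flags = parse_stream_footer(stream[-12:])
    if header_flags != footer_flags:
        raise ValueError("Stream Flags differ between header and footer")
    if index_size != len(stream) - 24:
        raise ValueError("wrong Backward Size")
```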
bad-1-vli-1.xz has a two-byte variable-length integer in the
Uncompressed Size field of the Block Header while one byte would be
enough for that value. It's important that the file gets rejected due
to the overlong integer encoding instead of due to the decoded data
not matching the Uncompressed Size stored in the Block Header. That
is, the decoder must not try to decode the Compressed Data field.
bad-1-vli-2.xz has a ten-byte variable-length integer as Uncompressed
Size in the Block Header, while at most nine bytes are allowed. It's
important that the file gets rejected due to the overlong integer
encoding instead of due to the decoded data not matching the
Uncompressed Size stored in the Block Header. That is, the decoder
must not try to decode the Compressed Data field.
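Both files test the rules for .xz variable-length integers: each byte carries seven bits of the value, least significant first, with the high bit set when another byte follows; at most nine bytes are allowed, and a null byte may not follow a continuation byte because the value would then fit in fewer bytes. A Python sketch of a decoder enforcing these rules, modeled on the decoder in the .xz format specification (the function name is hypothetical):

```python
from __future__ import annotations

VLI_BYTES_MAX = 9  # nine bytes carry 63 bits, enough for 2^63 - 1


def decode_vli(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one multibyte integer at buf[pos:].

    Returns (value, bytes consumed). Raises ValueError on truncated,
    too long, or non-minimal encodings.
    """
    value = 0
    for i in range(VLI_BYTES_MAX):
        if pos + i >= len(buf):
            raise ValueError("truncated integer")
        byte = buf[pos + i]
        # A null byte after the first one means a shorter encoding existed.
        if i > 0 and byte == 0x00:
            raise ValueError("integer is not minimally encoded")
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:  # high bit clear: this was the last byte
            return value, i + 1
    raise ValueError("integer encoding exceeds nine bytes")
```

An input such as b"\x85\x00" (the value 5 spread over two bytes) is rejected while parsing the header, before any Compressed Data is read.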
bad-1-block_header-1.xz has a Block Header that ends in the middle of
the Filter Flags field.
bad-1-block_header-2.xz has a Block Header that has Compressed Size and
Uncompressed Size but no List of Filter Flags field.
bad-1-block_header-3.xz has a wrong CRC32 in the Block Header.
bad-1-block_header-4.xz has a too large Compressed Size in the Block
Header (2^63 - 1 bytes, while the maximum is a little less because the
whole Block must stay smaller than 2^63). It's important that the file
gets rejected due to the invalid Compressed Size value; the decoder
must not try decoding the Compressed Data field.
bad-1-block_header-5.xz has zero as Compressed Size in the Block Header.
bad-1-block_header-6.xz has a corrupt Block Header which may crash
xz -lvv in XZ Utils 5.0.3 and earlier. This was fixed in commit
c0297445064951807803457dca1611b3c47e7f0f.
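The Block Header files exercise a parser like the Python sketch below. It follows the Block Header layout of the .xz format specification: a size byte holding (real size / 4) - 1, Block Flags, the optional Compressed Size and Uncompressed Size, the List of Filter Flags, Header Padding and a CRC32. It reports everything as ValueError and does not implement the Compressed Size upper limit discussed above or validate filter IDs. The function names are hypothetical, and Python's lzma module is used only to produce a real header to parse.

```python
from __future__ import annotations
import lzma
import struct
import zlib


def _read_vli(buf: bytes, pos: int, end: int) -> tuple[int, int]:
    """Decode one multibyte integer; return (value, position after it)."""
    value = 0
    for i in range(9):
        if pos + i >= end:
            raise ValueError("field runs past the end of the Block Header")
        byte = buf[pos + i]
        if i > 0 and byte == 0x00:
            raise ValueError("integer is not minimally encoded")
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            return value, pos + i + 1
    raise ValueError("integer encoding exceeds nine bytes")


def parse_block_header(buf: bytes) -> dict:
    """Parse the Block Header at the start of buf."""
    if not buf or buf[0] == 0x00:  # 0x00 here would be the Index Indicator
        raise ValueError("no Block Header at this position")
    size = (buf[0] + 1) * 4
    if len(buf) < size:
        raise ValueError("truncated Block Header")
    end = size - 4  # start of the CRC32 field
    if struct.unpack("<I", buf[end:size])[0] != zlib.crc32(buf[:end]):
        raise ValueError("wrong CRC32 in Block Header")

    flags = buf[1]
    if flags & 0x3C:
        raise ValueError("unsupported: reserved Block Flags bits are set")
    info = {"header_size": size, "filters": []}
    pos = 2
    if flags & 0x40:
        info["compressed_size"], pos = _read_vli(buf, pos, end)
        if info["compressed_size"] == 0:
            raise ValueError("Compressed Size must not be zero")
    if flags & 0x80:
        info["uncompressed_size"], pos = _read_vli(buf, pos, end)
    for _ in range((flags & 0x03) + 1):  # low two bits: filter count - 1
        filter_id, pos = _read_vli(buf, pos, end)
        props_size, pos = _read_vli(buf, pos, end)
        if pos + props_size > end:
            raise ValueError("Filter Properties run past the Block Header")
        info["filters"].append((filter_id, bytes(buf[pos:pos + props_size])))
        pos += props_size
    if any(buf[pos:end]):
        raise ValueError("unsupported: non-null byte in Header Padding")
    return info


if __name__ == "__main__":
    # The first Block Header follows the 12-byte Stream Header.
    print(parse_block_header(lzma.compress(b"hello")[12:]))
```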
bad-2-index-1.xz has wrong Unpadded Sizes in the Index.
bad-2-index-2.xz has wrong Uncompressed Sizes in the Index.
bad-2-index-3.xz has a non-null byte in Index Padding.
bad-2-index-4.xz has a wrong CRC32 in the Index.
bad-2-index-5.xz has zero as Unpadded Size. It is important that the
file gets rejected specifically due to Unpadded Size having an invalid
value.
bad-3-index-uncomp-overflow.xz has an Index whose Uncompressed Size
fields have huge values whose sum exceeds the maximum allowed size
of 2^63 - 1 bytes. In this file the sum is exactly 2^64.
lzma_index_append() in liblzma <= 5.2.6 lacks the integer overflow
check for the uncompressed size and thus doesn't catch the error
when decoding the Index field in this file. As a result, "xz -l"
doesn't detect the error and displays 0 as the uncompressed size.
Note that regular decompression isn't affected by this bug because
it uses lzma_index_hash_append() instead.
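The difference between the unchecked and the checked behavior is easiest to see with uint64_t arithmetic. The Python sketch below emulates the wrapping sum and contrasts it with a check against 2^63 - 1. The three sizes used are an illustrative choice whose sum is exactly 2^64, not the actual contents of bad-3-index-uncomp-overflow.xz, and the function names are hypothetical.

```python
from __future__ import annotations

UINT64_MASK = (1 << 64) - 1
VLI_MAX = (1 << 63) - 1  # largest size the .xz format allows


def wrapping_total(sizes: list[int]) -> int:
    """Sum as a C uint64_t would, silently wrapping around on overflow."""
    total = 0
    for size in sizes:
        total = (total + size) & UINT64_MASK
    return total


def checked_total(sizes: list[int]) -> int:
    """Sum Uncompressed Size fields, rejecting totals above VLI_MAX."""
    total = 0
    for size in sizes:
        # Compare against the remaining headroom so that the same test
        # written in C could not overflow itself.
        if size > VLI_MAX - total:
            raise ValueError("total uncompressed size exceeds 2^63 - 1")
        total += size
    return total
```

With sizes = [VLI_MAX, VLI_MAX, 2], wrapping_total(sizes) returns 0, the value "xz -l" displayed, while checked_total(sizes) raises ValueError.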
bad-2-compressed_data_padding.xz has a non-null byte in the padding of
the Compressed Data field of the first Block.
bad-1-check-crc32.xz has a wrong Check (CRC32).
bad-1-check-crc32-2.xz has Compressed Size and Uncompressed Size in
Block Header but wrong Check (CRC32) in the actual data. This file
differs by one byte from good-1-block_header-1.xz: the last byte of
the Check field is wrong. This file is useful for testing error
detection in the threaded decoder when a worker thread is configured
to pass input one byte at a time to the Block decoder.
bad-1-check-crc64.xz has a wrong Check (CRC64).
bad-1-check-sha256.xz has a wrong Check (SHA-256).
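For reference, the Check field holds the integrity check of a Block's uncompressed data: CRC32 and CRC64 are stored as little-endian integers and SHA-256 as its 32-byte digest. The Python sketch below computes each type with the standard library (zlib, hashlib) plus a small table-driven CRC64 using the bit-reflected ECMA-182 polynomial, the CRC64 variant .xz uses. The function names are hypothetical, and single_block_check() assumes a file with exactly one Stream, one Block and no Stream Padding.

```python
import hashlib
import lzma
import struct
import zlib

_CRC64_POLY = 0xC96C5795D7870F42  # ECMA-182 polynomial, bit-reflected


def _crc64_table() -> list:
    table = []
    for n in range(256):
        crc = n
        for _ in range(8):
            crc = (crc >> 1) ^ _CRC64_POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


_CRC64_TABLE = _crc64_table()


def crc64(data: bytes) -> int:
    """CRC64 as used by .xz (initial value and final XOR are all ones)."""
    crc = 0xFFFFFFFFFFFFFFFF
    for byte in data:
        crc = _CRC64_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFFFFFFFFFF


def expected_check(check_id: int, data: bytes) -> bytes:
    """Return the Check field bytes for the uncompressed data of a Block."""
    if check_id == 0x00:
        return b""
    if check_id == 0x01:
        return struct.pack("<I", zlib.crc32(data))
    if check_id == 0x04:
        return struct.pack("<Q", crc64(data))
    if check_id == 0x0A:
        return hashlib.sha256(data).digest()
    raise ValueError(f"unsupported Check ID {check_id:#04x}")


def single_block_check(xz_file: bytes, check_size: int) -> bytes:
    """Extract the Check field of a one-Block, one-Stream .xz file.

    The Check ends the Block, so it sits right before the Index, whose
    size the Stream Footer records as Backward Size.
    """
    index_size = (struct.unpack("<I", xz_file[-8:-4])[0] + 1) * 4
    end = len(xz_file) - 12 - index_size
    return xz_file[end - check_size:end]


if __name__ == "__main__":
    data = b"hello"
    blob = lzma.compress(data, check=lzma.CHECK_CRC64)
    print(single_block_check(blob, 8) == expected_check(0x04, data))
```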
bad-1-lzma2-1.xz has an LZMA2 stream whose first chunk (uncompressed)
doesn't reset the dictionary.
bad-1-lzma2-2.xz has two LZMA2 chunks, of which the second chunk
indicates dictionary reset, but the LZMA compressed data tries to
repeat data from the previous chunk.
bad-1-lzma2-3.xz sets new invalid properties (lc=8, lp=0, pb=0) in
the middle of the Block.
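LZMA properties are packed into one byte as (pb * 5 + lp) * 9 + lc, so lc=8, lp=0, pb=0 is the byte 0x08. That byte is in range, but LZMA2 (as implemented in liblzma) additionally requires lc + lp <= 4, which these values fail. A Python sketch of the decoding (function names hypothetical):

```python
from __future__ import annotations


def decode_lzma_props(byte: int) -> tuple[int, int, int]:
    """Split an LZMA properties byte into (lc, lp, pb)."""
    if not 0 <= byte < 9 * 5 * 5:
        raise ValueError("properties byte out of range")
    lc, rest = byte % 9, byte // 9
    return lc, rest % 5, rest // 5


def valid_for_lzma2(lc: int, lp: int, pb: int) -> bool:
    """Apply LZMA2's extra limit lc + lp <= 4 (and pb <= 4)."""
    return lc + lp <= 4 and pb <= 4
```

decode_lzma_props(0x08) gives (8, 0, 0), which valid_for_lzma2() rejects; the common default byte 0x5D gives (3, 0, 2), which it accepts.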
bad-1-lzma2-4.xz has two LZMA2 chunks, of which the first is
uncompressed and the second is LZMA. The first chunk resets the
dictionary as it should, but the second chunk tries to reset the state
without specifying properties for LZMA.
bad-1-lzma2-5.xz is like bad-1-lzma2-4.xz but doesn't try to reset
anything in the header of the second chunk.
bad-1-lzma2-6.xz has a reserved LZMA2 control byte value (0x03).
bad-1-lzma2-7.xz has an end of payload marker (EOPM) at the LZMA level.
bad-1-lzma2-8.xz is like good-1-lzma2-4.xz but doesn't set new
properties in the third LZMA2 chunk.
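Several of these files (bad-1-lzma2-1, -4, -5, -6 and -8) are about the first byte of each LZMA2 chunk. The control-byte meanings used by liblzma are: 0x00 end of payload, 0x01 and 0x02 an uncompressed chunk with and without a dictionary reset, 0x03-0x7F reserved, and 0x80-0xFF an LZMA chunk whose bits 5-6 select no reset, state reset, state reset with new properties, or all of those plus a dictionary reset. With these meanings, the reset rules can be checked from the control bytes alone. The Python sketch below models only that bookkeeping, not chunk sizes or compressed data; the function name is hypothetical.

```python
from __future__ import annotations


def check_lzma2_controls(controls: list[int]) -> None:
    """Raise ValueError if a sequence of LZMA2 control bytes breaks the
    dictionary-reset or property rules; return None if it is consistent."""
    need_dict_reset = True  # the first chunk must reset the dictionary
    need_props = True       # no LZMA properties have been set yet
    for i, control in enumerate(controls):
        if control == 0x00:
            return  # end of payload marker: the LZMA2 data ends here
        if 0x03 <= control <= 0x7F:
            raise ValueError(f"chunk {i}: reserved control byte {control:#04x}")
        if control == 0x01 or control >= 0xE0:
            # A dictionary reset also means the next LZMA chunk must set
            # new properties (a 0xE0-0xFF chunk does so itself).
            need_dict_reset = False
            need_props = True
        elif need_dict_reset:
            raise ValueError(f"chunk {i}: dictionary has not been reset")
        if control >= 0x80:
            reset = (control >> 5) & 0x03
            if reset >= 2:
                need_props = False
            elif need_props:
                raise ValueError(f"chunk {i}: LZMA chunk lacks required new properties")
    raise ValueError("missing end of payload marker")
```

For example, [0x01, 0xA0, 0x00] (an uncompressed chunk with dictionary reset followed by an LZMA chunk that only resets the state) is rejected, matching the description of bad-1-lzma2-4.xz.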
bad-1-lzma2-9.xz has an LZMA2 stream that is truncated at the end of
an LZMA2 chunk (no end marker). The uncompressed size of the partial
LZMA2 stream exceeds the value stored in the Block Header.
bad-1-lzma2-10.xz has an LZMA2 stream that, from the point of view of
an LZMA2 decoder, extends past the end of the Block (and even the end
of the file). Uncompressed Size in the Block Header is bigger than what
the invalid LZMA2 stream can produce (even if a decoder reads until
the end of the file). The Check type is None to nullify certain
simple size-based sanity checks in a Block decoder.
bad-1-lzma2-11.xz has an LZMA2 stream that lacks the end of
payload marker. When Compressed Size bytes have been decoded,
Uncompressed Size bytes of output will have been produced but
the LZMA2 decoder doesn't indicate the end of the stream.
3. Descriptions of Individual .lzma Files
3.1. Good Files
good-unknown_size-with_eopm.lzma has an unknown size in the header
and an end of payload marker at the end.
good-known_size-without_eopm.lzma has a known size in the header
and no end of payload marker at the end.
good-known_size-with_eopm.lzma has a known size in the header
and an end of payload marker at the end. XZ Utils 5.2.5 and older
will give an error at the end of the file after producing the
correct uncompressed output.
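A .lzma file starts with a 13-byte header: the LZMA properties byte, a 32-bit little-endian dictionary size and a 64-bit little-endian uncompressed size, where all bits set means the size is unknown and the data must end with an end of payload marker. A minimal Python parser (function name hypothetical); Python's lzma module is used only to produce a sample header.

```python
import lzma
import struct

UNKNOWN_SIZE = 0xFFFFFFFFFFFFFFFF  # uncompressed size not stored


def parse_lzma_alone_header(data: bytes) -> dict:
    """Parse the 13-byte header at the start of a .lzma file."""
    if len(data) < 13:
        raise ValueError("truncated .lzma header")
    props = data[0]
    if props >= 9 * 5 * 5:
        raise ValueError("invalid LZMA properties byte")
    dict_size, size = struct.unpack("<IQ", data[1:13])
    lc, rest = props % 9, props // 9
    return {
        "lc": lc, "lp": rest % 5, "pb": rest // 5,
        "dict_size": dict_size,
        # None: the decoder must rely on the end of payload marker.
        "uncompressed_size": None if size == UNKNOWN_SIZE else size,
    }


if __name__ == "__main__":
    # liblzma's .lzma encoder always writes an unknown size and an EOPM.
    print(parse_lzma_alone_header(lzma.compress(b"hello", format=lzma.FORMAT_ALONE)))
```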
3.2. Bad Files
bad-unknown_size-without_eopm.lzma has an unknown size in the header
but no end of payload marker at the end. A decoder might treat this
file as if it were truncated.
bad-too_big_size-with_eopm.lzma has a too large uncompressed size in
the header, so the end of payload marker will be detected before
the specified number of bytes has been decoded.
bad-too_small_size-without_eopm-1.lzma has a too small uncompressed
size in the header. The decoder will look for the end of payload
marker but will instead find a literal that would produce more output.
bad-too_small_size-without_eopm-2.lzma is like -1 above but instead
of a literal the problem occurs with a short repeated match.
bad-too_small_size-without_eopm-3.lzma is like -1 above but instead
of a literal the problem occurs in the middle of a match.
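The rules these .lzma files probe can be written as a decision over the sequence of decoded LZMA symbols. In the Python sketch below each item is either the number of output bytes a literal or match produces, or the EOPM sentinel. Both the token representation and the function are hypothetical abstractions that ignore the actual range decoding.

```python
from __future__ import annotations

EOPM = object()  # sentinel standing for the end of payload marker


def check_lzma_end(declared_size: int | None, tokens) -> int:
    """Return the total output size, or raise ValueError.

    declared_size is None when the header stores an unknown size.
    """
    produced = 0
    for token in tokens:
        if token is EOPM:
            if declared_size is not None and produced != declared_size:
                raise ValueError("end of payload marker before the declared size")
            return produced
        if declared_size is not None and produced + token > declared_size:
            raise ValueError("literal or match would exceed the declared size")
        produced += token
    if declared_size is None or produced != declared_size:
        raise ValueError("input ended too early (truncated?)")
    return produced
```

With a declared size of 4, the tokens [1, 1, 3] fail because the match straddles the limit, as in the -3 file; with a declared size of 3, [1, 1, 1, EOPM] is accepted, which is the case XZ Utils 5.2.5 and older rejected.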