diff options
author | Lasse Collin <lasse.collin@tukaani.org> | 2007-12-09 00:42:33 +0200 |
---|---|---|
committer | Lasse Collin <lasse.collin@tukaani.org> | 2007-12-09 00:42:33 +0200 |
commit | 5d018dc03549c1ee4958364712fb0c94e1bf2741 (patch) | |
tree | 1b211911fb33fddb3f04b77f99e81df23623ffc4 /doc/history.txt | |
download | xz-5d018dc03549c1ee4958364712fb0c94e1bf2741.tar.xz |
Imported to git.
Diffstat (limited to 'doc/history.txt')
-rw-r--r-- | doc/history.txt | 140 |
1 files changed, 140 insertions, 0 deletions
diff --git a/doc/history.txt b/doc/history.txt new file mode 100644 index 00000000..55293062 --- /dev/null +++ b/doc/history.txt @@ -0,0 +1,140 @@ + +LZMA Utils history +------------------ + +Tukaani distribution + + In 2005, there was a small group working on Tukaani distribution, which + was a Slackware fork. One of the project goals was to fit the distro on + a single 700 MiB ISO-9660 image. Using LZMA instead of gzip helped a + lot. Roughly speaking, one could fit data that took 1000 MiB in gzipped + form into 700 MiB with LZMA. Naturally compression ratio varied across + packages, but this was what we got on average. + + Slackware packages have traditionally had .tgz as the filename suffix, + which is an abbreviation of .tar.gz. A logical naming for LZMA + compressed packages was .tlz, being an abbreviation of .tar.lzma. + + At the end of the year 2007, there's no distribution under the Tukaani + project anymore. Development of LZMA Utils still continues. Still, + there are .tlz packages around, because at least Vector Linux (a + Slackware based distribution) uses LZMA for its packages. + + First versions of the modified pkgtools used the LZMA_Alone tool from + Igor Pavlov's LZMA SDK as is. It was fine, because users wouldn't need + to interact with LZMA_Alone directly. But people soon wanted to use + LZMA for other files too, and the interface of LZMA_Alone wasn't + comfortable for those used to gzip and bzip2. + + +First steps of LZMA Utils + + The first version of LZMA Utils (4.22.0) included a shell script called + lzmash. It was wrapper that had gzip-like command line interface. It + used the LZMA_Alone tool from LZMA SDK to do all the real work. zgrep, + zdiff, and related scripts from gzip were adapted work with LZMA and + were part of the first LZMA Utils release too. + + LZMA Utils 4.22.0 included also lzmadec, which was a small (less than + 10 KiB) decoder-only command line tool. It was written on top of the + decoder-only C code found from the LZMA SDK. lzmadec was convenient in + situations where LZMA_Alone (a few hundred KiB) would be too big. + + lzmash and lzmadec were written by Lasse Collin. + + +Second generation + + The lzmash script was an ugly and not very secure hack. The last + version of LZMA Utils to use lzmash was 4.27.1. + + LZMA Utils 4.32.0beta1 introduced a new lzma command line tool written + by Ville Koskinen. It was written in C++, and used the encoder and + decoder from C++ LZMA SDK with little modifications. This tool replaced + both the lzmash script and the LZMA_Alone command line tool in LZMA + Utils. + + Introducing this new tool caused some temporary incompatibilities, + because LZMA_Alone executable was simply named lzma like the new + command line tool, but they had completely different command line + interface. The file format was still the same. + + Lasse wrote liblzmadec, which was a small decoder-only library based on + the C code found from LZMA SDK. liblzmadec had API similar to zlib, + although there were some significant differences, which made it + non-trivial to use it in some applications designed for zlib and + libbzip2. + + The lzmadec command line tool was converted to use liblzmadec. + + Alexandre Sauvé helped converting build system to use GNU Autotools. + This made is easier to test for certain less portable features needed + by the new command line tool. + + Since the new command line tool never got completely finished (for + example, it didn't support LZMA_OPT environment variable), the intent + was to not call 4.32.x stable. Similarly, liblzmadec wasn't polished, + but appeared to work well enough, so some people started using it too. + + Because the development of the third generation of LZMA Utils was + delayed considerably (roughly two years), the 4.32.x branch had to be + kept maintained. It got some bug fixes now and then, and finally it was + decided to call it stable, although most of the missing features were + never added. + + +File format problems + + The file format used by LZMA_Alone was primitive. It was designed for + embedded systems in mind, and thus provided only minimal set of + features. The two biggest problems for non-embedded use were lack of + magic bytes and integrity check. + + Igor and Lasse started developing a new file format with some help from + Ville Koskinen, Mark Adler and Mikko Pouru. Designing the new format + took quite a long time. It was mostly because Lasse was quite slow at + getting things done due to personal reasons. + + Near the end of the year 2007 the new format was practically finished. + Compared to LZMA_Alone format and the .gz format used by gzip, the new + .lzma format is quite complex as a whole. This means that tools having + *full* support for the new format would be larger and more complex than + the tools supporting only the old LZMA_Alone format. + + For the situations where the full support for the .lzma format wouldn't + be required (embedded systems, operating system kernels), the new + format has a well-defined subset, which is easy to support with small + amount of code. It wouldn't be as small as an implementation using the + LZMA_Alone format, but the difference shouldn't be significant. + + The new .lzma format allows dividing the data in multiple independent + blocks, which can be compressed and uncompressed independenly. This + makes multi-threading possible with algorithms that aren't inherently + parallel (such as LZMA). There's also a central index of the sizes of + the blocks, which makes it possible to do limited random-access reading + with granularity of the block size. + + The new .lzma format uses the same filename suffix that was used for + LZMA_Alone files. The advantage is that users using the new tools won't + notice the change to the new format. The disadvantage is that the old + tools won't work with the new files. + + +Third generation + + LZMA Utils 4.42.0alphas drop the rest of the C++ LZMA SDK. The LZMA and + other included filters (algorithm implementations) are still directly + based on LZMA SDK, but ported to C. + + liblzma is now the core of LZMA Utils. It has zlib-like API, which + doesn't suffer from the problems of the API of liblzmadec. liblzma + supports not only LZMA, but several other filters, which together + can improve compression ratio even further with certain file types. + + The lzma and lzmadec command line tools have been rewritten. They uses + liblzma to do the actual compressing or uncompressing. + + The development of LZMA Utils 4.42.x is still in alpha stage. Several + features are still missing or don't fully work yet. Documentation is + also very minimal. + |