LZMA2 compression

From openZIM
Revision as of 16:05, 17 October 2010 by Kelson (talk | contribs)

LZMA2 is the standard and only compression algorithm supported in ZIM. In zimlib, compression and decompression are done with the xz-utils library.

A problem with LZMA2 is that the memory needed for decompression grows with the compression level. At the highest level, 9, decompression needs 65 MB of RAM. The Nanonote, which we want to support, has only 32 MB installed. Tests showed that level 4 is too much, but level 3 is OK. xz-utils has an additional extreme flag, which adjusts the lzma parameters so that the compression ratio of lzma level 3 is almost identical to that of bzip2 (deprecated).

The big advantage is that lzma decompression is much faster than bzip2 (by a factor of 3-4). The downsides are that creating ZIM files with lzma is much slower than with bzip2, and that support for xz-utils is not yet widespread.
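The level and memory trade-off described above can be tried out with a minimal sketch. It uses Python's standard lzma module, which wraps the same liblzma library that ships with xz-utils; it is not zimlib's own code. The names compress_blob and decompress_within and the 16 MiB decoder budget are assumptions made for illustration, since the text only states that the Nanonote has 32 MB in total.

```python
import lzma

# Hypothetical decoder budget (assumption): the source only says the
# Nanonote has 32 MB of RAM in total, not how much a decoder may use.
DECODER_BUDGET = 16 * 1024 * 1024


def compress_blob(data: bytes, level: int = 3, extreme: bool = True) -> bytes:
    """Compress to an .xz stream at the given level, optionally with the extreme flag."""
    preset = level | (lzma.PRESET_EXTREME if extreme else 0)
    return lzma.compress(data, format=lzma.FORMAT_XZ, preset=preset)


def decompress_within(blob: bytes, memlimit: int = DECODER_BUDGET):
    """Return the decompressed bytes, or None if decoding fails,
    for example because the decoder would need more than memlimit bytes."""
    try:
        return lzma.decompress(blob, format=lzma.FORMAT_XZ, memlimit=memlimit)
    except lzma.LZMAError:
        return None


sample = b"An article about compression. " * 2000
blob = compress_blob(sample)          # level 3 with the extreme flag
restored = decompress_within(blob)    # decode under the assumed budget
print(len(sample), len(blob), restored == sample)
```

Calling compress_blob(sample, level=9) and passing the result to decompress_within is a quick way to check whether a higher level still fits a given budget on a particular liblzma build.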

Here are some test results:

Test case: creating a file with 55498 index entries and 28936 articles.

bzip2 (deprecated):

   size: 90207329
   creating: 0:02:18
   reading random access: 29 #/s
   creating full text index: 00:03:01
   size of full text index: 92184996
   reading random access on Nanonote: 0.7 #/s

lzma:

   size: 90286916
   creating: 0:12:01
   reading random access: 120 #/s
   creating full text index: 00:03:03
   size of full text index: 87282408
   reading random access on Nanonote: 2.3 #/s

The tests (except the benchmark on the Nanonote) were run on our test machine, a dual-core AMD at 2.6 GHz.
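To relate these figures to the "factor 3-4" mentioned earlier, the ratios can be worked out directly from the numbers listed above. The following is a small sketch; the dictionary layout and key names are made up for illustration, and "#/s" is read here as random reads per second, which is an assumption.

```python
def to_seconds(hms: str) -> int:
    """Convert an h:mm:ss string to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


# Figures copied from the test results above.
results = {
    "bzip2": {"size": 90207329, "creating": "0:02:18",
              "reads_per_s": 29, "nanonote_reads_per_s": 0.7},
    "lzma": {"size": 90286916, "creating": "0:12:01",
             "reads_per_s": 120, "nanonote_reads_per_s": 2.3},
}

bz, xz = results["bzip2"], results["lzma"]

read_speedup = xz["reads_per_s"] / bz["reads_per_s"]                    # about 4.1
nanonote_speedup = xz["nanonote_reads_per_s"] / bz["nanonote_reads_per_s"]  # about 3.3
create_slowdown = to_seconds(xz["creating"]) / to_seconds(bz["creating"])   # about 5.2
size_increase_pct = (xz["size"] - bz["size"]) / bz["size"] * 100        # below 0.1 %

print(f"random access speedup (test machine): {read_speedup:.1f}x")
print(f"random access speedup (Nanonote):     {nanonote_speedup:.1f}x")
print(f"creation slowdown:                    {create_slowdown:.1f}x")
print(f"size increase:                        {size_increase_pct:.2f} %")
```

On these numbers, lzma reads are roughly 3 to 4 times faster on both machines, creation takes roughly 5 times as long, and the file sizes differ by less than 0.1 %.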