LZMA2 compression

From openZIM

LZMA2 is the standard and only compression algorithm supported in ZIM. In the standard implementation, compression and decompression are done with the xz-utils library.

One problem with LZMA2 is that the memory needed for decompression grows with the compression level. At the highest level, 9, decompression needs 65 MB of RAM. The Nanonote, which we want to support, has only 32 MB installed. Tests showed that level 4 is too much, but level 3 is OK. xz-utils has an additional extreme flag, which adjusts the LZMA2 parameters so that the compression ratio of LZMA2 level 3 is almost identical to that of bzip2 (deprecated). The big advantage is that LZMA2 decompression is much faster (by a factor of 3 to 4) than bzip2 decompression. The downsides are that creating ZIM files with LZMA2 is much slower than with bzip2, and that support for xz-utils is not yet widespread.
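The trade-off described above can be tried out with a minimal sketch. It uses Python's standard `lzma` module (a binding to the same liblzma that ships with xz-utils), not the openZIM implementation. The helper names `compress_blob` and `decompress_within` and the 32 MiB budget are assumptions made for illustration; the budget only mirrors the Nanonote's installed RAM and says nothing about how much memory is actually free on the device.

```python
import lzma

# Assumption for illustration: a 32 MiB decoder budget, mirroring the Nanonote's RAM.
MEMLIMIT_BYTES = 32 * 1024 * 1024


def compress_blob(data: bytes, level: int = 3) -> bytes:
    """Compress data to an .xz (LZMA2) stream at the given level with the extreme flag."""
    return lzma.compress(
        data,
        format=lzma.FORMAT_XZ,
        preset=level | lzma.PRESET_EXTREME,
    )


def decompress_within(blob: bytes, memlimit: int = MEMLIMIT_BYTES) -> bytes:
    """Decompress an .xz stream; raises lzma.LZMAError if the decoder needs more than memlimit bytes."""
    decoder = lzma.LZMADecompressor(format=lzma.FORMAT_XZ, memlimit=memlimit)
    return decoder.decompress(blob)


if __name__ == "__main__":
    sample = b"ZIM article text " * 10000
    blob = compress_blob(sample)
    print(len(sample), "->", len(blob), "bytes")
    assert decompress_within(blob) == sample
```

The decoder's memory requirement is determined by the dictionary size recorded in the stream, so a stream written at a high level is rejected by a small `memlimit` even if the payload itself is tiny.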

Here are some test results:

Test case: creating a ZIM file with 55498 index entries and 28936 articles.

bzip2 (deprecated)
   size: 90207329
   creating: 0:02:18
   reading random access: 29 #/s
   creating full text index: 00:03:01
   size of full text index: 92184996
   reading random access on Nanonote: 0.7 #/s
lzma2
   size: 90286916
   creating: 0:12:01
   reading random access: 120 #/s
   creating full text index: 00:03:03
   size of full text index: 87282408
   reading random access on Nanonote: 2.3 #/s

The tests (except the benchmark on the Nanonote) were run on our test machine, a dual-core AMD at 2.6 GHz.
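To relate the figures above to the claimed factor of 3 to 4, the ratios can be worked out directly. The following is a small sketch that only restates the measured values; the dictionary keys and the function names `to_seconds` and `compare` are chosen here for illustration.

```python
def to_seconds(hms: str) -> int:
    """Convert an h:mm:ss string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Values copied from the test results above.
RESULTS = {
    "bzip2": {"size": 90207329, "creating": "0:02:18",
              "reads_per_s": 29, "nanonote_reads_per_s": 0.7},
    "lzma2": {"size": 90286916, "creating": "0:12:01",
              "reads_per_s": 120, "nanonote_reads_per_s": 2.3},
}


def compare(results: dict) -> dict:
    """Return lzma2-to-bzip2 ratios for size, creation time and read speed."""
    old, new = results["bzip2"], results["lzma2"]
    return {
        "size": new["size"] / old["size"],
        "creating": to_seconds(new["creating"]) / to_seconds(old["creating"]),
        "reads": new["reads_per_s"] / old["reads_per_s"],
        "nanonote_reads": new["nanonote_reads_per_s"] / old["nanonote_reads_per_s"],
    }


if __name__ == "__main__":
    for name, ratio in compare(RESULTS).items():
        print(f"{name}: {ratio:.3f}")
```

With these numbers, random access is roughly 4.1 times faster on the test machine and roughly 3.3 times faster on the Nanonote, creating the file takes roughly 5.2 times as long, and the file size differs by less than 0.1%.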