LZMA2 compression

From openZIM
Revision as of 23:31, 1 January 2010 by Tntnet

LZMA compression is done with the xz-utils library.

A problem with LZMA is that the memory needed for decompression grows with the compression level. At the highest level, 9, decompression needs 65 MB of RAM. The Nanonote, which we want to support, has only 32 MB installed. Tests showed that level 4 is too much, but level 3 is OK.

xz-utils has an additional extreme flag, which adjusts the LZMA parameters so that the compression ratio of LZMA at level 3 is almost identical to that of bzip2. The big advantage is that LZMA decompression is much faster than bzip2 (by a factor of 3 to 4). The downsides are that creating ZIM files with LZMA is much slower than with bzip2, and that support for xz-utils is not yet widespread.
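As a rough illustration of the level and memory trade-off described above, here is a minimal sketch using Python's standard lzma module, which wraps the same liblzma from xz-utils. It is not the code used to write ZIM files. The names compress_cluster, decompress_within and NANONOTE_RAM are made up for this example, and the memory limit only covers the decoder itself, not the rest of the process, so a real device would have to budget well below its installed RAM.

```python
import lzma

# 32 MB, the RAM figure given for the Nanonote in the text (hypothetical constant name).
NANONOTE_RAM = 32 * 1024 * 1024


def compress_cluster(data: bytes, level: int = 3) -> bytes:
    """Compress data as an xz stream using the given preset plus the extreme flag."""
    return lzma.compress(
        data,
        format=lzma.FORMAT_XZ,
        preset=level | lzma.PRESET_EXTREME,
    )


def decompress_within(blob: bytes, memlimit: int):
    """Decompress blob with a cap on decoder memory.

    Returns the decompressed bytes, or None if liblzma refuses the stream,
    which includes the case where the decoder would need more than
    memlimit bytes.
    """
    try:
        return lzma.LZMADecompressor(memlimit=memlimit).decompress(blob)
    except lzma.LZMAError:
        return None


sample = b"openZIM article text " * 5000
blob = compress_cluster(sample)
restored = decompress_within(blob, NANONOTE_RAM)
print(len(sample), len(blob), restored == sample)
```

Passing a higher level to compress_cluster and checking whether decompress_within still succeeds under the same limit is one way to repeat the "level 4 is too much, level 3 is OK" check for a particular liblzma version, since the preset definitions have changed between xz-utils releases.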

Here are some test results:

Creating a file with 55498 index entries with 28936 articles.

bzip2:

   size: 90207329 bytes
   creating: 00:02:18
   reading random access: 29 #/s
   creating full text index: 00:03:01
   size of full text index: 92184996 bytes
   reading random access on Nanonote: 0.7 #/s

lzma:

   size: 90286916 bytes
   creating: 00:12:01
   reading random access: 120 #/s
   creating full text index: 00:03:03
   size of full text index: 87282408 bytes
   reading random access on Nanonote: 2.3 #/s

The tests (except the benchmark on the Nanonote) were run on our test machine, a dual-core AMD at 2.6 GHz.
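A comparison of this kind can be approximated on any machine with a small script. The following is only a sketch under stated assumptions, not the benchmark used for the numbers above: it compresses synthetic in-memory chunks instead of writing a real ZIM file, and the helper names make_chunks and benchmark are invented for this example. Absolute numbers will differ from the table, and the ratio between the two compressors may differ too.

```python
import bz2
import lzma
import random
import time


def make_chunks(count=20, words_per_chunk=8000, seed=0):
    """Build deterministic pseudo-text chunks to stand in for article clusters."""
    rng = random.Random(seed)
    vocab = [b"zim", b"article", b"index", b"cluster", b"wiki", b"offline",
             b"reader", b"compression", b"nanonote", b"search"]
    return [
        b" ".join(rng.choice(vocab) for _ in range(words_per_chunk))
        for _ in range(count)
    ]


def benchmark(name, compress, decompress, chunks):
    """Time compressing every chunk, then decompressing them in shuffled order."""
    start = time.perf_counter()
    blobs = [compress(chunk) for chunk in chunks]
    create_s = time.perf_counter() - start

    order = list(range(len(blobs)))
    random.Random(1).shuffle(order)  # fixed seed, so both codecs see the same order
    start = time.perf_counter()
    for i in order:
        assert decompress(blobs[i]) == chunks[i]  # also verifies the round trip
    read_s = max(time.perf_counter() - start, 1e-9)

    return {
        "name": name,
        "size": sum(len(b) for b in blobs),
        "create_s": create_s,
        "reads_per_s": len(order) / read_s,
    }


chunks = make_chunks()
results = [
    benchmark("bzip2", bz2.compress, bz2.decompress, chunks),
    benchmark(
        "lzma",
        lambda data: lzma.compress(data, preset=3 | lzma.PRESET_EXTREME),
        lzma.decompress,
        chunks,
    ),
]
for r in results:
    print(f"{r['name']}: size {r['size']} bytes, "
          f"creating {r['create_s']:.2f} s, reading {r['reads_per_s']:.0f} #/s")
```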