Difference between revisions of "ZIM file format"

Jump to navigation Jump to search
8 bytes added ,  12:22, 6 May 2011
no edit summary
Line 159: Line 159:
The clusters contain the actual data of the directory entries. Clusters can be compressed or uncompressed. The purpose of the clusters are that data of more than one directory entry can be compressed inside one cluster, making the compression much more efficient. Typically clusters have a size of about 1 MB.
The clusters contain the actual data of the directory entries. Clusters can be compressed or uncompressed. The purpose of the clusters are that data of more than one directory entry can be compressed inside one cluster, making the compression much more efficient. Typically clusters have a size of about 1 MB.


The first byte of the cluster identifies if it is compressed (4) or not (0). The default is uncompressed indicated by a value of 0 or 1 (obsoleted, inherited by Zeno) while compressed clusters are indicated by a value of 4 which indicates LZMA2 compression. There have been other compression algorithms used before (2: zlib, 3: bzip2) which have been removed. The zimlib uses [http://tukaani.org/xz/ xz-utils] as a C++ implementation of lzma2, for Java see [http://tukaani.org/xz/java.html].
The first byte of the cluster identifies if it is compressed (4) or not (0). The default is uncompressed indicated by a value of 0 or 1 (obsoleted, inherited by Zeno) while compressed clusters are indicated by a value of 4 which indicates LZMA2 compression. There have been other compression algorithms used before (2: zlib, 3: bzip2) which have been removed. The zimlib uses [http://tukaani.org/xz/ xz-utils] as a C++ implementation of lzma2, for Java see [http://tukaani.org/xz/java.html XZ-Java].


To find the data of a specific directory entry within a cluster the uncompressed cluster has a list of pointers to blobs within the uncompressed cluster after the first byte.
To find the data of a specific directory entry within a cluster the uncompressed cluster has a list of pointers to blobs within the uncompressed cluster after the first byte.

Navigation menu