Difference between revisions of "ZIM file format"

236 bytes added ,  13:01, 20 October 2009
The clusters contain the actual article data. This file section contains a list of clusters, each of which contains a list of blobs. A blob is the data of one specific article, so it is addressed by the cluster number and the blob number within that cluster. The cluster number is used to look up the cluster's file offset in the cluster pointer list.


Each cluster starts with a single byte that indicates which compression is used. All data after this byte is compressed accordingly. Possible values are:
* 0 default (no compression)
* 1 none (also no compression; inherited from zeno)
* 2 zip (zlib)
* 3 bzip2 (currently used in writer)
* 4 lzma (not implemented in reader or writer due to lack of compression library)
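
The dispatch on the compression byte can be sketched as follows. This is a minimal illustration, not the reader's actual code: the function name `decompress_cluster` is hypothetical, and it assumes that "zip (zlib)" means a standard zlib stream and that "bzip2" means a standard bzip2 stream, as produced by Python's `zlib` and `bz2` modules. Value 4 is rejected here because the text above describes lzma as not implemented.

```python
import bz2
import zlib


def decompress_cluster(raw: bytes) -> bytes:
    """Return the uncompressed data area of one cluster.

    `raw` is the whole cluster: one compression byte followed by the
    (possibly compressed) data. Hypothetical helper for illustration.
    """
    if not raw:
        raise ValueError("empty cluster")

    compression = raw[0]  # first byte selects the compression method
    payload = raw[1:]     # everything after it is the data area

    if compression in (0, 1):
        # 0 = default, 1 = none: both mean the data is stored as-is.
        return payload
    if compression == 2:
        # Assumption: a plain zlib stream.
        return zlib.decompress(payload)
    if compression == 3:
        # Assumption: a plain bzip2 stream.
        return bz2.decompress(payload)
    if compression == 4:
        # Listed above as not implemented in reader or writer.
        raise NotImplementedError("lzma clusters are not supported")
    raise ValueError(f"unknown compression byte: {compression}")
```

Treating 0 and 1 identically follows the list above, where both denote uncompressed data.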


The data area starts with a list of 4-byte offsets to the blobs, counted relative to the position of the first offset. The offsets address uncompressed data. The last offset points to the end of the data area, so there is always one more offset than there are blobs. Since the first offset points to the start of the first blob's data, the number of offsets can be determined by dividing the first offset by 4. The size of a blob is the difference between two consecutive offsets.
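
As a worked illustration of the offset arithmetic, the following sketch splits an already uncompressed data area into its blobs. The function name `split_blobs` is hypothetical, and the byte order is an assumption: the text above does not state it, so little-endian unsigned 32-bit integers are used here.

```python
import struct
from typing import List


def split_blobs(data: bytes) -> List[bytes]:
    """Split an uncompressed cluster data area into its blobs.

    Assumption: offsets are little-endian unsigned 32-bit integers,
    measured from the start of the offset list (index 0 of `data`).
    """
    if len(data) < 4:
        raise ValueError("data area too short to hold an offset")

    # The first offset points just past the offset list, so dividing
    # it by 4 gives the number of offsets.
    (first,) = struct.unpack_from("<I", data, 0)
    if first < 4 or first % 4 != 0 or first > len(data):
        raise ValueError("invalid first offset")
    count = first // 4

    offsets = struct.unpack_from(f"<{count}I", data, 0)
    if offsets[-1] > len(data):
        raise ValueError("last offset points past the data area")

    blobs = []
    # There is one more offset than blobs; each blob spans the range
    # between two consecutive offsets.
    for start, end in zip(offsets, offsets[1:]):
        if end < start:
            raise ValueError("offsets are not in ascending order")
        blobs.append(data[start:end])
    return blobs
```

For example, a data area holding the two blobs `abc` and `de` needs three offsets (12 bytes), so the offsets are 12, 15 and 17; 12 / 4 = 3 offsets, which gives 2 blobs of sizes 15 - 12 = 3 and 17 - 15 = 2.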
