Line 47: | Line 47: | ||
| ... || string || ... || ... || ... | | ... || string || ... || ... || ... | ||
|- | |- | ||
| <last entry / end> || string || n/a || 0 || empty string - zero terminated | | <last entry / end> || string || n/a || 0 || empty string - end of MIME type list - zero terminated | ||
|} | |} | ||
Line 147: | Line 147: | ||
! Field Name !! Type !!Offset!!Length!! Description | ! Field Name !! Type !!Offset!!Length!! Description | ||
|- | |- | ||
| <1st Cluster> || integer || 0 || 8 || | | <1st Cluster> || integer || 0 || 8 || pointer to the <1st Cluster> | ||
|- | |- | ||
| <2nd Cluster> || integer || 8 || 8 || | | <2nd Cluster> || integer || 8 || 8 || pointer to the <2nd Cluster> | ||
|- | |- | ||
| <nth Cluster> || integer ||(n-1)*8|| 8 || | | <nth Cluster> || integer ||(n-1)*8|| 8 || pointer to the <nth Cluster> | ||
|- | |- | ||
| ... || integer || ... || 8 || ... | | ... || integer || ... || 8 || ... | ||
Line 157: | Line 157: | ||
== Clusters == | == Clusters == | ||
The clusters contain the actual | The clusters contain the actual data of the directory entries. Clusters can be compressed or uncompressed. The purpose of clusters is to allow the data of more than one directory entry to be compressed together inside one cluster, which makes the compression much more efficient. Clusters typically have a size of about 1 MB. | ||
The cluster | The first byte of the cluster identifies whether it is compressed. A value of 0 (the default) or 1 (obsolete, inherited from Zeno) indicates an uncompressed cluster, while a value of 4 indicates LZMA2 compression. Other compression algorithms were used in the past (2: zlib, 3: bzip2) but have been removed. The zimlib uses [http://tukaani.org/xz/ xz-utils] as its LZMA2 implementation; for Java see [http://github.com/abartov/LZMA2-java LZMA2-java]. | ||
To find the data of a specific directory entry within a cluster, the uncompressed cluster contains a list of pointers to blobs directly after the first byte. | |||
The data | {|{{Prettytable}} | ||
! Field Name !! Type !!Offset!!Length!! Description | |||
|- | |||
| compression type || integer || 0 || 1 || 0: default (no compression), 1: none (inherited from Zeno), 4: LZMA2 compressed | |||
|- | |||
|colspan=5| The following data bytes have to be decompressed first if the cluster is compressed! | |||
|- | |||
| <1st Blob> || integer || 1 || 4 || pointer to the <1st Blob> | |||
|- | |||
| <2nd Blob> || integer || 5 || 4 || pointer to the <2nd Blob> | |||
|- | |||
| <nth Blob> || integer ||(n-1)*4+1|| 4 || pointer to the <nth Blob> | |||
|- | |||
| ... || integer || ... || 4 || ... | |||
|- | |||
| <last blob / end> || integer || n/a || 4 || pointer to the end of the cluster | |||
|- | |||
| <1st Blob> || data || n/a || n/a || data of the <1st Blob> | |||
|- | |||
| <2nd Blob> || data || n/a || n/a || data of the <2nd Blob> | |||
|- | |||
| ... || data || ... || n/a || ... | |||
|} | |||
The offsets address the uncompressed data. The last pointer points to the end of the data area, so there is always one more offset than there are blobs. Since the first offset points to the start of the first blob's data, the number of offsets can be determined by dividing this offset by 4. The size of a blob is the difference between two consecutive offsets. | |||
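The blob lookup described above can be sketched in a few lines of Python. This is only an illustration, not the zimlib implementation: the function name <code>read_blobs</code> is hypothetical, and the sketch assumes that offsets are little-endian and counted from the first byte after the compression type, and that a compressed cluster (type 4) holds a stream that Python's <code>lzma</code> module can decode.

```python
import lzma
import struct


def read_blobs(cluster: bytes) -> list[bytes]:
    """Split one raw cluster into its blobs (hypothetical helper)."""
    compression = cluster[0]
    body = cluster[1:]
    if compression == 4:
        # Assumption: the payload is a stream lzma.decompress() understands.
        body = lzma.decompress(body)
    elif compression not in (0, 1):
        raise ValueError(f"unsupported compression type {compression}")

    # The first offset points at the first blob's data, so dividing it
    # by 4 gives the number of offsets (one more than the number of blobs).
    # Assumption: offsets are little-endian unsigned 32-bit integers.
    (first_offset,) = struct.unpack_from("<I", body, 0)
    count = first_offset // 4
    offsets = struct.unpack_from(f"<{count}I", body, 0)

    # A blob spans the range between two consecutive offsets.
    return [body[start:end] for start, end in zip(offsets, offsets[1:])]


# Usage: build a small cluster with three blobs (the second one empty).
blobs = [b"hello", b"", b"zim!"]
header_size = 4 * (len(blobs) + 1)
positions = [header_size]
for blob in blobs:
    positions.append(positions[-1] + len(blob))
body = struct.pack(f"<{len(positions)}I", *positions) + b"".join(blobs)

uncompressed_cluster = b"\x00" + body
compressed_cluster = b"\x04" + lzma.compress(body)
```

Both <code>read_blobs(uncompressed_cluster)</code> and <code>read_blobs(compressed_cluster)</code> return the original three blobs, because the offsets always refer to the decompressed data.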
== Namespaces == | == Namespaces == | ||
Namespaces separate different types of | Namespaces separate different types of directory entries - which might have the same title - stored in the ZIM File Format. | ||
They can be distinguished by prepending the article namespace before the article name in the URL path, e.g. ''http://localhost/A/Articlename''. | They can be distinguished by prepending the article namespace before the article name in the URL path, e.g. ''http://localhost/A/Articlename''. |