Mgautierfr (talk | contribs) (Remove the idea that titlePtrPos may be set to zero.) |
m (Real zimlib -> libzim fix) |
||
(2 intermediate revisions by the same user not shown) | |||
Line 89: | Line 89: | ||
|} | |} | ||
Libzim caches directory entries and references the cached entries via the URL pointers. | |||
== Title Pointer List (titlePtrPos) == | == Title Pointer List (titlePtrPos) == | ||
Line 112: | Line 112: | ||
The indirection from titles via URLs to directory entries has two reasons: | The indirection from titles via URLs to directory entries has two reasons: | ||
* the pointer list is only half the size, as 4 bytes are enough for each entry | * the pointer list is only half the size, as 4 bytes are enough for each entry | ||
* accessing directory entries by title also makes use of cached directory entries which are referenced by the URL pointers, as implemented in | * accessing directory entries by title also makes use of cached directory entries which are referenced by the URL pointers, as implemented in libzim. | ||
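The two-step lookup described above (title pointer, then URL pointer, then directory entry) can be sketched as follows. This is a minimal illustration, not libzim's code: the function name and parameters are hypothetical, and it assumes little-endian integers, 4-byte title pointers that index into the URL pointer list, and 8-byte URL pointers that hold absolute file offsets.

```python
import struct


def dirent_offset_by_title_index(data, url_ptr_pos, title_ptr_pos, title_index):
    """Return the file offset of the directory entry at a given title rank.

    Hypothetical helper. Assumes little-endian layout, 4-byte title
    pointers and 8-byte URL pointers.
    """
    # Step 1: the title pointer is a 4-byte index into the URL pointer list.
    (url_index,) = struct.unpack_from("<I", data, title_ptr_pos + 4 * title_index)
    # Step 2: the URL pointer is an 8-byte offset to the directory entry.
    (offset,) = struct.unpack_from("<Q", data, url_ptr_pos + 8 * url_index)
    return offset


# Toy buffer: three URL pointers followed by three title pointers.
url_pointers = struct.pack("<QQQ", 100, 200, 300)
title_pointers = struct.pack("<III", 2, 0, 1)
toy_file = url_pointers + title_pointers
first_by_title = dirent_offset_by_title_index(toy_file, 0, len(url_pointers), 0)
```

Because both lookups end at a URL pointer, a reader that caches directory entries per URL index serves title lookups from the same cache.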
== Directory Entries == | == Directory Entries == | ||
Line 188: | Line 188: | ||
The first byte of the cluster identifies some information about the cluster. | The first byte of the cluster identifies some information about the cluster. | ||
The first fourth low bits identifies if the cluster | The four low bits identify the cluster compression type: | ||
* | * No compression is indicated by a value of 1 | ||
* Compressed clusters are indicated by a value of 4 ([[LZMA2 compression]], or more precisely XZ, since there is an XZ header) | * Compressed clusters are indicated by a value of 4 ([[LZMA2 compression]], or more precisely XZ, since there is an XZ header) or 5 (Zstandard compression). | ||
* There have been other compression algorithms used before | * Other compression algorithms were used in the past and have since been removed: 2 for zlib and 3 for bzip2. | ||
The | * 0 is an obsolete code for no compression (inherited from Zeno) | ||
The fifth bit identifies whether the cluster is extended: | |||
* By default (5th bit == 0) the cluster is not extended: offsets are stored as 4-byte integers, so the content stored in the cluster cannot exceed 4 GiB. | * By default (5th bit == 0) the cluster is not extended: offsets are stored as 4-byte integers, so the content stored in the cluster cannot exceed 4 GiB. | ||
* If the cluster is extended (5th bit == 1), offsets are stored as 8-byte integers, so the content stored in the cluster can exceed 4 GiB. | * If the cluster is extended (5th bit == 1), offsets are stored as 8-byte integers, so the content stored in the cluster can exceed 4 GiB. | ||
A cluster can be extended only if the ZIM major version is 6. Otherwise (major version == 5) clusters are never extended. | A cluster can be extended only if the ZIM major version is 6. Otherwise (major version == 5) clusters are never extended. | ||
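As a rough illustration of the bit layout described above, the following sketch splits the first cluster byte into its compression code (four low bits) and its extended flag (fifth bit). The function and table names are hypothetical and not part of libzim; the numeric codes are the ones listed in this section.

```python
# Compression codes as listed in this section (four low bits of the byte).
COMPRESSION_NAMES = {
    0: "none (obsolete code)",
    1: "none",
    2: "zlib (removed)",
    3: "bzip2 (removed)",
    4: "lzma2/xz",
    5: "zstd",
}


def parse_cluster_info(info_byte):
    """Return (compression name, offset size in bytes) for a cluster info byte."""
    compression = info_byte & 0x0F      # four low bits
    extended = bool(info_byte & 0x10)   # fifth bit
    if compression not in COMPRESSION_NAMES:
        raise ValueError("unknown compression code: %d" % compression)
    offset_size = 8 if extended else 4
    return COMPRESSION_NAMES[compression], offset_size


plain = parse_cluster_info(0x01)
xz_cluster = parse_cluster_info(0x04)
zstd_extended = parse_cluster_info(0x15)
```

A reader would typically also reject an extended cluster when the file's major version is 5, since the text above allows extension only with major version 6.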
The | libzim uses [http://tukaani.org/xz/ xz-utils] (usable from C++) as its LZMA2 implementation; for Java see [http://tukaani.org/xz/java.html XZ-Java]. | ||
To find the data of a specific directory entry within a cluster the uncompressed cluster has a list of pointers to blobs within the uncompressed cluster after the first byte. | To find the data of a specific directory entry within a cluster the uncompressed cluster has a list of pointers to blobs within the uncompressed cluster after the first byte. | ||
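The blob lookup can be sketched as below. Several details here are assumptions not stated in this excerpt and should be checked against the full table: offsets are little-endian, they are measured from the start of the pointer list (the byte after the cluster information byte), the first offset therefore also gives the size of the pointer list, and one extra trailing offset marks the end of the last blob. The function name is hypothetical.

```python
import struct


def read_blob(cluster, blob_index, offset_size=4):
    """Return one blob from an uncompressed cluster (hypothetical helper).

    Assumes little-endian offsets relative to the start of the pointer
    list, with one more offset than there are blobs.
    """
    fmt = "<I" if offset_size == 4 else "<Q"
    base = 1  # the pointer list starts right after the information byte
    # The first offset points just past the pointer list, so it also
    # tells us how many offsets the list contains.
    (first,) = struct.unpack_from(fmt, cluster, base)
    offset_count = first // offset_size
    if not 0 <= blob_index < offset_count - 1:
        raise IndexError("blob index out of range")
    (start,) = struct.unpack_from(fmt, cluster, base + blob_index * offset_size)
    (end,) = struct.unpack_from(fmt, cluster, base + (blob_index + 1) * offset_size)
    return cluster[base + start:base + end]


# Toy uncompressed cluster: info byte 1, three offsets, two blobs.
toy_cluster = b"\x01" + struct.pack("<III", 12, 17, 20) + b"hello" + b"zim"
first_blob = read_blob(toy_cluster, 0)
second_blob = read_blob(toy_cluster, 1)
```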
Line 204: | Line 206: | ||
! Field Name !! Type !!Offset!!Length!! Description | ! Field Name !! Type !!Offset!!Length!! Description | ||
|- | |- | ||
| cluster information || integer || 0 || 1 || Fourth low bits : | | cluster information || integer || 0 || 1 || Four low bits: 1: no compression, 4: LZMA2 compressed, 5: zstd compressed | ||
Fifth bit: 0: normal (OFFSET_SIZE=4), 1: extended (OFFSET_SIZE=8) | Fifth bit: 0: normal (OFFSET_SIZE=4), 1: extended (OFFSET_SIZE=8) | ||
|- | |- | ||
Line 267: | Line 269: | ||
If you use a common rendering engine or HTML widget, you don't have to worry about these cases; you can just use the requests as they are submitted by the engine / widget. | If you use a common rendering engine or HTML widget, you don't have to worry about these cases; you can just use the requests as they are submitted by the engine / widget. | ||
Should you render the article contents by yourself you have to consider this and take care of it before you hand requests to | Should you render the article contents by yourself you have to consider this and take care of it before you hand requests to libzim. | ||
== Encodings == | == Encodings == |