Difference between revisions of "ZIM file format"

ZIM file format (view source)

Revision as of 10:07, 15 March 2023

40 bytes removed , 10:07, 15 March 2023

m

Real zimlib -> libzim fix

Kelson

Bureaucrats, Administrators

519

edits

@@ Line 29: / Line 29: @@
 |-
 | titlePtrPos || integer || 40 || 8 || position of the directory pointerlist ordered by Title
-This is considered as obsolete, readers should use <code>X/listing/titleordered/v0</code> instead and fallback to <code>titlePtrPos</code> if entry is not present.
+This is considered obsolete; readers should use <code>[[Search indexes#Title index v0|X/listing/titleordered/v0]]</code> instead and fall back to <code>titlePtrPos</code> if the entry is not present.
-Always valid for now, but it may be set to 0 in the future if <code>titlePtrPos</code> is not present.
 |-
 | clusterPtrPos || integer || 48 || 8 || position of the cluster pointer list
@@ Line 75: / Line 73: @@
 The URL pointer list is a list of 8 byte offsets to the directory entries.
-The directory entries are always ordered by URL. Ordering is simply done by comparing the URL strings.
+The directory entries are always ordered by "full" URL (<code><namespace><path></code>). Ordering is simply done by comparing the URL strings.
 Since directory entries have variable sizes this is needed for random access.
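The lookup described above can be sketched as follows. This is only an illustration, not libzim's implementation: the helper names are hypothetical, the integers are assumed to be little-endian, and "comparing the URL strings" is assumed to mean a byte-wise comparison of the UTF-8 encoded full URL.

```python
import struct
from bisect import bisect_left
from typing import List, Optional


def read_url_pointers(buf: bytes, url_ptr_pos: int, entry_count: int) -> List[int]:
    """Read entry_count 8-byte offsets starting at url_ptr_pos (little-endian assumed)."""
    return list(struct.unpack_from(f"<{entry_count}Q", buf, url_ptr_pos))


def full_url(namespace: str, path: str) -> bytes:
    """Build the "full" URL key: namespace followed by path (UTF-8 bytes assumed)."""
    return (namespace + path).encode("utf-8")


def find_entry_index(sorted_full_urls: List[bytes], namespace: str, path: str) -> Optional[int]:
    """Binary search over full URLs that are already in directory order."""
    key = full_url(namespace, path)
    i = bisect_left(sorted_full_urls, key)
    if i < len(sorted_full_urls) and sorted_full_urls[i] == key:
        return i
    return None
```

A real reader would fetch each candidate directory entry through its offset during the search instead of holding all full URLs in memory; the list is used here only to keep the sketch short.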
@@ Line 91: / Line 89: @@
 |}
-Zimlib caches directory entries and references the cached entries via the URL pointers.
+Libzim caches directory entries and references the cached entries via the URL pointers.
 == Title Pointer List (titlePtrPos) ==
-The title pointer list is a list of entry indices ordered by title. The title pointer list actually points to entries in the URL pointer list.
+The title pointer list is a list of entry indices ordered by title (<code><namespace><title></code>). The title pointer list actually points to entries in the URL pointer list.
 Note that the title pointers are only 4 bytes. They are not offsets in the file but entry numbers.
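The indirection from title pointer to URL pointer to directory entry could look like this. It is a minimal sketch under assumptions: the function name is hypothetical, integers are assumed to be little-endian, and <code>title_ptr_pos</code> / <code>url_ptr_pos</code> stand for the header values of the same names.

```python
import struct


def entry_offset_for_title_index(buf: bytes, title_ptr_pos: int,
                                 url_ptr_pos: int, title_index: int) -> int:
    """Resolve the n-th entry in title order to a directory entry offset."""
    # Title pointers are 4-byte entry numbers, not file offsets.
    (url_index,) = struct.unpack_from("<I", buf, title_ptr_pos + 4 * title_index)
    # The entry number selects an 8-byte offset in the URL pointer list.
    (offset,) = struct.unpack_from("<Q", buf, url_ptr_pos + 8 * url_index)
    return offset
```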
@@ Line 114: / Line 112: @@
 The indirection from titles via URLs to directory entries has two reasons:
 * the pointer list is only half the size, as 4 bytes are enough for each entry
-* accessing directory entries by title also makes use of cached directory entries which are referenced by the URL pointers, as implemented in zimlib.
+* accessing directory entries by title also makes use of cached directory entries which are referenced by the URL pointers, as implemented in libzim.
 == Directory Entries ==
@@ Line 190: / Line 188: @@
 The first byte of the cluster identifies some information about the cluster.
-The first fourth low bits identifies if the cluster is compressed (4) or not (0):
+The four low bits identify the cluster compression type:
-* The default is uncompressed indicated by a value of 0 or 1 (obsoleted, inherited by Zeno).
+* No compression is indicated by a value of 1
-* Compressed clusters are indicated by a value of 4 ([[LZMA2 compression]] (or more precisely XZ, since there is a XZ header)) and 5 (Zstandard compression).
+* Compressed clusters are indicated by a value of 4 ([[LZMA2 compression]] (or more precisely XZ, since there is a XZ header)) or 5 (Zstandard compression).
-* There have been other compression algorithms used before (2: zlib, 3: bzip2) which have been removed.
+* Other compression algorithms were used before and have since been removed: 2 for zlib and 3 for bzip2.
-The firth bit identifies if the cluster is extended or not :
+* 0 is an obsolete code for no compression (inherited from Zeno)
+The fifth bit identifies whether the cluster is extended or not:
 * By default (5th bit == 0) the cluster is not extended. This means that the offsets are stored as 4-byte integers, so the contents stored in the cluster cannot exceed 4 GB.
 * If the cluster is extended (5th bit == 1), the offsets are stored as 8-byte integers, so the contents stored in the cluster can exceed 4 GB.
 A cluster can be extended only if the ZIM major version is 6. Otherwise (major version == 5) the cluster is never extended.
-The zimlib uses [http://tukaani.org/xz/ xz-utils] as a C++ implementation of lzma2, for Java see [http://tukaani.org/xz/java.html XZ-Java].
+Libzim uses [http://tukaani.org/xz/ xz-utils] as a C++ implementation of lzma2; for Java see [http://tukaani.org/xz/java.html XZ-Java].
 To find the data of a specific directory entry within a cluster the uncompressed cluster has a list of pointers to blobs within the uncompressed cluster after the first byte.
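Decoding the first byte of a cluster, as described above, might be sketched like this. The function and table names are hypothetical; the bit layout follows the text (four low bits for the compression type, fifth bit for the extended flag).

```python
from typing import Tuple

# Compression codes as listed above; 0, 2 and 3 are obsolete or removed.
COMPRESSION_NAMES = {
    0: "none (obsolete code)",
    1: "none",
    2: "zlib (removed)",
    3: "bzip2 (removed)",
    4: "lzma2/xz",
    5: "zstd",
}


def decode_cluster_info(first_byte: int) -> Tuple[int, bool, int]:
    """Return (compression code, extended flag, offset size in bytes)."""
    compression = first_byte & 0x0F      # four low bits
    extended = bool(first_byte & 0x10)   # fifth bit
    offset_size = 8 if extended else 4   # width of the blob offsets
    return compression, extended, offset_size
```

For example, a first byte of <code>0x15</code> would describe an extended, Zstandard-compressed cluster with 8-byte offsets.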
@@ Line 206: / Line 206: @@
 ! Field Name !! Type !!Offset!!Length!! Description
 |-
-| cluster information || integer || 0 || 1 || Fourth low bits : 0: default (no compression), 1: none (inherited from Zeno), 4: LZMA2 compressed, 5: zstd compressed
+| cluster information || integer || 0 || 1 || Four low bits: 1: no compression, 4: LZMA2 compressed, 5: zstd compressed
 Fifth bit: 0: normal (OFFSET_SIZE=4), 1: extended (OFFSET_SIZE=8)
 |-
@@ Line 269: / Line 269: @@
 If you use a common rendering engine or HTML widget you don't have to care about these cases; you can just use the requests as they are submitted by the engine / widget.
-Should you render the article contents by yourself you have to consider this and take care of it before you hand requests to zimlib.
+Should you render the article contents by yourself you have to consider this and take care of it before you hand requests to libzim.
 == Encodings ==
