Difference between revisions of "ZIM file format"

148 bytes removed ,  07:44, 23 April 2021
→‎Clusters: Clarification around the cluster compression types
(One intermediate revision by one other user not shown)
Line 30: Line 30:
| titlePtrPos || integer || 40 || 8 || position of the directory pointerlist ordered by Title
| titlePtrPos || integer || 40 || 8 || position of the directory pointerlist ordered by Title
This is considered obsolete; readers should use <code>[[Search indexes#Title index v0|X/listing/titleordered/v0]]</code> instead and fall back to <code>titlePtrPos</code> if the entry is not present.
This is considered obsolete; readers should use <code>[[Search indexes#Title index v0|X/listing/titleordered/v0]]</code> instead and fall back to <code>titlePtrPos</code> if the entry is not present.
Always valid for now, but it may be set to 0 in the future if <code>titlePtrPos</code> is not present.                 
|-
|-
| clusterPtrPos || integer || 48 || 8 || position of the cluster pointer list                 
| clusterPtrPos || integer || 48 || 8 || position of the cluster pointer list                 
Line 190: Line 188:
The first byte of the cluster identifies some information about the cluster.
The first byte of the cluster identifies some information about the cluster.


The four low bits identify whether the cluster is compressed (4) or not (0):
The four low bits identify the cluster compression type:
* The default is uncompressed, indicated by a value of 0 or 1 (obsolete, inherited from Zeno).
* No compression is indicated by a value of 1
* Compressed clusters are indicated by a value of 4 ([[LZMA2 compression]], or more precisely XZ, since there is an XZ header) and 5 (Zstandard compression).
* Compressed clusters are indicated by a value of 4 ([[LZMA2 compression]], or more precisely XZ, since there is an XZ header) or 5 (Zstandard compression).
* There have been other compression algorithms used before (2: zlib, 3: bzip2) which have been removed.
* There have been other compression algorithms used before which have been removed: 2 for zlib and 3 for bzip2.
The fifth bit identifies if the cluster is extended or not:
* 0 is an obsolete code for no compression (inherited from Zeno)

The fifth bit identifies whether the cluster is extended or not:
* By default (5th bit == 0) the cluster is not extended. The offsets are stored as 4-byte integers, so the content stored in the cluster cannot exceed 4 GB.
* By default (5th bit == 0) the cluster is not extended. The offsets are stored as 4-byte integers, so the content stored in the cluster cannot exceed 4 GB.
* If the cluster is extended (5th bit == 1), the offsets are stored as 8-byte integers, so the content stored in the cluster can exceed 4 GB.
* If the cluster is extended (5th bit == 1), the offsets are stored as 8-byte integers, so the content stored in the cluster can exceed 4 GB.
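
As an illustration of the rules in this hunk, the cluster information byte could be decoded roughly as follows. This is a minimal sketch, not code from any ZIM library: the function name <code>parse_cluster_info</code> and the <code>COMPRESSION_NAMES</code> table are hypothetical, and it assumes that "5th bit" means the bit with mask <code>0x10</code> (counting from 1 at the least significant bit).

```python
# Hypothetical helper for illustration only; not taken from any ZIM library.

# Values of the four low bits, as listed in the text above.
COMPRESSION_NAMES = {
    0: "none (obsolete code inherited from Zeno)",
    1: "none",
    2: "zlib (removed)",
    3: "bzip2 (removed)",
    4: "LZMA2 (XZ)",
    5: "Zstandard",
}


def parse_cluster_info(info_byte: int) -> dict:
    """Decode the first byte of a cluster into its two fields."""
    if not 0 <= info_byte <= 0xFF:
        raise ValueError("cluster information must fit in one byte")

    # Four low bits: compression type.
    compression = info_byte & 0x0F

    # Fifth bit (assumed mask 0x10): extended cluster flag.
    extended = bool(info_byte & 0x10)

    return {
        "compression": compression,
        "compression_name": COMPRESSION_NAMES.get(compression, "unknown"),
        "extended": extended,
        # Extended clusters use 8-byte offsets, normal clusters 4-byte offsets.
        "offset_size": 8 if extended else 4,
    }


if __name__ == "__main__":
    # 0x15 = 0b0001_0101: extended flag set, compression type 5.
    print(parse_cluster_info(0x15))
```

A reader would typically use <code>offset_size</code> to decide how wide each entry of the cluster's offset list is before reading it.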
Line 206: Line 206:
! Field Name !! Type !!Offset!!Length!! Description                 
! Field Name !! Type !!Offset!!Length!! Description                 
|-
|-
| cluster information || integer || 0 || 1 || Four low bits: 0: default (no compression), 1: none (inherited from Zeno), 4: LZMA2 compressed, 5: zstd compressed
| cluster information || integer || 0 || 1 || Four low bits: 1: no compression, 4: LZMA2 compressed, 5: zstd compressed
Fifth bit: 0: normal (OFFSET_SIZE=4), 1: extended (OFFSET_SIZE=8)
Fifth bit: 0: normal (OFFSET_SIZE=4), 1: extended (OFFSET_SIZE=8)
|-
|-
