Difference between revisions of "ZIM file format"

14 bytes removed ,  15:30, 4 April 2020
→‎Clusters: Add 5 for Zstandard compression
(3 intermediate revisions by 2 users not shown)
Line 13: Line 13:
! Field Name !! Type !! Offset !! Length !! Description                 
! Field Name !! Type !! Offset !! Length !! Description                 
|-
|-
| magicNumber || integer || 0 || 4 || Magic number to recognise the file format, must be 0x72173914                   
| magicNumber || integer || 0 || 4 || Magic number to recognise the file format, must be 72173914 (0x44D495A)
|-
|-
|majorVersion
|majorVersion
Line 199: Line 199:
The first byte of the cluster identifies some information about the cluster.
The first byte of the cluster identifies some information about the cluster.


The four low bits identify whether the cluster is compressed. Uncompressed clusters, the default, are indicated by a value of 0 or 1 (obsolete, inherited from Zeno), while compressed clusters are indicated by a value of 4, which indicates [[LZMA2 compression]] (or more precisely XZ, since there is an XZ header). Other compression algorithms were used previously (2: zlib, 3: bzip2) but have been removed. The zimlib uses [http://tukaani.org/xz/ xz-utils] as a C++ implementation of LZMA2; for Java see [http://tukaani.org/xz/java.html XZ-Java].
The four low bits identify whether the cluster is compressed. Uncompressed clusters, the default, are indicated by a value of 0 or 1 (obsolete, inherited from Zeno), while compressed clusters are indicated by a value of 4, which indicates [[LZMA2 compression]] (or more precisely XZ, since there is an XZ header), or 5, which indicates Zstandard compression. Other compression algorithms were used previously (2: zlib, 3: bzip2) but have been removed. The zimlib uses [http://tukaani.org/xz/ xz-utils] as a C++ implementation of LZMA2; for Java see [http://tukaani.org/xz/java.html XZ-Java].
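
As a rough illustration of the cluster information byte described above, the following Python sketch splits the byte into its compression code and its extended flag. The names <code>parse_cluster_info</code>, <code>ClusterInfo</code>, <code>COMPRESSION_NAMES</code>, <code>COMPRESSION_MASK</code> and <code>EXTENDED_FLAG</code> are hypothetical and not part of zimlib. The mask <code>0x10</code> assumes that the "fifth bit" is counted from the least significant bit, starting at one.

```python
from typing import NamedTuple

# Compression codes as listed in the text; the labels are illustrative only.
COMPRESSION_NAMES = {
    0: "uncompressed",
    1: "uncompressed (obsolete, inherited from Zeno)",
    2: "zlib (removed)",
    3: "bzip2 (removed)",
    4: "LZMA2 / XZ",
    5: "Zstandard",
}

COMPRESSION_MASK = 0x0F  # the four low bits
EXTENDED_FLAG = 0x10     # assumption: fifth bit, counted from the low end


class ClusterInfo(NamedTuple):
    compression_code: int
    compression_name: str
    extended: bool


def parse_cluster_info(info_byte: int) -> ClusterInfo:
    """Decode the first byte of a cluster (hypothetical helper)."""
    if not 0 <= info_byte <= 0xFF:
        raise ValueError("cluster information must fit in a single byte")
    code = info_byte & COMPRESSION_MASK
    name = COMPRESSION_NAMES.get(code, "unknown")
    extended = bool(info_byte & EXTENDED_FLAG)
    return ClusterInfo(code, name, extended)
```

Under these assumptions, <code>parse_cluster_info(0x15)</code> would report a Zstandard-compressed, extended cluster, while codes not listed in the text are reported as "unknown" rather than rejected.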


The fifth bit identifies whether the cluster is extended:
The fifth bit identifies whether the cluster is extended:
Line 261: Line 261:
| W || categories per article, category list - see [[Category Handling]]         
| W || categories per article, category list - see [[Category Handling]]         
|-
|-
| X || fulltext index - see [[ZIM Index Format]]       
| X || search indexes
|}
|}

