(→Clusters: Add 5 for Zstandard compression) |
(One intermediate revision by the same user not shown) | |||
Line 199: | Line 199: | ||
The first byte of the cluster identifies some information about the cluster. | The first byte of the cluster identifies some information about the cluster. | ||
The four low bits identify the compression type. Uncompressed clusters, the default, are indicated by a value of 0 or 1 (obsolete, inherited from Zeno), while compressed clusters are indicated by a value of 4, which indicates [[LZMA2 compression]] (or more precisely XZ, since there is an XZ header). Other compression algorithms were used before (2: zlib, 3: bzip2) and have been removed. The zimlib uses [http://tukaani.org/xz/ xz-utils] as a C++ implementation of lzma2; for Java see [http://tukaani.org/xz/java.html XZ-Java]. | The four low bits identify the compression type. Uncompressed clusters, the default, are indicated by a value of 0 or 1 (obsolete, inherited from Zeno), while compressed clusters are indicated by a value of 4, which indicates [[LZMA2 compression]] (or more precisely XZ, since there is an XZ header), or 5, which indicates Zstandard compression. Other compression algorithms were used before (2: zlib, 3: bzip2) and have been removed. The zimlib uses [http://tukaani.org/xz/ xz-utils] as a C++ implementation of lzma2; for Java see [http://tukaani.org/xz/java.html XZ-Java]. | ||
The fifth bit identifies whether the cluster is extended or not: | The fifth bit identifies whether the cluster is extended or not: | ||
Line 261: | Line 261: | ||
| W || categories per article, category list - see [[Category Handling]] | | W || categories per article, category list - see [[Category Handling]] | ||
|- | |- | ||
| X || | | X || search indexes | ||
|} | |} | ||
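
The layout of the cluster information byte described in the Clusters hunk above (four low bits for the compression type, fifth bit for the extended flag) could be decoded roughly as follows. This is only a sketch, not zimlib code: the names <code>parse_cluster_info</code>, <code>ClusterInfo</code> and <code>COMPRESSION_NAMES</code> are hypothetical, and the short labels in the table are illustrative.

```python
from typing import NamedTuple

# Compression codes as listed in the revised text. The labels are
# illustrative names chosen for this sketch, not identifiers from zimlib.
COMPRESSION_NAMES = {
    0: "none",
    1: "none",   # obsolete, inherited from Zeno
    2: "zlib",   # removed
    3: "bzip2",  # removed
    4: "xz",     # LZMA2 with an XZ header
    5: "zstd",   # Zstandard, added by this revision
}


class ClusterInfo(NamedTuple):
    compression: int       # value of the four low bits
    compression_name: str  # label from COMPRESSION_NAMES, or "unknown"
    extended: bool         # state of the fifth bit


def parse_cluster_info(info_byte: int) -> ClusterInfo:
    """Split the first byte of a cluster into its two documented fields."""
    if not 0 <= info_byte <= 0xFF:
        raise ValueError("cluster information must fit in one byte")
    compression = info_byte & 0x0F     # four low bits
    extended = bool(info_byte & 0x10)  # fifth bit
    name = COMPRESSION_NAMES.get(compression, "unknown")
    return ClusterInfo(compression, name, extended)
```

For example, <code>parse_cluster_info(0x15)</code> reports compression code 5 with the extended flag set, while <code>parse_cluster_info(0x04)</code> reports an XZ cluster that is not extended.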