Difference between revisions of "ZIM file format"

(5 intermediate revisions by 3 users not shown)
Line 85: Line 85:
|From 3.2.0
|From 3.2.0
|-
|-
| rowspan="3" |6|| 0 || no  || Introduces extended clusters
| rowspan="4" |6|| 0 || no  || Introduces extended clusters
Still uses [[ZIM file format old namespace|"old" namespaces]]
Still uses [[ZIM file format old namespace|"old" namespaces]]
|From 3.2.0
|From 3.2.0
Line 94: Line 94:
|                    2 || yes || Explicitly allows alias entries (several entries pointing to the same cluster/blob)
|                    2 || yes || Explicitly allows alias entries (several entries pointing to the same cluster/blob)
|From 9.1.0
|From 9.1.0
|-
|                    3 || yes || Removes Title index "listing/titleOrdered/v0" (old zimcheck will complain)
|From 9.3.0
|}
|}


Line 337: Line 340:
All lengths are bytes.
All lengths are bytes.


== Split ZIM files ==
== Split ZIM archives in chunks ==
ZIM archives can be split in multiple files. This is necessary to be able to store big (over 4GB for example) ZIM archives to limited file systems (like FAT32). That said, the files can be of any size, but the naming is really important. The ZIM files should be named like following (the file name extensions matter): ''foobar.zimaa, foobar.zimab, foobar.zimac''...
ZIM archives can be split into multiple chunk files. This is necessary to store big ZIM archives (over 4GB, for example) on file systems that restrict the size of a single file (like FAT32). The size of each chunk can be chosen, but a ZIM archive cannot be cut at an arbitrary position: it has to be cut between [[#Clusters|cluster]]s. In addition, the naming of the chunks is really important; they should be named as follows (the file name extensions matter): ''foobar.zimaa, foobar.zimab, foobar.zimac''. To perform this splitting operation easily, you can rely on the ''zimsplit'' command line tool (part of the [https://github.com/openzim/zim-tools ZIM tools]).
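
The naming scheme above can be illustrated with a minimal Python sketch. The helper name <code>chunk_names</code> is hypothetical and not part of any ZIM library, and the sketch assumes that the two-letter suffix continues alphabetically after ''ac'' (''ad'' ... ''az'', ''ba'' ...), which the text above only implies through its first three examples.

```python
from itertools import product
from string import ascii_lowercase


def chunk_names(base, count):
    """Return the first `count` chunk file names for the archive `base`.

    Assumption: suffixes run aa, ab, ..., az, ba, ..., zz, so at most
    26 * 26 chunks can be named this way.
    """
    if not 0 <= count <= 26 * 26:
        raise ValueError("count must be between 0 and 676")
    # product() yields letter pairs in lexicographic order: aa, ab, ac, ...
    suffixes = ("".join(pair) for pair in product(ascii_lowercase, repeat=2))
    # zip() stops after `count` suffixes have been consumed
    return [f"{base}.zim{suffix}" for suffix, _ in zip(suffixes, range(count))]


print(chunk_names("foobar", 3))
```

For <code>chunk_names("foobar", 3)</code> this produces the three names used as examples above. The sketch only covers naming; choosing cut positions between clusters is left to a tool such as ''zimsplit''.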


== Other ==
== Other ==
`pathPtrPos` was initially called `urlPtrPos` (and `path` in dirent structure was called `url`). In April 2024, we have changed the wording from `url` to `path` as it better conveys that we are storing a path of the entry and not a full url (with scheme, host...). Note that is is just a wording change and the semantic has not changed at all. Implementation doesn't have to change anything (except a potential renaming of variables to better follow the spec).
<code>pathPtrPos</code> was initially called <code>urlPtrPos</code> (and <code>path</code> in the dirent structure was called <code>url</code>). In April 2024, we changed the wording from <code>url</code> to <code>path</code> as it better conveys the fact that we are storing a path for the entry and not a full URL (with scheme, host...). Note that this is just a wording change and the semantics have not changed at all. Implementations do not need to change anything (except a potential renaming of variables to better follow the spec).


== See also ==
== See also ==
