ZIM file format

614 bytes added, 20:53, 13 September 2010
== Url pointer list (urlPtrPos) ==
Each article in the ZIM file has a directory entry. The url pointer list is a list of 8-byte offsets to the directory entries. Since directory entries have variable sizes, this list is needed for random access. The directory entries are always ordered by url. Ordering is done simply by comparing the url strings.
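As an illustration of the pointer list layout described above, here is a minimal Python sketch that reads the 8-byte little-endian offsets. The function name, the <code>article_count</code> argument and the synthetic buffer are assumptions made for this example; in a real file both the list position and the number of entries would have to come from the file header, which is not described in this section.

```python
import struct

def read_url_pointers(data, url_ptr_pos, article_count):
    """Return the directory entry offsets stored in the url pointer list.

    Each pointer is an 8-byte unsigned little-endian integer.
    article_count is assumed to be known from elsewhere (hypothetical argument).
    """
    return list(struct.unpack_from("<%dQ" % article_count, data, url_ptr_pos))

# Synthetic buffer: 4 bytes of padding followed by three 8-byte offsets.
buf = b"\x00" * 4 + struct.pack("<3Q", 100, 164, 230)
offsets = read_url_pointers(buf, 4, 3)
```

Because every pointer has the same size, the offset of article number <code>n</code> sits at <code>url_ptr_pos + 8 * n</code>, which is what makes random access possible.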
== Title pointer list (titlePtrPos) ==
The title pointer list is a list of article indexes ordered by title. It points to entries in the url pointer list. Note that the title pointers are only 4 bytes: they are not offsets in the file but article numbers. To get the offset of an article from the title pointer list, you have to look it up in the url pointer list.
== Cluster pointer list (clusterPtrPos) ==
The cluster pointer list is a list of 8 byte offsets which point to the data clusters.
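The two-step lookup described for the title pointer list (a 4-byte article number first, then the 8-byte offset from the url pointer list) can be sketched as follows. The function name and the synthetic buffer are assumptions for illustration only. Cluster pointers would be read the same way as url pointers, as 8-byte offsets.

```python
import struct

def offset_from_title_index(data, url_ptr_pos, title_ptr_pos, title_index):
    """Resolve the n-th entry in title order to a directory entry offset."""
    # Title pointers are 4-byte article numbers, not file offsets.
    (article_number,) = struct.unpack_from(
        "<I", data, title_ptr_pos + 4 * title_index)
    # The article number indexes the url pointer list (8-byte offsets).
    (offset,) = struct.unpack_from(
        "<Q", data, url_ptr_pos + 8 * article_number)
    return offset

# Synthetic data: three url pointers followed by three title pointers.
url_list = struct.pack("<3Q", 1000, 2000, 3000)
title_list = struct.pack("<3I", 2, 0, 1)
buf = url_list + title_list
resolved = [offset_from_title_index(buf, 0, len(url_list), i) for i in range(3)]
```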
 
== Mime list pointer (mimeListPos) ==
 
The mime list pointer is a file offset to a list of mime types. The mime types are zero terminated strings. An empty string marks the end of the mime type list.
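A minimal sketch of reading such a list in Python is shown below. The function name is hypothetical, and decoding the strings as UTF-8 is an assumption, since the text does not state an encoding.

```python
def read_mime_list(data, mime_list_pos):
    """Read zero terminated mime type strings until an empty string is found."""
    mime_types = []
    pos = mime_list_pos
    while True:
        end = data.index(b"\x00", pos)  # position of the terminating zero byte
        if end == pos:                  # empty string: end of the list
            break
        mime_types.append(data[pos:end].decode("utf-8"))  # encoding is assumed
        pos = end + 1
    return mime_types

# Synthetic list with two mime types and the terminating empty string.
buf = b"text/html\x00image/png\x00\x00"
mimes = read_mime_list(buf, 0)
```

The position of a mime type in the returned list is the mime type number that directory entries refer to.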
== Directory entries ==
Lengths are given in bytes; all data is little endian. There are 2 types of directory entries: article entries and redirect entries. If the first two bytes, read as a little-endian integer, are 0xffff, the directory entry is a redirect.
=== article entry ===
{|
! Field Name !! Type !! Offset !! Length !! Description
|-
| mime || integer || 0 || 2 || mime type number - points to the mime type list
|-
| parameter len || integer || 2 || 1 || length of extra parameters (which are currently unused and hence this is always 0)
|-
| namespace || char || 3 || 1 ||
|-
| version || integer || 4 || 4 ||
|-
| cluster number || integer || 8 || 4 ||
|-
| blob number || integer || 12 || 4 ||
|-
| url || string || 16 || zero terminated || string with the url
|-
| title || string || || zero terminated || string with title or empty; in case it is empty, the url is used as title
|-
| parameter || data || || see parameter len || extra parameters
|}
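The article entry layout can be decoded with a fixed 16-byte header followed by two zero terminated strings. The following Python sketch assumes the revised layout (2-byte mime, 1-byte parameter len, 1-byte namespace, 4-byte version, cluster number and blob number). The helper names, the UTF-8 decoding and the sample bytes are assumptions for illustration, not part of the format description.

```python
import struct

# mime (2), parameter len (1), namespace (1), version (4), cluster (4), blob (4)
ARTICLE_HEADER = struct.Struct("<HBcIII")  # 16 bytes, little endian, no padding

def read_cstring(data, pos):
    """Return a zero terminated string and the position just after it."""
    end = data.index(b"\x00", pos)
    return data[pos:end].decode("utf-8"), end + 1  # encoding is assumed

def parse_article_entry(data, offset):
    mime, param_len, namespace, version, cluster, blob = \
        ARTICLE_HEADER.unpack_from(data, offset)
    pos = offset + ARTICLE_HEADER.size
    url, pos = read_cstring(data, pos)
    title, pos = read_cstring(data, pos)
    return {
        "mime": mime,
        "namespace": namespace.decode("ascii"),
        "version": version,
        "cluster": cluster,
        "blob": blob,
        "url": url,
        "title": title or url,  # an empty title means the url is the title
        "parameter": data[pos:pos + param_len],
    }

# Synthetic entry: mime 0, namespace "A", cluster 5, blob 7, empty title.
entry_bytes = ARTICLE_HEADER.pack(0, 0, b"A", 0, 5, 7) + b"Main_Page\x00\x00"
entry = parse_article_entry(entry_bytes, 0)
```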
=== redirect entry ===
{|
! Field Name !! Type !! Offset !! Length !! Description
|-
| mime || integer || 0 || 2 || 0xffff for redirect
|-
| parameter len || integer || 2 || 1 || length of extra parameters (which are currently unused and hence this is always 0)
|-
| namespace || char || 3 || 1 ||
|-
| version || integer || 4 || 4 ||
|-
| redirect index || integer || 8 || 4 ||
|-
| url || string || 12 || zero terminated || string with the url
|-
| title || string || || zero terminated || string with title or empty; in case it is empty, the url is used as title
|-
| parameter || data || || see parameter len || extra parameters
|}
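A reader can tell the two entry types apart by checking the first two bytes for 0xffff and then decode the shorter 12-byte redirect header. The Python sketch below assumes the revised redirect layout. The function names, the UTF-8 decoding and the sample bytes are hypothetical and only illustrate the idea.

```python
import struct

REDIRECT_MIME = 0xFFFF
# mime (2), parameter len (1), namespace (1), version (4), redirect index (4)
REDIRECT_HEADER = struct.Struct("<HBcII")  # 12 bytes, little endian

def is_redirect(data, offset):
    """A directory entry is a redirect if its first two bytes are 0xffff."""
    (mime,) = struct.unpack_from("<H", data, offset)
    return mime == REDIRECT_MIME

def parse_redirect_entry(data, offset):
    mime, param_len, namespace, version, redirect_index = \
        REDIRECT_HEADER.unpack_from(data, offset)
    if mime != REDIRECT_MIME:
        raise ValueError("not a redirect entry")
    pos = offset + REDIRECT_HEADER.size
    end = data.index(b"\x00", pos)
    url = data[pos:end].decode("utf-8")    # encoding is assumed
    pos = end + 1
    end = data.index(b"\x00", pos)
    title = data[pos:end].decode("utf-8")
    return {
        "namespace": namespace.decode("ascii"),
        "version": version,
        "redirect_index": redirect_index,
        "url": url,
        "title": title or url,  # an empty title means the url is the title
    }

# Synthetic redirect pointing at article number 42.
sample = REDIRECT_HEADER.pack(REDIRECT_MIME, 0, b"A", 0, 42) + b"Home\x00Start page\x00"
redirect = parse_redirect_entry(sample, 0)
```

The redirect index is an article number, so resolving the target would again go through the url pointer list.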
