Difference between revisions of "ZIM file format"

614 bytes added ,  20:53, 13 September 2010
no edit summary
Line 36: Line 36:
Each article in the ZIM file has a directory entry. Since the directory entry has a variable size we have an index pointerlist which is a list of 4-byte offsets. The pointers point to the directory entries.
Each article in the ZIM file has a directory entry. Since the directory entry has a variable size we have an index pointerlist which is a list of 4-byte offsets. The pointers point to the directory entries.


== Index pointer list ==
== Url pointer list (urlPtrPos) ==


The index pointer list is a list of 8 byte offsets to the directory entries. Since directory entries have variable sizes this is needed for random access.
The url pointer list is a list of 8 byte offsets to the directory entries.
The directory entries are always ordered by url. Ordering is simply done by comparing the url strings.
Since directory entries have variable sizes this is needed for random access.
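
As a minimal sketch of the random access described above, the following Python reads the url pointer list from a file image held in memory. The function name and the <code>article_count</code> argument are assumptions made for illustration (the count would come from the file header, which this section does not describe), and the offsets are assumed to be unsigned and little endian.

```python
import struct


def read_url_pointers(data: bytes, url_ptr_pos: int, article_count: int) -> list:
    """Return the 8-byte offsets stored at url_ptr_pos.

    Assumption: article_count is known from elsewhere (e.g. the header),
    and each offset is an unsigned little-endian 64-bit integer.
    """
    return list(struct.unpack_from(f"<{article_count}Q", data, url_ptr_pos))
```

With the list in hand, the directory entry of article ''n'' starts at <code>read_url_pointers(...)[n]</code>.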


The directory entries are always sorted by title. The title is encoded as [[QUnicode]], which is a custom utf-variant, which supports fast ordering.
== Title pointer list (titlePtrPos) ==


== Cluster pointer list ==
The title pointer list is a list of article indexes ordered by title. The title pointer list actually points to entries
in the url pointer list. Note that the title pointers are only 4 bytes. They are not offsets in the file but article numbers.
To get the offset of an article from the title pointer list, you have to look it up in the url pointer list.
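
The two-step lookup described above might look like the following sketch. The function and parameter names are hypothetical, and both lists are assumed to hold unsigned little-endian integers.

```python
import struct


def offset_for_title_index(data: bytes, url_ptr_pos: int,
                           title_ptr_pos: int, title_index: int) -> int:
    """Resolve the n-th entry in title order to a file offset."""
    # Step 1: the title pointer is a 4-byte article number, not an offset.
    (article_number,) = struct.unpack_from(
        "<I", data, title_ptr_pos + 4 * title_index)
    # Step 2: the url pointer list maps that article number to an 8-byte offset.
    (offset,) = struct.unpack_from(
        "<Q", data, url_ptr_pos + 8 * article_number)
    return offset
```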
 
== Cluster pointer list (clusterPtrPos) ==


The cluster pointer list is a list of 8 byte offsets which point to the data clusters.
The cluster pointer list is a list of 8 byte offsets which point to the data clusters.
== Mime list pointer (mimeListPos) ==
The mime list pointer is a file offset to a list of mime types. The mime types are zero-terminated strings. An empty string
marks the end of the mime type list.
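
A reader for this list could be sketched as follows. The function name is hypothetical, and decoding the strings as UTF-8 is an assumption; the text above does not state an encoding.

```python
def read_mime_list(data: bytes, mime_list_pos: int) -> list:
    """Collect zero-terminated mime type strings until an empty string is found."""
    mime_types = []
    pos = mime_list_pos
    while True:
        end = data.index(b"\x00", pos)  # position of the terminating zero byte
        if end == pos:                  # empty string marks the end of the list
            break
        mime_types.append(data[pos:end].decode("utf-8"))  # encoding is assumed
        pos = end + 1
    return mime_types
```

The position of a string in the returned list is the mime type number that a directory entry refers to.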


== Directory entries ==
== Directory entries ==


length in byte, all types are littlendian
Lengths are given in bytes; all data is little endian.
There are 2 types of directory entries: article entries and redirect entries. If the first two bytes are 0xffff the
directory entry is a redirect.
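
The check for a redirect can be sketched in a few lines. The names <code>REDIRECT_MIME</code> and <code>is_redirect</code> are illustrative only.

```python
import struct

REDIRECT_MIME = 0xFFFF  # value of the first two bytes of a redirect entry


def is_redirect(entry: bytes, offset: int = 0) -> bool:
    """Return True if the directory entry at offset is a redirect entry."""
    (mime,) = struct.unpack_from("<H", entry, offset)
    return mime == REDIRECT_MIME
```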


=== article entry ===
=== article entry ===
Line 55: Line 68:
! Field Name !! Type !! Offset !! Length !! Description
! Field Name !! Type !! Offset !! Length !! Description
|-
|-
| redirectFlag || boolean || 0 || 1 || 0 for article
| mime || integer || 0 || 2 || mime type number - points to the mime type list
|-
|-
| mime || integer || 1 || 1 || mime type code
| parameter len || || 2 || 1 || length of extra parameters (which are currently unused and hence this is always 0)
|-
|-
| empty || || 2 || 1 || was compression flag, this is now in the cluster header
| namespace || char || 3 || 1 ||
|-
|-
| namespace || char || 3 || 1 ||
| version || integer || 4 || 4 ||
|-
|-
| cluster number || integer || 4 || 4 ||
| cluster number || integer || 8 || 4 ||
|-
|-
| blob number || integer || 8 || 4 ||
| blob number || integer || 12 || 4 ||
|-
|-
| extraLen          || integer || 12 || 2 || length of extra bytes (title and parameter)
| url || string || 16 || zero terminated || string with the url
|-
|-
| title + parameter separated by 0-byte || [[QUnicode]] (title) + custom (parameter, not used in articles) || 14 || specified by extraLen || actual title of article; when parameter is empty, the 0-byte is omitted
| title || string || || zero terminated || string with title or empty; in case it is empty, the url is used as title
|-
|-
| url || || || ||  
| parameter || data || || see parameter len || extra parameters
|-
|-
|}
|}
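
The article entry layout of the new revision can be decoded roughly as in the sketch below. The class and function names are hypothetical. It assumes unsigned integers, UTF-8 encoded url and title strings, and that the parameter bytes follow directly after the title's terminating zero byte; none of these are stated in the table.

```python
import struct
from typing import NamedTuple


class ArticleEntry(NamedTuple):
    mime: int
    namespace: str
    version: int
    cluster_number: int
    blob_number: int
    url: str
    title: str
    parameter: bytes


def parse_article_entry(data: bytes, offset: int = 0) -> ArticleEntry:
    """Decode one article entry starting at offset."""
    # Fixed part: mime (2), parameter len (1), namespace (1),
    # version (4), cluster number (4), blob number (4) = 16 bytes.
    mime, param_len, namespace, version, cluster, blob = struct.unpack_from(
        "<HBcIII", data, offset)
    url_start = offset + 16
    url_end = data.index(b"\x00", url_start)
    title_end = data.index(b"\x00", url_end + 1)
    url = data[url_start:url_end].decode("utf-8")
    # An empty title means the url is used as the title.
    title = data[url_end + 1:title_end].decode("utf-8") or url
    parameter = data[title_end + 1:title_end + 1 + param_len]
    return ArticleEntry(mime, namespace.decode("latin-1"), version,
                        cluster, blob, url, title, parameter)
```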
Line 80: Line 93:
! Field Name !! Type !! Offset !! Length !! Description
! Field Name !! Type !! Offset !! Length !! Description
|-
|-
| redirectFlag || boolean || 0 || 1 || 1 for redirect
| mime || integer || 0 || 2 || 0xffff for redirect
|-
|-
| mime || integer || 1 || 1 || unused for redirects
| parameter len || || 2 || 1 || length of extra parameters (which are currently unused and hence this is always 0)
|-
|-
| empty || || 2 || 1 || was compression flag, this is now in the cluster header
| namespace || char || 3 || 1 ||
|-
|-
| namespace || char || 3 || 1 ||
| version || integer || 4 || 4 ||
|-
|-
| redirect index || integer || 4 || 4 ||
| redirect index || integer || 8 || 4 ||
|-
|-
| extraLen          || integer || 8 || 2 || length of extra bytes (title and parameter)
| url || string || 12 || zero terminated || string with the url
|-
|-
| title + parameter separated by 0-byte || [[QUnicode]] (title) + custom (parameter, not used in articles) || 10 || specified by extraLen || actual title of article; when parameter is empty, the 0-byte is omitted
| title || string || || zero terminated || string with title or empty; in case it is empty, the url is used as title
|-
|-
| url || || || ||  
| parameter || data || || see parameter len || extra parameters
|-
|-
|}
|}
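
A redirect entry differs from an article entry only in its fixed part, which is 12 bytes long and carries a redirect index instead of cluster and blob numbers. The sketch below uses a hypothetical function name and assumes unsigned integers and UTF-8 strings; interpreting the redirect index as an article number is also an assumption.

```python
import struct


def parse_redirect_entry(data: bytes, offset: int = 0) -> dict:
    """Decode one redirect entry starting at offset."""
    # Fixed part: mime (2), parameter len (1), namespace (1),
    # version (4), redirect index (4) = 12 bytes.
    mime, param_len, namespace, version, redirect_index = struct.unpack_from(
        "<HBcII", data, offset)
    if mime != 0xFFFF:
        raise ValueError("not a redirect entry")
    url_start = offset + 12
    url_end = data.index(b"\x00", url_start)
    title_end = data.index(b"\x00", url_end + 1)
    url = data[url_start:url_end].decode("utf-8")
    return {
        "namespace": namespace.decode("latin-1"),
        "version": version,
        "redirect_index": redirect_index,
        "url": url,
        # An empty title means the url is used as the title.
        "title": data[url_end + 1:title_end].decode("utf-8") or url,
        "parameter": data[title_end + 1:title_end + 1 + param_len],
    }
```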
