m (→Header) |
|||
Line 36: | Line 36: | ||
Each article in the ZIM file has a directory entry. Since the directory entry has a variable size we have an index pointerlist which is a list of 4-byte offsets. The pointers point to the directory entries. | Each article in the ZIM file has a directory entry. Since the directory entry has a variable size we have an index pointerlist which is a list of 4-byte offsets. The pointers point to the directory entries. | ||
== | == Url pointer list (urlPtrPos) == | ||
The | The url pointer list is a list of 8 byte offsets to the directory entries. | ||
The directory entries are always ordered by url. Ordering is simply done by comparing the url strings. | |||
Since directory entries have variable sizes this is needed for random access. | |||
== Title pointer list (titlePtrPos) == | |||
== Cluster pointer list == | The title pointer list is a list of article indexes ordered by title. The title pointer list actually points to entries | ||
in the url pointer list. Note that the title pointers are only 4 bytes. They are not offsets in the file but article numbers. | |||
To get the offset of an article from the title pointer list, you have to look it up in the url pointer list. | |||
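The two-step lookup described above can be sketched as follows. This is only an illustration, not the reference implementation: it assumes the whole file is available as a bytes object, that the pointer lists are little endian like the directory entries, and that the number of articles is known from the header. The names <code>read_pointer_lists</code>, <code>offset_for_title_rank</code> and <code>article_count</code> are made up for this example.

```python
import struct

def read_pointer_lists(data, url_ptr_pos, title_ptr_pos, article_count):
    """Decode both pointer lists from an in-memory file image.

    Assumption: values are little endian and article_count is known.
    """
    # Url pointer list: one 8 byte file offset per article.
    url_ptrs = struct.unpack_from(f"<{article_count}Q", data, url_ptr_pos)
    # Title pointer list: one 4 byte article number per article.
    title_ptrs = struct.unpack_from(f"<{article_count}I", data, title_ptr_pos)
    return url_ptrs, title_ptrs

def offset_for_title_rank(url_ptrs, title_ptrs, rank):
    """Return the file offset of the rank-th article in title order."""
    article_number = title_ptrs[rank]   # index into the url pointer list
    return url_ptrs[article_number]     # file offset of the directory entry

# Synthetic data: three articles whose title order differs from url order.
url_list = struct.pack("<3Q", 100, 250, 400)
title_list = struct.pack("<3I", 2, 0, 1)
image = url_list + title_list

url_ptrs, title_ptrs = read_pointer_lists(image, 0, len(url_list), 3)
first_by_title = offset_for_title_rank(url_ptrs, title_ptrs, 0)
```

With this synthetic data the first article in title order is article number 2, so <code>first_by_title</code> is the third url pointer.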
== Cluster pointer list (clusterPtrPos) == | |||
The cluster pointer list is a list of 8 byte offsets which point to the data clusters. | The cluster pointer list is a list of 8 byte offsets which point to the data clusters. | ||
== Mime list pointer (mimeListPos) == | |||
The mime list pointer is a file offset to a list of mime types. The mime types are zero terminated strings. An empty string | |||
marks the end of the mime type list. | |||
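Reading the mime type list could look roughly like this. The sketch assumes the file is held in memory as bytes and that the strings are UTF-8 encoded, which the text above does not state. The function name <code>read_mime_list</code> is hypothetical.

```python
def read_mime_list(data, mime_list_pos):
    """Collect zero terminated mime type strings until an empty one is found."""
    mime_types = []
    pos = mime_list_pos
    while True:
        end = data.index(b"\x00", pos)   # position of the terminating zero byte
        entry = data[pos:end]
        if not entry:                    # an empty string marks the end of the list
            break
        # Assumption: the strings are UTF-8 encoded.
        mime_types.append(entry.decode("utf-8"))
        pos = end + 1
    return mime_types

# Synthetic data: four unrelated bytes, then the list, then more unrelated bytes.
sample = b"HEAD" + b"text/html\x00image/png\x00\x00rest"
mimes = read_mime_list(sample, 4)
```

The mime field of an article entry is then an index into the returned list.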
== Directory entries == | == Directory entries == | ||
length in byte, all types are | length in bytes, all data is little endian. | ||
There are 2 types of directory entries: article entries and redirect entries. If the first two bytes are 0xffff the | |||
directory entry is a redirect. | |||
=== article entry === | === article entry === | ||
Line 55: | Line 68: | ||
! Field Name !! Type !! Offset !! Length !! Description | ! Field Name !! Type !! Offset !! Length !! Description | ||
|- | |- | ||
| | | mime || integer || 0 || 2 || mime type number - points to the mime type list | ||
|- | |- | ||
| | | parameter len || || 2 || 1 || length of extra parameters (which are currently unused and hence this is always 0) | ||
|- | |- | ||
| | | namespace || char || 3 || 1 || | ||
|- | |- | ||
| | | version || integer || 4 || 4 || | ||
|- | |- | ||
| cluster number || integer || | | cluster number || integer || 8 || 4 || | ||
|- | |- | ||
| blob number || integer || | | blob number || integer || 12 || 4 || | ||
|- | |- | ||
| | | url || string || 16 || zero terminated || string with the url | ||
|- | |- | ||
| title | | title || string || || zero terminated || string with title or empty; in case it is empty, the url is used as title | ||
|- | |- | ||
| | | parameter || data || || see parameter len || extra parameters | ||
|- | |- | ||
|} | |} | ||
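The article entry layout in the table above maps onto a 16 byte fixed header followed by two zero terminated strings. A minimal sketch of a parser, assuming UTF-8 strings and an in-memory bytes buffer; the names <code>ARTICLE_HEADER</code>, <code>read_cstring</code> and <code>parse_article_entry</code> are chosen for this example only.

```python
import struct

# mime (2), parameter len (1), namespace (1), version (4),
# cluster number (4), blob number (4): 16 bytes, little endian, no padding.
ARTICLE_HEADER = struct.Struct("<HBcIII")

def read_cstring(data, pos):
    """Return the zero terminated string at pos and the position after it."""
    end = data.index(b"\x00", pos)
    # Assumption: strings are UTF-8 encoded.
    return data[pos:end].decode("utf-8"), end + 1

def parse_article_entry(data, offset=0):
    mime, param_len, namespace, version, cluster, blob = \
        ARTICLE_HEADER.unpack_from(data, offset)
    pos = offset + ARTICLE_HEADER.size          # url starts at offset 16
    url, pos = read_cstring(data, pos)
    title, pos = read_cstring(data, pos)
    return {
        "mime": mime,
        "namespace": namespace.decode("ascii"),
        "version": version,
        "cluster": cluster,
        "blob": blob,
        "url": url,
        "title": title or url,                  # an empty title falls back to the url
        "parameter": bytes(data[pos:pos + param_len]),
    }

# Synthetic entry with an empty title and no extra parameters.
raw = ARTICLE_HEADER.pack(3, 0, b"A", 0, 7, 2) + b"Main_Page\x00\x00"
entry = parse_article_entry(raw)
```

Because the title of the synthetic entry is empty, <code>entry["title"]</code> falls back to the url.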
Line 80: | Line 93: | ||
! Field Name !! Type !! Offset !! Length !! Description | ! Field Name !! Type !! Offset !! Length !! Description | ||
|- | |- | ||
| | | mime || integer || 0 || 2 || 0xffff for redirect | ||
|- | |- | ||
| | | parameter len || || 2 || 1 || length of extra parameters (which are currently unused and hence this is always 0) | ||
|- | |- | ||
| | | namespace || char || 3 || 1 || | ||
|- | |- | ||
| | | version || integer || 4 || 4 || | ||
|- | |- | ||
| redirect index || integer || | | redirect index || integer || 8 || 4 || | ||
|- | |- | ||
| | | url || string || 12 || zero terminated || string with the url | ||
|- | |- | ||
| title | | title || string || || zero terminated || string with title or empty; in case it is empty, the url is used as title | ||
|- | |- | ||
| | | parameter || data || || see parameter len || extra parameters | ||
|- | |- | ||
|} | |} |
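A reader has to check the first two bytes before it knows which of the two layouts applies. The following sketch shows that check together with a parser for the redirect layout. It assumes UTF-8 strings and an in-memory bytes buffer, and all names in it (<code>REDIRECT_MIME</code>, <code>is_redirect</code>, <code>parse_redirect_entry</code>) are made up for this example. The redirect index is returned as stored, without interpreting it.

```python
import struct

REDIRECT_MIME = 0xFFFF
# mime (2), parameter len (1), namespace (1), version (4),
# redirect index (4): 12 bytes, little endian, no padding.
REDIRECT_HEADER = struct.Struct("<HBcII")

def is_redirect(data, offset=0):
    """True if the directory entry at offset starts with 0xffff."""
    (mime,) = struct.unpack_from("<H", data, offset)
    return mime == REDIRECT_MIME

def parse_redirect_entry(data, offset=0):
    mime, param_len, namespace, version, redirect_index = \
        REDIRECT_HEADER.unpack_from(data, offset)
    if mime != REDIRECT_MIME:
        raise ValueError("not a redirect entry")
    pos = offset + REDIRECT_HEADER.size         # url starts at offset 12
    url_end = data.index(b"\x00", pos)
    title_end = data.index(b"\x00", url_end + 1)
    # Assumption: strings are UTF-8 encoded.
    url = data[pos:url_end].decode("utf-8")
    title = data[url_end + 1:title_end].decode("utf-8")
    return {
        "namespace": namespace.decode("ascii"),
        "version": version,
        "redirect_index": redirect_index,
        "url": url,
        "title": title or url,                  # an empty title falls back to the url
        "parameter": bytes(data[title_end + 1:title_end + 1 + param_len]),
    }

# Synthetic redirect entry with an explicit title.
raw = REDIRECT_HEADER.pack(0xFFFF, 0, b"A", 0, 42) + b"Home\x00Start page\x00"
redirect = parse_redirect_entry(raw) if is_redirect(raw) else None
```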