Difference between revisions of "Search indexes"
Mgautierfr (talk | contribs) (Add specification for the listing entries)
Mgautierfr (talk | contribs) (Do not make `listing/titleOrdered/v1` relative to `C` namespace.)
Revision as of 08:39, 16 December 2020
ZIM archives contain specific indexes in the X namespace.
They are content that reader implementations can use to locate user entries.
All indexes are optional.
Xapian Indexes
Xapian indexes are Xapian databases. They have to be opened using the Xapian library.
Fulltext index
Namespace: X
Path: fulltext/xapian
Mimetype : application/octet-stream+xapian
Title index
Namespace: X
Path: title/xapian
Mimetype : application/octet-stream+xapian
Listing
Listings are lists of entries.
The content of a listing is a binary array of entry numbers. Each entry number is a 4-byte (little-endian) unsigned integer. An entry number is the index of the entry in the URL pointer list.
To get the offset of an entry found through the title pointer list, look up its entry number in the URL pointer list.
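The layout described above can be decoded with a few lines of Python. This is a minimal sketch, assuming the listing content has already been read from the archive into a `bytes` object; the function name `parse_listing` is hypothetical and not part of any ZIM library.

```python
import struct


def parse_listing(content: bytes) -> list:
    """Decode a listing blob into a list of entry numbers.

    Hypothetical helper: assumes `content` is the raw content of a
    listing entry, already extracted from the archive.
    """
    if len(content) % 4 != 0:
        raise ValueError("listing size must be a multiple of 4 bytes")
    # "<I" means one little-endian unsigned 32-bit integer.
    return [number for (number,) in struct.iter_unpack("<I", content)]


# Three entry numbers packed the same way a listing stores them.
sample = struct.pack("<3I", 2, 0, 7)
print(parse_listing(sample))  # [2, 0, 7]
```

Each decoded number is an index into the URL pointer list, so locating the entry itself still requires that second lookup.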
Title index v0
Namespace: X
Path: listing/titleOrdered/v0
Mimetype : application/octet-stream+zimlisting
The content of the listing is the list of all entries in the ZIM archive (all namespaces included, including X/listing/titleOrdered/v0 itself).
Entries are sorted using the key <namespace><title>
Content size is 4 * <nbEntries>
This is exactly the same content as the data located at titlePtrPos.
If this entry is present, titlePtrPos should point directly to its data.
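As an illustration of the v0 ordering and size rule, the sketch below builds such a listing from an in-memory list of entries. The `Entry` record and `build_title_listing_v0` are hypothetical names, and plain Python string comparison is an assumption here, since the text does not specify how keys are compared.

```python
import struct
from typing import NamedTuple


class Entry(NamedTuple):
    """Hypothetical in-memory record; its list position is the entry number."""
    namespace: str
    title: str


def build_title_listing_v0(entries) -> bytes:
    # Sort entry numbers by the key <namespace><title>.
    # Assumption: ordinary string comparison is an acceptable ordering.
    order = sorted(
        range(len(entries)),
        key=lambda i: entries[i].namespace + entries[i].title,
    )
    # One little-endian unsigned 32-bit integer per entry.
    return struct.pack(f"<{len(order)}I", *order)


entries = [Entry("C", "Zebra"), Entry("C", "Apple"), Entry("A", "Main")]
listing = build_title_listing_v0(entries)
assert len(listing) == 4 * len(entries)  # content size is 4 * <nbEntries>
```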
Title index v1
Namespace: X
Path: listing/titleOrdered/v1
Mimetype : application/octet-stream+zimlisting
The content of the listing is the list of all article entries in the zim archive.
Entries are sorted using the key <title> (all article entries are in the C namespace by definition).
Content size is 4 * <nbArticle>
listing/titleOrdered/v1 may be used to pick random articles or to search for articles by title while being sure that no resource entries are included.
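Both uses can be sketched in Python on top of an already decoded v1 listing (a list of entry numbers in title order). The function names and the `title_of` callback, which maps an entry number to its title, are hypothetical; a real reader would supply its own lookup through the URL pointer list. Plain string comparison is again an assumption.

```python
import random


def pick_random_article(listing, rng=random):
    """Return a random entry number; every item in a v1 listing is an article."""
    return rng.choice(listing)


def find_by_title(listing, title, title_of):
    """Binary search for an exact title; return its entry number or None.

    `title_of` is a hypothetical callback: entry number -> title.
    Relies on the listing being sorted by title.
    """
    lo, hi = 0, len(listing)
    while lo < hi:
        mid = (lo + hi) // 2
        if title_of(listing[mid]) < title:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(listing) and title_of(listing[lo]) == title:
        return listing[lo]
    return None


# Toy data: entry numbers 5, 2, 9 sorted by their titles.
titles = {5: "Apple", 2: "Banana", 9: "Cherry"}
listing = [5, 2, 9]
print(find_by_title(listing, "Banana", titles.__getitem__))  # 2
```

Because the listing holds only article entries, neither function needs to filter out resources.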