Difference between revisions of "Search indexes"

(Add specification for the listing entries)
 
 
(2 intermediate revisions by the same user not shown)
Line 4: Line 4:


All indexes are optional.
All indexes and listing items MUST be stored in uncompressed cluster.


== Xapian Indexes ==
Line 54: Line 56:
'''Mimetype''' : <code>application/octet-stream+zimlisting</code>


The content of the listing is the list of all article entries in the zim archive.
The content of the listing is the list of all "article entries" in the zim archive.
 
Those "article entries" may be redirect (what is a article entry is not really defined in the spec, it is up to the creator to define in which category a entry is).


Entries are sorted using the key <code><title></code> (All article entries are in <code>C</code> namespace by definition)


Content size is <code>4 * <nbArticle></code>
Entry numbers are relatives to the beginning of the <code>C</code> namespace. For now, as <code>C</code> namespace is the first namespace, there is no article before the <code>C</code> namespace the article number are also absolute. It may be not always the case if we add other namespace in the future.


<code>listing/titleOrdered/v1</code> may be used to pick random articles or to search article by title and be sure that no resource entries are included.

Latest revision as of 13:54, 28 April 2021

Zim archives contain specific indexes in the X namespace.

These indexes are content that reader implementations can use to locate user entries.

All indexes are optional.

All indexes and listing items MUST be stored in uncompressed clusters.

Xapian Indexes

Xapian indexes are Xapian databases. They have to be opened using the Xapian library.

Fulltext index

Namespace: X

Path: fulltext/xapian

Mimetype : application/octet-stream+xapian

Title index

Namespace: X

Path: title/xapian

Mimetype : application/octet-stream+xapian

Listing

Listings are lists of entries.

The content of a listing is a binary array of entry numbers. Each entry number is a 4-byte unsigned integer (little-endian). An entry number is the index of the entry in the URL pointer list.

To get the offset of an entry found through the title pointer list, you have to look up its entry number in the URL pointer list.
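The binary layout described above can be illustrated with a minimal Python sketch. It assumes the listing content has already been read from the archive as a `bytes` object; the function names `decode_listing` and `encode_listing` are hypothetical and not part of any ZIM library.

```python
import struct


def decode_listing(content: bytes) -> list[int]:
    """Decode a listing blob into a list of entry numbers."""
    if len(content) % 4 != 0:
        raise ValueError("listing size must be a multiple of 4 bytes")
    count = len(content) // 4
    # "<" selects little-endian, "I" is a 4-byte unsigned integer.
    return list(struct.unpack(f"<{count}I", content))


def encode_listing(entry_numbers) -> bytes:
    """Encode entry numbers as consecutive 4-byte little-endian integers."""
    entry_numbers = list(entry_numbers)
    return struct.pack(f"<{len(entry_numbers)}I", *entry_numbers)
```

Each decoded value is an index into the URL pointer list, so a reader would still need that list to locate the entry itself.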

Title index v0

Namespace: X

Path: listing/titleOrdered/v0

Mimetype : application/octet-stream+zimlisting

The content of the listing is the list of all entries in the zim archive (all namespaces included, including X/listing/titleOrdered/v0 itself).

Entries are sorted using the key <namespace><title>

Content size is 4 * <nbEntries>

This is exactly the same content as the data that titlePtrPos points to.

If present, titlePtrPos should directly point to the data of this entry.

Title index v1

Namespace: X

Path: listing/titleOrdered/v1

Mimetype : application/octet-stream+zimlisting

The content of the listing is the list of all "article entries" in the zim archive.

Those "article entries" may be redirects. What counts as an article entry is not really defined in the spec; it is up to the creator to decide which category an entry belongs to.

Entries are sorted using the key <title> (all article entries are in the C namespace by definition).

Content size is 4 * <nbArticle>

listing/titleOrdered/v1 may be used to pick random articles or to search for articles by title while being sure that no resource entries are included.
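Both uses can be sketched in Python, under stated assumptions. The sketch takes a listing that has already been decoded into a list of entry numbers, and a hypothetical callback `title_of(entry_number)` that returns the title of an entry; how a reader obtains titles is outside the scope of this page. It also assumes that titles are ordered by plain string comparison, which this page does not specify.

```python
import random


def pick_random_article(listing, rng=random):
    """Return a random entry number from a decoded v1 listing."""
    if not listing:
        raise ValueError("empty listing")
    return rng.choice(listing)


def find_by_title(listing, title, title_of):
    """Binary search a title-ordered listing for an exact title.

    title_of is a hypothetical callback mapping an entry number to its title.
    Returns the matching entry number, or None if no entry has that title.
    """
    lo, hi = 0, len(listing)
    while lo < hi:
        mid = (lo + hi) // 2
        if title_of(listing[mid]) < title:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(listing) and title_of(listing[lo]) == title:
        return listing[lo]
    return None
```

Because the v1 listing only contains article entries, neither function needs to filter out resource entries.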