2009-11-23 Report Developers Meeting 2009-2



=== addressing articles; title vs. URL ===
We discussed the case where article names (titles) are not represented by the article's URL. Kiwix currently does this deliberately, replacing all URLs with some kind of short hash key.
The original idea was simply to use the URL as the article identifier and add another field to the directory entry holding the article's title. But since we rely on a working poor man's search on small devices, which do not use full-text search but instead do a binary search on the article index, we decided to add a second index. Each article will therefore be referenced twice: in the titlePtrList (formerly indexPtrList) and in the new urlPtrList. Both lists contain the same entries, one ordered by title and the other by URL.
This means that the ZIM file header gets another field urlPtrPos to reference the start of the urlPtrList.
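To illustrate why two indices are needed, here is a minimal Python sketch; it is not zimlib code. It assumes an in-memory list of (URL, title) directory entries with made-up hash-like URLs, and the names <code>entries</code>, <code>lower_bound</code>, <code>find</code> and <code>titles_with_prefix</code> are hypothetical. Each pointer list stores entry indices in a different order, so the same binary search answers both an exact lookup by URL and a prefix search by title.

```python
# Hypothetical directory entries: (url, title). The URLs are short hash-like
# keys (as Kiwix uses), so URL order and title order differ.
URL, TITLE = 0, 1
entries = [
    ("0c2", "Zebra"),
    ("a1f", "Apple"),
    ("7b9", "Mango"),
    ("3d4", "Banana"),
]

# Two pointer lists over the same entries: one ordered by title, one by URL.
title_ptr_list = sorted(range(len(entries)), key=lambda i: entries[i][TITLE])
url_ptr_list = sorted(range(len(entries)), key=lambda i: entries[i][URL])


def lower_bound(ptr_list, field, key):
    """Binary search: first position whose entry field is >= key."""
    lo, hi = 0, len(ptr_list)
    while lo < hi:
        mid = (lo + hi) // 2
        # Dereference the pointer, as a reader would, before comparing.
        if entries[ptr_list[mid]][field] < key:
            lo = mid + 1
        else:
            hi = mid
    return lo


def find(ptr_list, field, key):
    """Exact lookup; returns the entry index or None."""
    pos = lower_bound(ptr_list, field, key)
    if pos < len(ptr_list) and entries[ptr_list[pos]][field] == key:
        return ptr_list[pos]
    return None


def titles_with_prefix(prefix):
    """Poor man's search: all titles starting with prefix."""
    start = lower_bound(title_ptr_list, TITLE, prefix)
    matches = []
    for i in title_ptr_list[start:]:
        title = entries[i][TITLE]
        if not title.startswith(prefix):
            break  # sorted order: no later title can match
        matches.append(title)
    return matches
```

For example, <code>find(url_ptr_list, URL, "7b9")</code> resolves a link target, while <code>titles_with_prefix("Ma")</code> serves a search box; neither needs full-text search.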


=== global metadata ===


=== article metadata ===
Similar to global metadata, metadata for individual articles can be included. Devices and special readers will only use the actual article content, which is already stored in the A namespace. If needed, the ZIM creator can add individual metadata for each article in the B namespace, as a kind of template stored under the same name as the article. When the reader application initializes the ZIM interface in zimlib, it can choose whether to retrieve the pure article content from namespace A or the processed content from namespace B, which includes the article content. This keeps maximum flexibility for the reader application.
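The following Python sketch shows how a reader might switch between the two namespaces. It is an illustration under assumptions, not the zimlib API: the in-memory <code>archive</code> dict, the <code>Reader</code> class, the <code>ContentMode</code> enum and the <code>@@CONTENT@@</code> placeholder used to splice the A-namespace article into its B-namespace template are all hypothetical, since the meeting notes do not say how the template embeds the article.

```python
from enum import Enum


class ContentMode(Enum):
    PURE = "A"       # plain article content from namespace A
    PROCESSED = "B"  # per-article template from namespace B, wrapping the article


# Hypothetical marker in a B-namespace template where the A content is inserted.
PLACEHOLDER = "@@CONTENT@@"

# Hypothetical archive keyed by (namespace, name).
archive = {
    ("A", "Berlin"): "<p>Berlin is the capital of Germany.</p>",
    ("B", "Berlin"): "<meta name='coords' content='52.52,13.40'/>" + PLACEHOLDER,
    ("A", "Paris"): "<p>Paris is the capital of France.</p>",  # no B entry
}


class Reader:
    def __init__(self, archive, mode=ContentMode.PURE):
        # The mode is chosen once, when the reader initializes the interface.
        self.archive = archive
        self.mode = mode

    def get(self, name):
        body = self.archive[("A", name)]
        if self.mode is ContentMode.PURE:
            return body
        template = self.archive.get(("B", name))
        if template is None:
            return body  # no per-article metadata: fall back to namespace A
        return template.replace(PLACEHOLDER, body)
```

A device with limited resources would construct <code>Reader(archive)</code> and never touch namespace B, while a richer reader would pass <code>ContentMode.PROCESSED</code>.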


=== fulltext search ===


=== integer encoding ===
Currently the integer compression from the [[Zeno File Format]], QUnicode, is used in ZIM as well. For the new indices we want to look into alternatives that follow a standard; UTF-8-style encoding could be a good choice.
The details, as well as where to use it, will be settled during development. In general, all integer compression in ZIM will use the same method for consistency.
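As a rough illustration of what UTF-8-style integer encoding means, here is a Python sketch of the original UTF-8 byte layout (RFC 2279), which covers values up to 2^31 - 1 in one to six bytes; today's UTF-8 (RFC 3629) stops at four bytes. This is not the format ZIM adopted, since the details were left open at the meeting, and <code>encode_uint</code> and <code>decode_uint</code> are hypothetical names.

```python
# Upper bounds (exclusive) for 2..6-byte encodings; each continuation byte
# carries 6 payload bits, and the lead byte carries the rest.
LIMITS = (0x800, 0x10000, 0x200000, 0x4000000, 0x80000000)


def encode_uint(n):
    """Encode a non-negative integer < 2**31 using the RFC 2279 UTF-8 layout."""
    if not 0 <= n < 0x80000000:
        raise ValueError("value out of range for 6-byte UTF-8-style encoding")
    if n < 0x80:
        return bytes([n])  # 0xxxxxxx
    # Number of continuation bytes needed.
    extra = next(i for i, limit in enumerate(LIMITS, start=1) if n < limit)
    # Lead byte: (extra + 1) one bits, a zero bit, then the high payload bits.
    lead = ((0xFF << (7 - extra)) & 0xFF) | (n >> (6 * extra))
    # Continuation bytes: 10xxxxxx, most significant group first.
    tail = [0x80 | ((n >> (6 * k)) & 0x3F) for k in range(extra - 1, -1, -1)]
    return bytes([lead] + tail)


def decode_uint(data, pos=0):
    """Decode one value starting at data[pos]; return (value, next_pos)."""
    lead = data[pos]
    if lead < 0x80:
        return lead, pos + 1
    # Count the leading one bits of the lead byte to get the total length.
    length, mask = 0, 0x80
    while lead & mask:
        length += 1
        mask >>= 1
    if not 2 <= length <= 6:
        raise ValueError("invalid lead byte")
    value = lead & (mask - 1)  # payload bits below the terminating zero bit
    for i in range(1, length):
        b = data[pos + i]
        if (b & 0xC0) != 0x80:
            raise ValueError("invalid continuation byte")
        value = (value << 6) | (b & 0x3F)
    return value, pos + length
```

One property that suits index data: the lead byte alone tells the reader how many bytes follow, so a value can be skipped without decoding it, and values below 128 cost a single byte.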


=== lzma compression ===
A new compression method using the LZMA algorithm will be introduced. This has long been planned, but for a long time there was no library for it. Now ''lzma-utils'' and ''xz-utils'' are available; they are still in development, but we want to give them a try now.
We expect a much better compression ratio and faster decompression, which would improve usability on small devices, since LZMA is designed to keep decompression cheap.
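A quick way to try LZMA on sample data before wiring it into a ZIM writer is sketched below. It uses Python's standard <code>lzma</code> module (which writes the same .xz container as xz-utils) rather than zimlib, and includes zlib and bzip2 only as reference points. The helper names are hypothetical and the sample text is synthetic and highly repetitive, so the printed sizes say nothing about real article content.

```python
import bz2
import lzma
import zlib


def compress_blob(data: bytes, method: str) -> bytes:
    """Compress a blob of concatenated articles with the given method."""
    if method == "zlib":
        return zlib.compress(data, 9)
    if method == "bzip2":
        return bz2.compress(data, 9)
    if method == "lzma":
        # .xz container, as produced by xz-utils
        return lzma.compress(data, format=lzma.FORMAT_XZ, preset=9)
    raise ValueError(f"unknown method: {method}")


def decompress_blob(payload: bytes, method: str) -> bytes:
    """Inverse of compress_blob."""
    if method == "zlib":
        return zlib.decompress(payload)
    if method == "bzip2":
        return bz2.decompress(payload)
    if method == "lzma":
        return lzma.decompress(payload, format=lzma.FORMAT_XZ)
    raise ValueError(f"unknown method: {method}")


if __name__ == "__main__":
    # Synthetic stand-in for a group of articles compressed together.
    articles = [
        f"<p>Article {i}: the quick brown fox jumps over the lazy dog.</p>\n".encode()
        for i in range(500)
    ]
    blob = b"".join(articles)
    for method in ("zlib", "bzip2", "lzma"):
        packed = compress_blob(blob, method)
        assert decompress_blob(packed, method) == blob  # lossless round trip
        print(f"{method:6} {len(blob)} -> {len(packed)} bytes")
```

Timing the decompress calls on real article data, on the target hardware, would be the meaningful test of the small-device claim.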


=== future planning ===
