Difference between revisions of "2009-11-23 Report Developers Meeting 2009-2"

(Created page with 'The second openZIM Developers Meeting took place November 20th to 22nd. == Participants == # Tommi Mäkitalo (tntnet) # Emmanuel Engelhart (Kiwix) # Tomasz Finc (Wikimedia Found…')
 
== Topics ==
=== better suitability for small devices ===
Small devices are low on memory and don't have a powerful CPU. The OpenWrt team discovered a few problems when working with openZIM to display Wikipedia content:
;HTML parsing overhead
A full-blown HTML parser consumes a lot of resources, and the available HTML engines are far more powerful than these small devices need. But since the content is stored as HTML, using one of the available HTML engines is the logical way to go.
In fact, such small devices need very little markup: headlines, bold, italic and anchors/links.
OpenWrt's idea was to use a special, much more reduced markup for the content that would also be stable (HTML was considered unstable because the standard changes from time to time).
After a long discussion we decided to stick with HTML (so that users on full-blown computers get all of Wikipedia's features) but to use a special parser on small devices that ignores everything fancy in the markup and renders only the most necessary elements. The ZIM file would still carry some overhead for small devices in the form of unused (ignored) HTML code, but there would be no difference in efficiency.
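The reduced parser described above could look roughly like the following Python sketch. It is an illustration only, not openZIM or OpenWrt code: the names `MinimalRenderer` and `render_minimal` are hypothetical, and the whitelist of kept tags is an assumption derived from the markup listed above (headlines, bold, italic, links). All other tags are dropped while their text is kept; the contents of script and style elements are dropped entirely.

```python
import html
from html.parser import HTMLParser

# Markup the reduced renderer keeps (assumed from the list of needed markup).
KEPT_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6", "b", "strong", "i", "em", "a"}
# Elements whose text content is dropped together with their tags.
DROPPED_CONTENT = {"script", "style"}


class MinimalRenderer(HTMLParser):
    """Hypothetical reduced renderer: keeps only headlines, bold, italic and links."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True: entities arrive decoded
        self.out = []
        self.drop_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROPPED_CONTENT:
            self.drop_depth += 1
        elif tag in KEPT_TAGS and not self.drop_depth:
            if tag == "a":
                href = dict(attrs).get("href") or ""
                self.out.append('<a href="%s">' % html.escape(href))
            else:
                self.out.append("<%s>" % tag)
        # Any other tag (div, table, img, ...) is ignored; its text is kept.

    def handle_endtag(self, tag):
        if tag in DROPPED_CONTENT:
            self.drop_depth = max(0, self.drop_depth - 1)
        elif tag in KEPT_TAGS and not self.drop_depth:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.drop_depth:
            # Re-escape text so the reduced output is still valid markup.
            self.out.append(html.escape(data, quote=False))


def render_minimal(source):
    """Reduce an HTML article to the minimal markup set."""
    parser = MinimalRenderer()
    parser.feed(source)
    parser.close()
    return "".join(parser.out)
```

Because unknown tags are simply skipped, new HTML features in the content would not break such a renderer, while the ZIM file itself keeps full HTML for readers on full-blown computers.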
;Memory Footprint / Caches
As articles are clustered and stored in bigger compressed chunks, these clusters must not become too big; otherwise they would exhaust the memory available on small devices. The default cluster size is currently 1 MB, which is the optimal size because the compression algorithms themselves use blocks of 1 MB to compress data.
To reduce the memory footprint, the streaming mode of compression libraries offers a good way to read only the parts of a cluster that are needed. In streaming mode, reading starts at the beginning of the compressed data stream, and all decompressed data is discarded until the position in the uncompressed data stream where the requested content starts is reached.
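A minimal sketch of streaming-mode reading, assuming a zlib-compressed cluster (zlib stands in here for whichever compression library the cluster actually uses) and a hypothetical helper named `read_range`. It decompresses in bounded steps of at most `out_block` bytes, throws away everything before the requested offset and stops consuming input as soon as the requested range is complete.

```python
import zlib


def read_range(compressed_chunks, offset, length, out_block=64 * 1024):
    """Return `length` uncompressed bytes starting at `offset` (hypothetical helper).

    `compressed_chunks` yields the compressed cluster piece by piece, e.g. from
    successive file reads. Output before `offset` is decompressed and discarded,
    so at most `out_block` bytes of it are held in memory at any time.
    """
    decomp = zlib.decompressobj()
    pos = 0                      # current offset in the uncompressed stream
    end = offset + length
    result = bytearray()
    for chunk in compressed_chunks:
        buf = chunk
        while pos < end:
            out = decomp.decompress(buf, out_block)  # at most out_block bytes
            buf = decomp.unconsumed_tail             # input not processed yet
            if not out:
                break                                # need the next input chunk
            # Keep only the slice of `out` that overlaps [offset, end).
            lo = max(offset - pos, 0)
            hi = min(end - pos, len(out))
            if lo < hi:
                result += out[lo:hi]
            pos += len(out)
        if pos >= end:
            break                                    # stop reading input early
    return bytes(result)
```

The data before the requested offset still has to be decompressed, so this trades CPU time for memory: besides the requested bytes, only about one output block and one input chunk are held at a time instead of the whole 1 MB cluster.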


=== more flexible MIME type list ===
In prior versions of ZIM, the MIME type is specified by an integer, and the list of available MIME types is hard-coded in zimlib.
For more flexibility in the future, the hard-coded list will be replaced by a list of zero-terminated strings inside the ZIM data file. A mimeListPos field is therefore added to the ZIM header to specify the position of this MIME type list inside the ZIM file.
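Reading such a list could look like the sketch below. It is a hypothetical illustration, not zimlib code: it assumes the list ends with an empty string (two consecutive zero bytes) and that the integer MIME type stored for each article becomes an index into this list. The offset is passed in as `mime_list_pos` rather than parsed from the header.

```python
def read_mime_list(zim_bytes, mime_list_pos):
    """Parse consecutive zero-terminated strings starting at mime_list_pos.

    Assumption: an empty string (a zero byte right after the previous
    terminator) marks the end of the list.
    """
    types = []
    pos = mime_list_pos
    while True:
        end = zim_bytes.index(b"\0", pos)  # raises ValueError if unterminated
        if end == pos:                     # empty string terminates the list
            break
        types.append(zim_bytes[pos:end].decode("utf-8"))
        pos = end + 1
    return types
```

With this layout, adding a new MIME type only means writing another string into the file; no zimlib update is needed.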


=== addressing articles; title vs. URL ===


The team decided to keep Manuel as project lead, with the mandate to keep giving talks, take care of marketing and maintain contacts between openZIM and other projects.
[[Category:Press_Releases]]
