LinuxTag 2010

June 9th - 12th

Participants

  1. Tommi
  2. Manuel
  3. Annette

Travelling

Talk

  • Title: openZIM - Wikipedia Offline - Current State
  • Language: English preferred, German possible
  • Category: Applications
  • License: Creative Commons License

Abstract

for program committee, approx. 1 page

Since 2008 we have been working on a free implementation of a toolchain for offline Wikipedia, providing a container format as well as reader and writer software. The first attempt was based on the Zeno file format used by Directmedia, the publisher of the Wikipedia DVD. The Wikipedia DVD 2008/2009 contained a free implementation developed by the people who today form the openZIM project, and it marked the start of the project. Since 2009 openZIM has defined itself as "a free and open implementation of the ZIM file format": it provides free documentation of the file format as well as tools to create ZIM files from web content and a reader to present them in a browser. Inspired by the Zeno format, ZIM stands for "Zeno Improved".

While the open documentation of the ZIM file format enables everyone to write software using it, openZIM also provides tools to create ZIM files from HTML and a reader application that lets users browse the content in a standard web browser, along with some advanced tools. These are all based on zimlib, a library written in C++ that can easily be used in other applications to make them ZIM-aware. Since our start we have released two versions of the ZIM format. The first version was distributed to the public during LinuxTag 2009, when Wikimedia CH sponsored a batch of 500 DVDs of the German Wikipedia as free giveaways. This gave us a lot of valuable feedback.

During our first year we were able to attract partners such as the Wikimedia Foundation, which is working on a regular ZIM export from MediaWiki, the Israeli OLPC project, which wants to provide the Hebrew Wikipedia on XO computers, and a manufacturer of embedded devices that presents Wikipedia Offline on a device with only 8 MB of RAM. Most of these people gathered at a Developers Meeting, and we are very happy that we were able to cover all of the needs that came up, even though some of them seemed mutually exclusive at first glance.

With this talk we introduce openZIM to those who haven't heard about it and give a short overview of the history of Offline Wikipedia in general, to provide a common basis and an understanding of the issues that arise when dealing with huge amounts of data such as Wikipedia content. The main part of the talk focuses on these issues and how openZIM has addressed them. Part of this will cover the ZIM internal structures, the tools and the zimlib library. The goal is that attendees understand how ZIM works, know where to find implementation details in case the file format is to be implemented in another programming language, and are able to work with zimlib, which can be used in any C++ application to make use of the ZIM format directly.

Short Abstract

for visitors / schedule, max. 450 characters

openZIM provides a free and open ZIM file format and tools - developed for and used to provide offline access to Wikipedia content.

We will give an introduction to openZIM and the ZIM format and then focus on the technical issues of handling data volumes like Wikipedia's and how we have addressed them. After the talk you will know the available tools, have an overview of the file format and of where to find more details if you plan to implement it yourself, and know how to start using zimlib in your own application.

Links

  • URL: http://openzim.org/
  • Description: website of the openZIM developer team with documentation, SVN, bugtracker and ZIM file archive