Difference between revisions of "Build your ZIM file"

3,291 bytes added, 09:47, 30 October 2017
→‎zimmer: links updated
(17 intermediate revisions by 5 users not shown)
** Description (only a few words)
** 48x48 PNG logo
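The 48x48 PNG logo requirement can be checked before submitting. The following Python sketch reads only the PNG signature and the IHDR chunk header, which is where a PNG file stores its width and height. The function names are illustrative and not part of any Kiwix tool.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def png_size(data: bytes) -> tuple[int, int]:
    """Return (width, height) from the first 24 bytes of a PNG file."""
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR", width, height.
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    width, height = struct.unpack(">II", data[16:24])  # big-endian uint32s
    return width, height

def is_valid_zim_logo(path) -> bool:
    """True if the file at `path` is a 48x48 PNG image."""
    with open(path, "rb") as f:
        header = f.read(24)
    try:
        return png_size(header) == (48, 48)
    except ValueError:
        return False
```

For example, `is_valid_zim_logo("favicon.png")` returns False for a GIF or a 64x64 PNG.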
=== Create a ZIM file from existing HTML content ===
See http://www.openzim.org/wiki/Zimwriterfs_instructions for an overview and read the section below on zimwriterfs for some additional context.
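For orientation, here is a Python sketch that assembles a zimwriterfs command line. The options shown (--welcome, --favicon, --language, --title, --description, --creator, --publisher, followed by the HTML directory and the output file) reflect zimwriterfs releases of this period as we understand them; check `zimwriterfs --help` for your build, since option names may differ between versions. The welcome page and favicon paths are given relative to the HTML directory.

```python
import shlex

def zimwriterfs_command(html_dir, zim_file, *, welcome, favicon, language,
                        title, description, creator, publisher):
    """Build the argument list for zimwriterfs (option names assumed, see --help)."""
    return [
        "zimwriterfs",
        "--welcome", welcome,          # main page, relative to html_dir
        "--favicon", favicon,          # 48x48 PNG, relative to html_dir
        "--language", language,        # ISO 639-3 code, e.g. "eng"
        "--title", title,
        "--description", description,  # only a few words
        "--creator", creator,          # author(s) of the content
        "--publisher", publisher,      # who built the ZIM file
        str(html_dir),
        str(zim_file),
    ]

if __name__ == "__main__":
    cmd = zimwriterfs_command(
        "my_site", "my_site.zim", welcome="index.html", favicon="favicon.png",
        language="eng", title="My site", description="Offline copy of my site",
        creator="Me", publisher="Me",
    )
    print(shlex.join(cmd))
    # To actually build the file: subprocess.run(cmd, check=True)
```

Building the arguments as a list (rather than one shell string) avoids quoting problems with titles that contain spaces.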


== Developers ==
MWoffliner is a tool for "dumping" a Wikimedia project (Wikipedia, Wiktionary, ...) to local storage. It should also work with any MediaWiki instance that has Parsoid installed. It goes through all articles of the project (or a selection, if specified) and writes the HTML and pictures to your local filesystem.
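The dump step can be pictured with a short Python sketch: fetch each article's Parsoid HTML and write it to a local file. This is not MWoffliner's code. It assumes the wiki exposes the Wikimedia REST API path `/api/rest_v1/page/html/{title}` (Wikimedia projects do; other MediaWiki instances with Parsoid may expose it elsewhere). The file-naming scheme and the User-Agent string are hypothetical, and pictures are left out.

```python
from pathlib import Path
from urllib.parse import quote
from urllib.request import Request, urlopen

# Hypothetical User-Agent; Wikimedia asks API clients to identify themselves.
USER_AGENT = "zim-dump-sketch/0.1 (example)"

def parsoid_html_url(wiki_base: str, title: str) -> str:
    """REST endpoint returning Parsoid HTML (assumed Wikimedia-style path)."""
    encoded = quote(title.replace(" ", "_"), safe="")
    return f"{wiki_base.rstrip('/')}/api/rest_v1/page/html/{encoded}"

def article_path(out_dir, title: str) -> Path:
    """Hypothetical naming: one .html file per article, '/' escaped."""
    return Path(out_dir) / (title.replace(" ", "_").replace("/", "%2F") + ".html")

def dump_articles(wiki_base: str, titles, out_dir) -> None:
    """Download each article's HTML and write it below out_dir."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for title in titles:
        req = Request(parsoid_html_url(wiki_base, title),
                      headers={"User-Agent": USER_AGENT})
        with urlopen(req) as resp:
            html = resp.read().decode("utf-8")
        article_path(out_dir, title).write_text(html, encoding="utf-8")
```

For example, `dump_articles("https://en.wikipedia.org", ["Kiwix"], "dump")` would write `dump/Kiwix.html`.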


A virtual machine with MWoffliner installed is provided [http://download.kiwix.org/dev/ZIMmaker.ova here]. You might have to update the source code to get the latest improvements.


More information is available [https://github.com/kiwix/mwoffliner here].


=== zimwriterfs ===
zimwriterfs is a console tool to create ZIM files from a locally stored directory containing "self-sufficient" HTML content (with pictures, JavaScript and stylesheets). The result contains all the files of the local directory, compressed and merged into the ZIM file. Nothing more, nothing less. For now, zimwriterfs only works on POSIX-compatible systems. You simply need to compile it and run it. The software does not need a lot of resources, but creating a large ZIM file can take a while.
Instructions on how to prepare and use zimwriterfs are here: [[zimwriterfs_instructions]].
[https://github.com/wikimedia/openzim/tree/master/zimwriterfs Go to zimwriterfs source code repository].
 
A virtual machine with zimwriterfs is provided [http://download.kiwix.org/dev/ZIMmaker.ova here].
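Since zimwriterfs packs the directory as-is, any picture, script or stylesheet still loaded from the web will be missing offline. A minimal check using only the Python standard library lists such references in every .html file of the directory. It treats `src` attributes and `<link href>` as resources and deliberately ignores ordinary `<a>` hyperlinks. This is an illustrative helper, not part of zimwriterfs.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urlparse

class ResourceCollector(HTMLParser):
    """Collect URLs of embedded resources: any src=..., plus <link href=...>."""
    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value and (name == "src" or (tag == "link" and name == "href")):
                self.urls.append(value)

def external_references(html: str) -> list[str]:
    """Resource URLs that point to another host (http://, https:// or //)."""
    parser = ResourceCollector()
    parser.feed(html)
    parser.close()
    # A non-empty netloc means the URL names a host, i.e. it is not local.
    return [u for u in parser.urls if urlparse(u).netloc]

def audit_directory(root) -> dict[str, list[str]]:
    """Map each .html file under root to its external resource URLs."""
    report = {}
    for page in Path(root).rglob("*.html"):
        text = page.read_text(encoding="utf-8", errors="replace")
        found = external_references(text)
        if found:
            report[str(page.relative_to(root))] = found
    return report
```

An empty result from `audit_directory("my_site")` suggests the directory is self-sufficient as far as HTML references go; resources loaded from CSS or JavaScript are not checked.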


=== Zimbalaka ===
The following description is based on notes published by the original author of Zimbalaka, as they are no longer available on the site where they were first published. An archived copy is available on archive.org: https://web.archive.org/web/20150531004251/http://www.arunmozhi.in:80/blog/zimbalaka-an-openzim-creator/#content
 
Zimbalaka is designed as a web-hosted tool for creating Wikipedia ZIM files from a selection of articles.
 
It accepts two types of input: a list of pages or a Wikipedia category. Zimbalaka then downloads those pages, removes clutter such as sidebars, the toolbox and edit links, and provides the cleaned-up result as a ZIM file for download, which can be opened in Kiwix or other ZIM readers.
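To illustrate the clean-up step, here is a standard-library Python sketch that drops elements by `id` or `class` while copying the rest of the HTML through unchanged. The selectors (`mw-panel` for the sidebar, `p-tb` for the toolbox, `mw-editsection` for edit links) are assumptions based on MediaWiki's Vector skin, not taken from Zimbalaka's source, and the depth counting assumes reasonably well-formed markup inside removed elements.

```python
from html.parser import HTMLParser

# Assumed selectors (MediaWiki Vector skin); not taken from Zimbalaka's source.
CLUTTER_IDS = {"mw-panel", "p-tb"}
CLUTTER_CLASSES = {"mw-editsection"}
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

class ClutterStripper(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=False)  # keep entities verbatim
        self.out = []
        self.skip_depth = 0  # > 0 while inside an element being removed

    def _is_clutter(self, attrs):
        values = dict(attrs)
        classes = (values.get("class") or "").split()
        return values.get("id") in CLUTTER_IDS or any(c in CLUTTER_CLASSES for c in classes)

    def handle_starttag(self, tag, attrs):
        if self.skip_depth:
            if tag not in VOID_TAGS:
                self.skip_depth += 1
        elif self._is_clutter(attrs):
            if tag not in VOID_TAGS:  # void clutter is simply dropped
                self.skip_depth = 1
        else:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if not self.skip_depth and not self._is_clutter(attrs):
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_depth:
            if tag not in VOID_TAGS:
                self.skip_depth -= 1
        else:
            self.out.append(f"</{tag}>")

    def _emit(self, text):
        if not self.skip_depth:
            self.out.append(text)

    def handle_data(self, data):
        self._emit(data)

    def handle_entityref(self, name):
        self._emit(f"&{name};")

    def handle_charref(self, name):
        self._emit(f"&#{name};")

    def handle_comment(self, data):
        self._emit(f"<!--{data}-->")

    def handle_decl(self, decl):
        self._emit(f"<!{decl}>")

def strip_clutter(html: str) -> str:
    """Return html with sidebar, toolbox and edit-link elements removed."""
    parser = ClutterStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```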
 
The ZIM file is created with a simple welcome page listing all the pages as links.
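A welcome page of this kind is just an HTML list of links. Here is a minimal Python sketch; the one-file-per-title naming (spaces replaced by underscores) is a hypothetical convention, not necessarily Zimbalaka's.

```python
import html
from urllib.parse import quote

def page_filename(title: str) -> str:
    """Hypothetical naming scheme: one file per page, spaces -> underscores."""
    return title.replace(" ", "_") + ".html"

def welcome_page(heading: str, titles: list[str]) -> str:
    """Return an HTML welcome page linking to every page in the ZIM file."""
    items = "\n".join(
        f'<li><a href="{quote(page_filename(t))}">{html.escape(t)}</a></li>'
        for t in titles
    )
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
        f"<title>{html.escape(heading)}</title></head>\n"
        f"<body><h1>{html.escape(heading)}</h1>\n<ul>\n{items}\n</ul></body></html>\n"
    )
```

Titles are HTML-escaped for display and percent-encoded in the link target, so characters such as `&` are safe in both places.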
 
Zimbalaka has multilingual and multi-site support. That is, you can create a ZIM file from pages in any of the 280+ existing Wikipedia language editions, and also from sites like Wikibooks, Wiktionary and Wikiversity. You can even enter a custom URL such as <nowiki>http://sub.domain.com/</nowiki>; Zimbalaka appends /wiki/Page_title to it and downloads the pages.
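The URL rule described above can be written as a small Python helper. Replacing spaces with underscores and percent-encoding the rest follows MediaWiki's usual article URLs, and the `https://{lang}.wikipedia.org` pattern covers Wikipedia's language editions. The function names are illustrative.

```python
from urllib.parse import quote

def wikipedia_base(lang: str) -> str:
    """Base URL of a Wikipedia language edition, e.g. 'fr' for French."""
    return f"https://{lang}.wikipedia.org"

def page_url(site: str, title: str) -> str:
    """Append /wiki/Page_title to a site base URL."""
    return f"{site.rstrip('/')}/wiki/{quote(title.replace(' ', '_'))}"
```

For example, `page_url("http://sub.domain.com/", "Main Page")` gives `http://sub.domain.com/wiki/Main_Page`.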
 
==== Pain points ====
A small pain point is that Zimbalaka also strips the external references at the end of Wikipedia articles, as the original author did not consider them useful for content intended to be used offline.
 
You cannot add a custom welcome page to the ZIM file. This is not a big priority, as the current page does its job of listing all the pages.
 
You cannot include pages from multiple sites in a single ZIM file. The workaround is to create multiple files, or to use zimwriterfs (which Zimbalaka uses behind the scenes), which has to be compiled from source.
 
==== Developers ====
This tool is written with Flask, a simple Python web framework, for the backend and Bootstrap for the frontend, and it uses the compiled zimwriterfs binary as the workhorse. The ZIM-building tasks are run by Celery, which is managed by supervisord. All the coordination and message passing happens via Redis.
 
[https://github.com/tecoholic/Zimbalaka Here is the source code].


=== zimwriterdb ===
=== Wiki2html ===
[[Wiki2html]] can be used to prepare static HTML files from a running MediaWiki instance.
===[https://github.com/vss-devel/zimmer zimmer]===
This package is primarily a tool for creating a ZIM dump from a MediaWiki-based wiki.
The package consists of two scripts:
* '''''wikizimmer.js''''' -- dumps the wiki's articles (namespace 0) into a collection of static HTML files.
* '''''zimmer.js''''' -- builds a ZIM file from a collection of static HTML files. Historically, zimmer.js is mostly a drop-in replacement for zimwriterfs, with one notable exception: it does not support the withFullTextIndex option (the index format is not documented).
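Listing the articles of namespace 0 is a paginated query against the MediaWiki action API (`list=allpages`). The Python sketch below illustrates that step; it is not wikizimmer.js's code. It follows the continuation format introduced in MediaWiki 1.21, so older wikis that return `query-continue` instead would need extra handling. The User-Agent string is hypothetical, and `fetch` can be swapped out for testing.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

def fetch_json(url: str) -> dict:
    """GET a URL and decode its JSON body (hypothetical User-Agent)."""
    req = Request(url, headers={"User-Agent": "zim-dump-sketch/0.1 (example)"})
    with urlopen(req) as resp:
        return json.load(resp)

def main_namespace_titles(api_url: str, fetch=fetch_json):
    """Yield every article title in namespace 0, following 'continue' tokens."""
    params = {
        "action": "query", "list": "allpages", "apnamespace": 0,
        "aplimit": "max", "format": "json", "continue": "",
    }
    while True:
        data = fetch(f"{api_url}?{urlencode(params)}")
        for page in data["query"]["allpages"]:
            yield page["title"]
        if "continue" not in data:
            break
        # Merge the returned tokens (apcontinue, continue) into the next request.
        params = {**params, **data["continue"]}
```

For example, `list(main_namespace_titles("https://en.wikipedia.org/w/api.php"))` would walk every article title, which for a large wiki takes many requests.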
The main point is that wikizimmer.js, unlike MWoffliner, does not depend on Parsoid or Redis, and zimmer.js, unlike zimwriterfs, does not depend on zimlib.
The package is relatively easy to install, and it can even process some wikis running rather old versions of the MediaWiki engine.
There is also zimmer's counterpart -- [https://github.com/vss-devel/unzimmer unzimmer]. It unpacks a ZIM file into a directory, which can be useful for debugging.
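When debugging, a first sanity check is often the fixed-size ZIM header. The Python sketch below reads it using the 80-byte little-endian layout as we read the openZIM format specification (magic number 72173914, then versions, UUID, counts and section offsets). It is not unzimmer's code, and the field names are descriptive labels chosen here.

```python
import struct
from typing import NamedTuple

ZIM_MAGIC = 72173914  # 0x044D495A, i.e. the bytes b"ZIM\x04" on disk
HEADER = struct.Struct("<IHH16sIIQQQQIIQ")  # 80 bytes, little-endian, no padding

class ZimHeader(NamedTuple):
    magic: int
    major_version: int
    minor_version: int
    uuid: bytes
    article_count: int
    cluster_count: int
    url_ptr_pos: int
    title_ptr_pos: int
    cluster_ptr_pos: int
    mime_list_pos: int
    main_page: int
    layout_page: int
    checksum_pos: int

def parse_header(data: bytes) -> ZimHeader:
    """Decode the first 80 bytes of a ZIM file, checking the magic number."""
    if len(data) < HEADER.size:
        raise ValueError("file too short for a ZIM header")
    header = ZimHeader(*HEADER.unpack_from(data))
    if header.magic != ZIM_MAGIC:
        raise ValueError("not a ZIM file (bad magic number)")
    return header

def read_header(path) -> ZimHeader:
    with open(path, "rb") as f:
        return parse_header(f.read(HEADER.size))
```

For example, `read_header("wikipedia.zim").article_count` shows how many entries the file declares, a quick check before unpacking it.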


== See also ==