(Copy page from Github wiki to openZIM wiki) |
(better explain the lang + use domain instead of project in ZIM Name format) |
||
Line 1: | Line 1: | ||
<blockquote>This page was originally located at https://github.com/openzim/overview/wiki/ZIMs-Naming-Convention </blockquote> | <blockquote>This page was originally located at https://github.com/openzim/overview/wiki/ZIMs-Naming-Convention </blockquote>This page explains the naming convention used both for the ZIM `Name` metadata and the ZIM filename, for ZIMs published by openZIM. | ||
This is an openZIM convention, i.e. other publishers are free to follow the same convention or develop their own. | |||
=== Context === | === Context === | ||
Line 21: | Line 22: | ||
=== ZIM <code>Name</code> Metadata === | === ZIM <code>Name</code> Metadata === | ||
Format: '''<code>{ | Format: '''<code>{domain}_{lang}_{selection}</code>''' | ||
The <code>_</code> character is reserved as separator between the parts. | The <code>_</code> character is reserved as separator between the parts. | ||
Line 34: | Line 35: | ||
!Example | !Example | ||
|- | |- | ||
|<code> | |<code>domain</code> | ||
|Domain name (or project) <sup>1</sup> | |Domain name (or project) <sup>1</sup> | ||
|<code>android.stackexchange.com</code>, <code>wikipedia</code> | |<code>android.stackexchange.com</code>, <code>wikipedia</code> | ||
|- | |- | ||
|<code>lang</code> | |<code>lang</code> | ||
|ISO-639 | |ISO-639 language code or <code>mul</code> <sup>2</sup> | ||
|<code>en</code>, <code>fr</code>, <code>zh</code>, <code>mul</code | |<code>en</code>, <code>fr</code>, <code>zh</code>, <code>mul</code> | ||
|- | |- | ||
|<code>selection</code> | |<code>selection</code> | ||
Line 47: | Line 48: | ||
|} | |} | ||
* <sup>1</sup> | * <sup>1</sup> By default, use the web domain name associated with the content (including for YouTube channels, ...). Project names are exceptions (basically valid only if we have at least a dedicated category for the project); use the domain name if unsure or, better, ask on Slack. Should the domain name contain characters that are illegal in our convention, it will be encoded with Punycode (see e.g. https://www.punycoder.com/). | ||
* | * <sup>2</sup> Whenever possible, prefer the ISO-639-1 (2 chars) language code. When the ISO-639-1 code does not exist or is ambiguous (leading to a ZIM Name conflict between two different ZIMs), using the ISO-639-3 code is recommended. When multiple languages are present inside the ZIM, <code>mul</code> is to be used. Note that the ZIM <code>Language</code> metadata lists all the languages (ISO-639-3) instead of using <code>mul</code>. | ||
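As an illustration of the <code>{domain}_{lang}_{selection}</code> format and the two notes above, here is a minimal Python sketch. It is not part of any openZIM tool: the function names <code>encode_domain</code>, <code>choose_lang</code> and <code>build_zim_name</code>, the example domain <code>bücher.example</code> and the selection value <code>all</code> are assumptions made for the example, and Python's built-in <code>idna</code> codec is used as one possible way to obtain the Punycode form.

```python
def encode_domain(domain: str) -> str:
    """Return an ASCII form of the domain (hypothetical helper).

    Non-ASCII labels are Punycode-encoded through Python's built-in
    "idna" codec; plain ASCII domains come back unchanged (lowercased).
    """
    return domain.lower().encode("idna").decode("ascii")


def choose_lang(codes: list[str]) -> str:
    """Pick the {lang} part from the language codes present in the ZIM.

    A single language keeps its own code (ISO-639-1 when possible,
    ISO-639-3 otherwise, as chosen by the caller); several languages
    collapse to "mul".
    """
    if not codes:
        raise ValueError("at least one language code is required")
    return codes[0] if len(codes) == 1 else "mul"


def build_zim_name(domain: str, lang: str, selection: str) -> str:
    """Join the three parts with "_", which is reserved as the separator."""
    for part in (domain, lang, selection):
        if not part:
            raise ValueError("empty part in ZIM Name")
        if "_" in part:
            raise ValueError(f"'_' is reserved as separator: {part!r}")
    return "_".join((encode_domain(domain), lang, selection))


if __name__ == "__main__":
    # "all" is an assumed selection value used only for illustration.
    print(build_zim_name("android.stackexchange.com", "en", "all"))
    print(build_zim_name("bücher.example", choose_lang(["de"]), "all"))
    print(build_zim_name("wikipedia", choose_lang(["en", "fr"]), "all"))
```

The sketch only enforces what the convention states explicitly (three parts, <code>_</code> reserved as separator); any stricter character rules would have to be added on top.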
=== ZIM filename === | === ZIM filename === | ||
Line 79: | Line 80: | ||
* <sup>1</sup> It doesn't need to be equal to the `Name` metadata, but the requirements are identical. | * <sup>1</sup> It doesn't need to be equal to the `Name` metadata, but the requirements are identical. | ||
=== Zimfarm === | === Implementation on the Zimfarm === | ||
Depending on the scraper, setting the <code>Name</code> metadata in the Zimfarm can be mandatory (follow above instructions) or optional. When optional, the scraper usually properly sets it according to the convention. Should it not, open a ticket on the scraper repo and set it manually in the recipe until it is fixed. | Depending on the scraper, setting the <code>Name</code> metadata in the Zimfarm can be mandatory (follow above instructions) or optional. When optional, the scraper usually properly sets it according to the convention. Should it not, open a ticket on the scraper repo and set it manually in the recipe until it is fixed. | ||