(Copy page from Github wiki to openZIM wiki) |
|||
(5 intermediate revisions by 2 users not shown) | |||
Line 1: | Line 1: | ||
<blockquote>This page was originally located at https://github.com/openzim/overview/wiki/ZIMs-Naming-Convention </blockquote> | <blockquote>This page was originally located at https://github.com/openzim/overview/wiki/ZIMs-Naming-Convention </blockquote>This page explains the naming convention used both for the ZIM `Name` metadata and the ZIM filename, for ZIMs published by openZIM. | ||
This is an openZIM convention, i.e. other publishers are free to follow the same convention or develop their own. | |||
=== Context === | === Context === | ||
Line 21: | Line 22: | ||
=== ZIM <code>Name</code> Metadata === | === ZIM <code>Name</code> Metadata === | ||
Format: '''<code>{ | Format: '''<code>{domain}_{lang}_{selection}</code>''' | ||
The <code>_</code> character is reserved as separator between the parts. | The <code>_</code> character is reserved as separator between the parts. | ||
The parts must only contain alphanums or <code>-</code> or <code>.</code> characters. | The parts must be all lowercase and only contain alphanums (<code>a-z</code>, no accented or special characters) or <code>-</code> or <code>.</code> characters (regex is <code>[a-z0-9\-\.]</code>). | ||
{| class="wikitable" | {| class="wikitable" | ||
|+Components of ZIM <code>Name</code> Metadata | |+Components of ZIM <code>Name</code> Metadata | ||
Line 34: | Line 34: | ||
!Example | !Example | ||
|- | |- | ||
|<code> | |<code>domain</code> | ||
|Domain name (or project) <sup>1</sup> | |Domain name (or project) <sup>1</sup> | ||
|<code>android.stackexchange.com</code>, <code>wikipedia</code> | |<code>android.stackexchange.com</code>, <code>wikipedia</code> | ||
|- | |- | ||
|<code>lang</code> | |<code>lang</code> | ||
|ISO-639 | |ISO-639 language code or <code>mul</code> <sup>2</sup> | ||
|<code>en</code>, <code>fr</code>, <code>zh</code>, <code>mul</code | |<code>en</code>, <code>fr</code>, <code>zh</code>, <code>mul</code> | ||
|- | |- | ||
|<code>selection</code> | |<code>selection</code> | ||
Line 47: | Line 47: | ||
|} | |} | ||
* <sup>1</sup> | * <sup>1</sup> By default, use the web domain name associated with the content (including for YouTube channels, ...). Project names are exceptions (basically valid only if we at least have a dedicated category for this project); use the domain name if unsure or, better, ask on Slack. Should the domain name contain characters that are illegal under our convention, it is encoded with Punycode (see e.g. https://www.punycoder.com/). | ||
* | * <sup>2</sup> Whenever possible, prefer the ISO-639-1 (2 chars) language code. When the ISO-639-1 code does not exist or is ambiguous (leading to a ZIM Name conflict between two different ZIMs), using the ISO-639-3 code is recommended. When multiple languages are present inside the ZIM, <code>mul</code> is to be used. Note that the ZIM <code>Language</code> metadata lists all the languages (ISO-639-3) instead of using <code>mul</code>. | ||
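As an illustration of the format and character rules above, here is a minimal Python sketch that checks a candidate <code>Name</code> and splits it into its three parts. The function name <code>parse_zim_name</code> and the returned dictionary are hypothetical and not part of any openZIM tool; the sketch only assumes the three-part format and the character class quoted in this section. It does not check that <code>lang</code> is a real ISO-639 code, nor does it handle Punycode encoding.

```python
import re

# Character class quoted in the convention: lowercase alphanumerics, "-" or ".".
PART = r"[a-z0-9\-\.]+"

# Three parts separated by the reserved "_" character.
NAME_RE = re.compile(rf"({PART})_({PART})_({PART})")


def parse_zim_name(name: str) -> dict:
    """Split a Name into its parts, or raise ValueError if it breaks the rules.

    Hypothetical helper, written only to illustrate the convention.
    """
    match = NAME_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not a valid ZIM Name: {name!r}")
    domain, lang, selection = match.groups()
    return {"domain": domain, "lang": lang, "selection": selection}


print(parse_zim_name("android.stackexchange.com_en_all"))
```

Because <code>_</code> cannot appear inside a part, the split is unambiguous: a value with uppercase letters, accented characters, or a number of parts other than three is rejected.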
=== ZIM filename === | === ZIM filename === | ||
Line 55: | Line 55: | ||
The <code>_</code> character is reserved as separator between the parts. | The <code>_</code> character is reserved as separator between the parts. | ||
The parts must only contain alphanums or <code>-</code> or <code>.</code> characters. | The parts must be all lowercase and only contain alphanums (<code>a-z</code>, no accented or special characters) or <code>-</code> or <code>.</code> characters (regex is <code>[a-z0-9\-\.]</code>). | ||
{| class="wikitable" | {| class="wikitable" | ||
|+Components of ZIM filename | |+Components of ZIM filename | ||
Line 79: | Line 78: | ||
* <sup>1</sup> It does not need to be equal to the `Name` metadata, but the requirements are identical. | * <sup>1</sup> It does not need to be equal to the `Name` metadata, but the requirements are identical. | ||
=== Zimfarm === | === Implementation on the Zimfarm === | ||
Depending on the scraper, setting the <code>Name</code> metadata in the Zimfarm can be mandatory (follow above instructions) or optional. When optional, the scraper usually properly sets it according to the convention. Should it not, open a ticket on the scraper repo and set it manually in the recipe until it is fixed. | Depending on the scraper, setting the <code>Name</code> metadata in the Zimfarm can be mandatory (follow above instructions) or optional. When optional, the scraper usually properly sets it according to the convention. Should it not, open a ticket on the scraper repo and set it manually in the recipe until it is fixed. | ||
Line 85: | Line 84: | ||
'''Important''': when setting the filename manually, you are responsible for the whole filename, including the period part. Most scrapers allow inserting a special `{period}` string that will be replaced with the actual year-date period. Ex: <code>supersite.com_en_all_{period}.zim</code>. | '''Important''': when setting the filename manually, you are responsible for the whole filename, including the period part. Most scrapers allow inserting a special `{period}` string that will be replaced with the actual year-date period. Ex: <code>supersite.com_en_all_{period}.zim</code>. | ||
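To make the <code>{period}</code> placeholder concrete, the following Python sketch shows one way such a substitution could work. It is not the code of any actual scraper: the function name <code>render_filename</code> is hypothetical, and the <code>YYYY-MM</code> rendering of the period is an assumption, since this section only describes it as "year-date".

```python
from datetime import date


def render_filename(template: str, when: date) -> str:
    """Replace the literal "{period}" marker with a year-month string.

    Assumption: the period is rendered as YYYY-MM; check the scraper's
    own documentation for the format it really uses.
    """
    return template.replace("{period}", when.strftime("%Y-%m"))


print(render_filename("supersite.com_en_all_{period}.zim", date(2024, 1, 15)))
```

A plain string replacement is used rather than <code>str.format</code> so that any other braces in the template are left untouched.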
=== See also === | |||
[[Metadata]] |