Content team/ZIM Metadata Convention
The ZIM Naming Convention focuses on the Name metadata and the associated filename, which matter for publication and the CMS. It is just as important to document our conventions for the other metadata of ZIMs published by openZIM.
This document is an openZIM convention, i.e. other publishers are free to follow the same convention or develop their own.
This documentation is a work in progress and will evolve as we gain experience with odd cases.
ZIM Language
The specification says that the Language metadata must be one ISO 639-3 code or a list of ISO 639-3 codes.
While this is sufficiently precise and strict for a specification, it does not detail what to do when we face an ambiguous situation.
An ambiguous situation typically arises when an ISO 639-3 code is split into two codes. Sometimes (rarely) we know the content is in only one of the new codes, and the situation is simple. In all other cases, we consider it safe to assume that both new codes are probably used in the content, and we use both codes.
This happened, for instance, for Emiliano-Romagnolo, whose code eml (now deprecated in ISO) was split into egl and rgn. In such a case, we should use both values in the Language metadata: egl,rgn. Order can simply be alphabetical. This is deemed reasonable and useful to allow discovery of these ZIMs. This can be changed once we have confirmation from a native speaker that only one of the two codes is in fact present in the content.
See discussion at https://github.com/openzim/overview/issues/51#issuecomment-2904587084
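The rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not an openZIM tool: the SPLIT_CODES table, the language_metadata function and its confirmed parameter are hypothetical names, and only the eml entry comes from this page. The sketch also assumes that codes which were not split keep the order in which they were given.

```python
# Hypothetical table of ISO 639-3 codes that were split into new codes.
# Only the eml entry comes from this convention; extend it as cases arise.
SPLIT_CODES = {"eml": ("egl", "rgn")}


def language_metadata(codes, confirmed=None):
    """Build a comma-separated Language value from ISO 639-3 codes.

    codes: the codes describing the content, possibly including a split code.
    confirmed: optional mapping from a split code to the single new code
        that a native speaker confirmed is the only one present.
    """
    confirmed = confirmed or {}
    resolved = []
    for code in codes:
        if code in confirmed:
            # Confirmation received: keep only the confirmed new code.
            replacements = [confirmed[code]]
        elif code in SPLIT_CODES:
            # Ambiguous case: use all new codes, in alphabetical order.
            replacements = sorted(SPLIT_CODES[code])
        else:
            # Code was not split: keep it unchanged.
            replacements = [code]
        for item in replacements:
            if item not in resolved:  # avoid duplicates
                resolved.append(item)
    return ",".join(resolved)


print(language_metadata(["eml"]))
print(language_metadata(["eml"], confirmed={"eml": "rgn"}))
```

The first call covers the ambiguous case and yields both new codes. The second covers the case where a native speaker has confirmed that only rgn is present.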