Difference between revisions of "Metadata"

From openZIM
(9 intermediate revisions by 2 users not shown)
Line 4: Line 4:


== Keys ==
== Keys ==
{|{{Prettytable}}
{| class="sortable" style="border-width:1px; border-style:solid; border-color:#888888; background-color:#eeeeee; border-collapse:collapse; empty-cells:show" cellspacing="0" cellpadding="4" {{Prettytable}}
! Key    !! Mandatory    !! Description !! Example
! Key    !! Mandatory    !! Description !! Example
|-
|-
! Name
! Name
| yes
| yes
| A human readable identifier for the resource. It's the same across versions (should be stable across time). MUST be prefixed by the packager name.
| A human readable identifier for the resource. It's the same across versions (should be stable across time).
| ''kiwix.wikipedia_en.nopics''
| ''wikipedia_fr_football''
|-
|-
! Title
! Title
Line 37: Line 37:
| ''All articles (without images) from the english Wikipedia''
| ''All articles (without images) from the english Wikipedia''
|-
|-
! Long description
! LongDescription
| no
| no
| description of content (small paragraph)
| description of content (small paragraph)
Line 54: Line 54:
! Tags
! Tags
| no
| no
| A list of tags
| A list of [[tags]]
| ''nopic;wikipedia''
| ''wikipedia;_category:wikipedia;_pictures:no;_videos:no;_details:yes;_ftindex:yes''
|-
|-
! Relation
! Relation
Line 61: Line 61:
| URI of external related ressources
| URI of external related ressources
|  
|  
|-
! Flavour
| no
| A human readable string describing the way how the content has been scraped. It's the same across versions (should be stable across time).
| ''nopic''
|-
|-
! Source
! Source
| no
| no
| URI of the original source
| URI of the original source
| ''http://en.wikipedia.org/''
| ''https://en.wikipedia.org/''
|-
|-
! Counter
! Counter
Line 71: Line 76:
| Number of non-redirect entries per mime-type
| Number of non-redirect entries per mime-type
| image/jpeg=5;image/gif=3;image/png=2;...
| image/jpeg=5;image/gif=3;image/png=2;...
|-
! Scraper
| no
| Details about the software used to scrape the content, with its version
| mwoffliner 1.2.3
|}
|}


Line 78: Line 88:


You can also provide ''n'' optional /-/favicon_[heigt]x[width] entries for high resolution version of the favicon.
You can also provide ''n'' optional /-/favicon_[heigt]x[width] entries for high resolution version of the favicon.
== Illustration ==
A picture illustrating the content and should be located at /-/illustration
You can provide ''n'' optional /-/illustration.*_[heigt]x[width] entries


== See also ==
== See also ==
* [http://dublincore.org/documents/dces/ Dublin Core]
* [http://dublincore.org/documents/dces/ Dublin Core]

Revision as of 15:10, 5 December 2019

To give each ZIM file a description that can be easily extracted, we defined a special namespace M and a standardized set of keys that should be used.

Every key is stored like an article: the key name is used as the article name, and the key value is stored as the article text. This way the metadata is compressed like any other content, yet remains extensible. Additional keys may be used in a ZIM file without breaking the standard, but be aware that the openZIM project may define further keys in the future. Any ZIM library reading this metadata should tolerate missing keys or values and simply return NULL in such cases.
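The lookup rule above can be sketched as follows. This is only a minimal illustration, assuming the entries of namespace M have already been read into a plain dictionary; `read_metadata` and the `entries` mapping are hypothetical names, not part of any ZIM library, and the UTF-8 decoding is an assumption as well.

```python
from typing import Mapping, Optional


def read_metadata(entries: Mapping[str, bytes], key: str) -> Optional[str]:
    """Return the value stored under `key`, or None if it is missing or empty.

    `entries` is a hypothetical stand-in for namespace M: it maps an
    entry name (the metadata key) to the entry content (the value).
    """
    raw = entries.get(key)
    if not raw:
        # Missing key or empty value: report "no value" instead of failing.
        return None
    # Assumption: metadata values are UTF-8 encoded text.
    return raw.decode("utf-8")
```

A reader built this way never raises on an unknown key, so files that carry extra or fewer keys are handled the same way.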

Keys

{| class="sortable"
! Key !! Mandatory !! Description !! Example
|-
! Name
| yes
| A human-readable identifier for the resource. It is the same across versions (should be stable over time).
| wikipedia_fr_football
|-
! Title
| yes
| Title of the ZIM file
| English Wikipedia
|-
! Creator
| yes
| Creator(s) of the ZIM file content
| English speaking Wikipedia contributors
|-
! Publisher
| yes
| Creator of the ZIM file itself
| Wikipedia user Foobar
|-
! Date
| yes
| Creation date (ISO 8601, YYYY-MM-DD)
| 2009-11-21
|-
! Description
| yes
| Description of the content (one short sentence)
| All articles (without images) from the English Wikipedia
|-
! LongDescription
| no
| Description of the content (small paragraph)
| This ZIM file contains all articles (without images) from the English Wikipedia as of 2009-11-10. The topics are ...
|-
! Language
| yes
| ISO 639-3 language identifier (comma-separated if more than one)
| eng
|-
! License
| no
| License code of the content
| CC-BY
|-
! Tags
| no
| A list of tags
| wikipedia;_category:wikipedia;_pictures:no;_videos:no;_details:yes;_ftindex:yes
|-
! Relation
| no
| URI of external related resources
|
|-
! Flavour
| no
| A human-readable string describing how the content has been scraped. It is the same across versions (should be stable over time).
| nopic
|-
! Source
| no
| URI of the original source
| https://en.wikipedia.org/
|-
! Counter
| no
| Number of non-redirect entries per MIME type
| image/jpeg=5;image/gif=3;image/png=2;...
|-
! Scraper
| no
| Details about the software used to scrape the content, with its version
| mwoffliner 1.2.3
|}
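As an illustration of how the table might be used by a reader or a validator, here is a hedged sketch. It assumes the metadata is available as a plain dictionary of strings, and that the Counter and Tags values are semicolon-separated as in the examples. The function names are hypothetical and do not come from any ZIM library.

```python
from typing import Dict, List, Mapping

# Keys marked "yes" in the Mandatory column of the table above.
MANDATORY_KEYS = ("Name", "Title", "Creator", "Publisher",
                  "Date", "Description", "Language")


def missing_mandatory_keys(metadata: Mapping[str, str]) -> List[str]:
    """Return the mandatory keys that are absent or empty, in table order."""
    return [key for key in MANDATORY_KEYS if not metadata.get(key)]


def parse_counter(value: str) -> Dict[str, int]:
    """Parse a Counter value such as 'image/jpeg=5;image/gif=3'."""
    counts: Dict[str, int] = {}
    for item in value.split(";"):
        mime, sep, count = item.strip().rpartition("=")
        # Skip empty or malformed items instead of raising.
        if not sep or not mime or not count.isdecimal():
            continue
        counts[mime] = int(count)
    return counts


def parse_tags(value: str) -> List[str]:
    """Split a Tags value on ';' and drop empty items."""
    return [tag.strip() for tag in value.split(";") if tag.strip()]
```

Malformed items are skipped rather than rejected, in keeping with the rule that readers should tolerate missing or unexpected values.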

Favicon

A favicon (48x48) is also mandatory and should be located at /-/favicon.

You can also provide n optional /-/favicon_[height]x[width] entries for high-resolution versions of the favicon.
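The naming pattern for the optional entries can be sketched as a small helper. This is only an illustration: `favicon_entry_path` is a hypothetical name, and the height-then-width order simply follows the pattern as written above.

```python
def favicon_entry_path(height: int, width: int) -> str:
    """Build the path of an optional high-resolution favicon entry."""
    if height <= 0 or width <= 0:
        raise ValueError("height and width must be positive")
    # Pattern from the text: /-/favicon_[height]x[width]
    return f"/-/favicon_{height}x{width}"
```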

Illustration

A picture illustrating the content should be located at /-/illustration.

You can provide n optional /-/illustration.*_[height]x[width] entries.

See also

Dublin Core: http://dublincore.org/documents/dces/