(Finish missing phrase in graphemes explanation) |
|||
(3 intermediate revisions by 2 users not shown) | |||
Line 14: | Line 14: | ||
! Title | ! Title | ||
| yes | | yes | ||
| title of zim file. 30 [ | | title of zim file. 30 [[Metadata#Graphemes|graphemes]] maximum recommended. | ||
| ''English Wikipedia'' | | ''English Wikipedia'' | ||
|- | |- | ||
Line 34: | Line 34: | ||
! Description | ! Description | ||
| yes | | yes | ||
| description of content (one short sentence). 80 [ | | description of content (one short sentence). 80 [[Metadata#Graphemes|graphemes]] maximum recommended. | ||
| ''All articles (without images) from the english Wikipedia'' | | ''All articles (without images) from the english Wikipedia'' | ||
|- | |- | ||
! LongDescription | ! LongDescription | ||
| no | | no | ||
| extended description of content. Carriage return allowed. {{formatnum:4000}} [ | | extended description of content. It should not copy the ''Description'' or be shorter than it. Carriage return allowed. {{formatnum:4000}} [[Metadata#Graphemes|graphemes]] maximum recommended. | ||
| ''This ZIM file contains all articles (without images) from the english Wikipedia by 2009-11-10. The topics are ...'' | | ''This ZIM file contains all articles (without images) from the english Wikipedia by 2009-11-10. The topics are ...'' | ||
|- | |- | ||
Line 101: | Line 101: | ||
! | ! | ||
|} | |} | ||
== Graphemes == | |||
When measuring the length of a string (e.g. for Title, Description, ...), count the number of visual characters (graphemes), not the number of Unicode code points needed to store and render them. The length limits exist so that metadata does not break reader UIs by taking up too much visual space, so visual length is what matters. In some languages, a single visual character needs several code points. For example, <code>में</code> is 1 grapheme but 3 Unicode code points (in Python, <code>len("में") == 3</code>). | |||
For background, see the [https://en.wikipedia.org/wiki/Grapheme Wikipedia article on graphemes]. | |||
Recommended ways to count graphemes: | |||
* '''Node.js:''' use the [https://www.npmjs.com/package/split-by-grapheme split-by-grapheme] package (used by the [https://github.com/openzim/mwoffliner/ mwoffliner scraper], for instance) | |||
* '''Python:''' use the [https://pypi.org/project/regex/ regex] package and <code>len(regex.findall(r"\X", value))</code> (used by [https://github.com/openzim/python-scraperlib/ python-scraperlib], for instance) | |||
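As a minimal sketch of the Python recommendation above, the following counts graphemes with <code>regex</code> and compares the result with the recommended maximums from the metadata table. The names <code>count_graphemes</code>, <code>exceeds_recommendation</code> and <code>RECOMMENDED_MAX</code> are hypothetical and not part of any openZIM library. The fallback used when <code>regex</code> is not installed is an assumption added here: it is only a rough estimate and should not be relied on for real validation.

```python
import unicodedata

try:
    import regex  # third-party package: pip install regex
except ImportError:
    regex = None  # assumption: fall back to a rough stdlib-only estimate

# Recommended maximums (in graphemes), taken from the metadata table.
RECOMMENDED_MAX = {"Title": 30, "Description": 80, "LongDescription": 4000}


def count_graphemes(value: str) -> int:
    """Count visual characters rather than Unicode code points."""
    if regex is not None:
        # \X matches one extended grapheme cluster.
        return len(regex.findall(r"\X", value))
    # Rough fallback: every code point that is not a combining mark starts
    # a new grapheme. This ignores emoji sequences, Hangul jamo and similar.
    return sum(1 for ch in value if not unicodedata.category(ch).startswith("M"))


def exceeds_recommendation(name: str, value: str) -> bool:
    """Return True if value is longer than the recommended maximum for name."""
    return count_graphemes(value) > RECOMMENDED_MAX[name]


print(len("में"), count_graphemes("में"))
print(exceeds_recommendation("Title", "English Wikipedia"))
```

Because the limits are recommendations, a scraper might log a warning rather than reject the value when <code>exceeds_recommendation</code> returns <code>True</code>.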
== Favicon (Old zim file) == | == Favicon (Old zim file) == |