(→Keys) |
(Add explanations and recommendations on graphemes) |
||
Line 101: | Line 101: | ||
! | ! | ||
|} | |} | ||
== Graphemes ==
When counting the length of a string (e.g. for a title or description), we want to count the number of visual characters (since ) rather than the number of Unicode code points needed to render them. In some languages, a single visual character (a grapheme) is made of several Unicode code points. One example is <code>में</code>, which is a single grapheme but uses 3 Unicode code points (e.g. in Python, <code>len("में") == 3</code>).
For background, see the [https://en.wikipedia.org/wiki/Grapheme Wikipedia article on graphemes].
The recommended ways to count graphemes are:
* '''Node.js:''' use the [https://www.npmjs.com/package/split-by-grapheme split-by-grapheme] package (used by the [https://github.com/openzim/mwoffliner/ mwoffliner scraper], for instance)
* '''Python:''' use the [https://pypi.org/project/regex/ regex] package and <code>len(regex.findall(r"\X", value))</code> (used by [https://github.com/openzim/python-scraperlib/ python-scraperlib], for instance)
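The Python recommendation above can be sketched as a small helper. This is only an illustration: the function name <code>count_graphemes</code> is hypothetical, and the fallback used when the <code>regex</code> package is not installed is an assumption of this sketch. It only skips combining marks, so it is a rough approximation and not a full grapheme segmentation.

```python
import unicodedata

try:
    import regex  # third-party package: pip install regex
except ImportError:
    # Assumption of this sketch: degrade to an approximation
    # instead of failing when regex is not installed.
    regex = None


def count_graphemes(value: str) -> int:
    """Return the number of graphemes (visual characters) in value."""
    if regex is not None:
        # \X matches one extended grapheme cluster.
        return len(regex.findall(r"\X", value))
    # Rough approximation: count every code point that is not a
    # combining mark (Unicode general categories Mn, Mc, Me).
    return sum(
        1 for ch in value if not unicodedata.category(ch).startswith("M")
    )


title = "में"
print(len(title))              # number of Unicode code points
print(count_graphemes(title))  # number of graphemes
```

A length check on a title or description would then compare <code>count_graphemes(value)</code>, not <code>len(value)</code>, against the limit.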
== Favicon (Old zim file) == | == Favicon (Old zim file) == |