Difference between revisions of "Metadata"

Add explanations and recommendations on graphemes
!
|}
== Graphemes ==
When counting the length of strings (e.g. for title, description, ...), we want to count the number of visual characters (graphemes), not the number of Unicode code points needed to render them. Some languages and characters need several code points for a single visual character. For example, <code>में</code> is a single grapheme but consists of 3 Unicode code points (in Python, <code>len("में") == 3</code>).
See the [https://en.wikipedia.org/wiki/Grapheme Wikipedia article on graphemes] for more background.
Recommended ways to count graphemes:
* '''Node.js:''' use the [https://www.npmjs.com/package/split-by-grapheme split-by-grapheme] package (used, for instance, in the [https://github.com/openzim/mwoffliner/ mwoffliner scraper])
* '''Python:''' use the [https://pypi.org/project/regex/ regex] package and <code>len(regex.findall(r"\X", value))</code> (used, for instance, in [https://github.com/openzim/python-scraperlib/ python-scraperlib])
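The Python recommendation above can be sketched as follows. This is a minimal illustration, assuming the third-party <code>regex</code> package is installed (the standard-library <code>re</code> module does not support <code>\X</code>). The helper names <code>grapheme_count</code> and <code>truncate_graphemes</code> are hypothetical and are not taken from python-scraperlib. The sketch contrasts the code-point length with the grapheme count, and shows how a string can be shortened without splitting a grapheme.

```python
# Hypothetical helpers; assumes the third-party "regex" package (pip install regex).
# \X matches one extended grapheme cluster (a user-perceived character).
import regex


def grapheme_count(value: str) -> int:
    """Return the number of graphemes (visual characters) in value."""
    return len(regex.findall(r"\X", value))


def truncate_graphemes(value: str, max_length: int) -> str:
    """Keep at most max_length graphemes, never cutting a grapheme in half."""
    return "".join(regex.findall(r"\X", value)[:max_length])


if __name__ == "__main__":
    word = "\u092e\u0947\u0902"  # "में": 3 code points forming 1 grapheme
    print(len(word))             # 3 (code points)
    print(grapheme_count(word))  # 1 (grapheme)
```

Slicing the list of matched clusters, rather than the string itself, is what keeps a truncated value from ending in a dangling combining mark.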


== Favicon (Old zim file) ==